Extra Exercise 11
-----------------
Confidence intervals for a mean applet.

(a) The sample size strongly affects the width of the confidence intervals (CIs): a larger sample size gives narrower CIs. This matches the formula, in which the standard error is inversely proportional to the square root of the sample size. The confidence level also affects the width of the CIs. Higher confidence levels lead to wider intervals, because the corresponding percentiles of the reference distribution are more extreme.

When inspecting the individual intervals, one notices that the standard error differs between intervals. That would not be the case if they were based on the formula for a known standard deviation. Also, the multiplication factor for the standard error is larger than the z*-value from the standard normal distribution. The explanation is that the intervals were generated from the formula for the case where the standard deviation is unknown and estimated from the data (Session 6), so the multiplier is a t*-value rather than z*.

(b) When sampling a large number of intervals, the proportion of intervals that contain the true value approaches the confidence level. The confidence level is exactly the probability that an interval contains the true value. Therefore, this is the same phenomenon as a sample proportion approaching the true probability.

(c) Sampling 95% confidence intervals and recording the number of intervals that contain the true parameter (X) corresponds to a binomial setting. For this exercise, we sample 100 intervals. Therefore X follows B(100, 0.95). By the rules for the binomial distribution, EX = 100*0.95 = 95 and sdX = sqrt(100*0.95*0.05) = 2.18. We run 30 replications of the experiment, giving observations X1,...,X30. These observations are independent and all follow B(100, 0.95). The average, Xmean, is an unbiased estimate of the true mean, and its distribution narrows around the true mean as the number of replications grows.
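The arithmetic in (c) can be checked with a few lines of code. The following is only a sketch in Python rather than Minitab. The names binomial_mean_sd and simulate_counts, and the seed, are illustrative assumptions and not part of the exercise. It computes EX and sdX for B(100, 0.95), the standard deviation of Xmean over 30 replications, and one simulated run of 30 counts.

```python
import math
import random

N_INTERVALS = 100  # intervals per replication (from the exercise)
COVERAGE = 0.95    # confidence level (from the exercise)
N_REPS = 30        # number of replications (from the exercise)


def binomial_mean_sd(n, p):
    """Return (mean, sd) of a B(n, p) count."""
    return n * p, math.sqrt(n * p * (1 - p))


def simulate_counts(n_reps, n, p, seed=0):
    """Simulate n_reps counts; each is the number of n intervals that cover.

    Each interval is treated as an independent success with probability p.
    The seed is an arbitrary choice that only makes the run reproducible.
    """
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) for _ in range(n_reps)]


mean_x, sd_x = binomial_mean_sd(N_INTERVALS, COVERAGE)
sd_xmean = sd_x / math.sqrt(N_REPS)  # sd of the average of 30 independent counts
counts = simulate_counts(N_REPS, N_INTERVALS, COVERAGE)

print(f"EX = {mean_x:.2f}, sdX = {sd_x:.2f}, sd of Xmean = {sd_xmean:.3f}")
print("simulated counts:", sorted(counts))
print("simulated mean:", sum(counts) / len(counts))
```

Dividing sdX by sqrt(30) shows how much the average of 30 counts narrows compared with a single count. A different seed gives a different set of 30 counts, just as a new run of the applet would.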
For one run of the experiment (30 replications), the following results were obtained (for convenience displayed using Minitab). The counts ("contain") for the 30 replications were typed into column 1.

name c1 'contain'
Stem-and-Leaf 'contain'.

Stem-and-leaf of contain  N = 30

  1   90  0
  2   91  0
  4   92  00
  6   93  00
  8   94  00
 10   95  00
 (8)  96  00000000
 12   97  000000000
  3   98  000

Leaf Unit = 0.1

Describe 'contain'.

Statistics

Variable   N  N*     Mean   SE Mean    StDev  Minimum  Q1  Median  Q3  Maximum
contain   30   0  95.4667  0.391676  2.14530       90  94      96  97       98

Comments:
---------
The sample mean is close to the population mean of 95, and the sample standard deviation is also close to the population value of 2.18. The distribution of hits appears clearly left-skewed, as one would expect for a binomial distribution with p close to 1.

Addition:
---------
In order to compare the stem-and-leaf plot with the theoretical B(100, 0.95) distribution, we compute the probabilities for counts in the range 89-100.

Name c2 "x"
Set 'x'
  1( 89 : 100 / 1 )1
End.
Name c3 "px"
PDF 'x' 'px';
  Binomial 100 .95.
Name C4 'expected'
Let 'expected' = 30*'px'
Print 'x' 'px' 'expected'.

Data Display

Row    x        px  expected
  1   89  0.007198   0.21595
  2   90  0.016716   0.50148
  3   91  0.034901   1.04704
  4   92  0.064871   1.94613
  5   93  0.106026   3.18077
  6   94  0.150015   4.50045
  7   95  0.180018   5.40053
  8   96  0.178143   5.34428
  9   97  0.139576   4.18727
 10   98  0.081182   2.43545
 11   99  0.031161   0.93482
 12  100  0.005921   0.17762

Comments:
---------
The sample distribution has too many counts of 96 and 97 and too few counts of 93-95 compared with the expected values. Still, the sample statistics were close to the true values. So the irregular shape of the sample distribution could just be random fluctuation that would smooth out if more than 30 replications were done.

You could also directly generate a plot of the binomial distribution using the Graph-Probability Distribution Plot menu:

DPlot;
  Distribution;
    Binomial 100 .95.
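The comparison of expected and observed counts can also be reproduced outside Minitab. The following is a minimal sketch in Python, not the method used in the exercise. The function name binom_pmf and the dictionary observed are illustrative assumptions, and the observed counts are read off the stem-and-leaf display above.

```python
from math import comb

N = 100      # intervals per replication
P = 0.95     # confidence level
N_REPS = 30  # number of replications


def binom_pmf(k, n, p):
    """Probability that a B(n, p) count equals k."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Observed frequencies, read off the stem-and-leaf display of 'contain'.
observed = {90: 1, 91: 1, 92: 2, 93: 2, 94: 2, 95: 2, 96: 8, 97: 9, 98: 3}

# One row per count in the range 89-100: x, px, expected (30*px), observed.
rows = []
for x in range(89, 101):
    px = binom_pmf(x, N, P)
    rows.append((x, px, N_REPS * px, observed.get(x, 0)))

print(f"{'x':>4} {'px':>9} {'expected':>9} {'observed':>9}")
for x, px, expected, obs in rows:
    print(f"{x:>4} {px:>9.6f} {expected:>9.5f} {obs:>9}")
```

Putting the observed column next to the expected column makes the excess at 96 and 97 and the shortfall at 93-95 visible row by row.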