Extra Exercise 11
-----------------
Confidence intervals for a mean applet.

(a)
The sample size strongly affects the width of the confidence intervals (CIs).
A larger sample size gives narrower CIs. This matches the formula, in
which the standard error is inversely proportional to the square root of
the sample size. The CI level also affects the width of the CIs. Higher
CI levels lead to wider intervals, because the corresponding percentiles
of the reference distribution are more extreme.
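
As a rough numerical illustration of both effects, the sketch below
evaluates the half-width z* * sigma / sqrt(n). It assumes the simpler
known-sigma formula and an arbitrary sigma = 1; the function name
half_width and the chosen values of n are illustrative only, not
taken from the applet.

```python
from math import sqrt
from statistics import NormalDist


def half_width(sigma, n, level):
    """Half-width z* * sigma / sqrt(n) of a known-sigma CI for a mean."""
    # z* is the upper (1 - level)/2 percentile of the standard normal.
    z_star = NormalDist().inv_cdf(0.5 + level / 2)
    return z_star * sigma / sqrt(n)


# sigma = 1.0 is an arbitrary illustrative value (assumption).
for n in (10, 40, 160):
    for level in (0.90, 0.95, 0.99):
        print(n, level, round(half_width(1.0, n, level), 3))
```

Quadrupling the sample size halves the half-width, while raising the
level from 90% to 99% widens the interval at any fixed n.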

When inspecting the individual intervals, one notices that the standard
error differs between intervals. That would not be the case if they were
based on the formula for a known standard deviation. Also, the
multiplication factor for the standard error is larger than the z*-value
from the standard normal distribution. The explanation is that the
intervals were generated from the formula for the case where the standard
deviation is unknown and estimated from the data (Session 6). In that
formula the factor is a t*-value, which is always larger than the
corresponding z*-value.
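
The varying standard errors can be reproduced with a small simulation.
This is only a sketch: the normal population with mean 0 and standard
deviation 1, the sample size n = 20 and the seed are all assumed values,
not settings of the applet.

```python
import random
from math import sqrt
from statistics import stdev

random.seed(1)  # arbitrary seed, only for reproducibility
n = 20          # hypothetical sample size

standard_errors = []
for _ in range(5):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    # Estimated standard error s / sqrt(n): depends on the sample.
    standard_errors.append(stdev(sample) / sqrt(n))

# With a known sigma = 1 the standard error is the same for every sample.
known_sigma_se = 1.0 / sqrt(n)

print([round(se, 3) for se in standard_errors])
print(round(known_sigma_se, 3))
```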

(b)
When sampling a large number of intervals, the proportion of intervals
that contain the true value approaches the confidence level. The confidence
level is exactly the probability that an interval contains the true
value. Therefore, this is the same phenomenon as a sample
proportion approaching the true probability (the law of large numbers).
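
A minimal simulation sketch of this convergence follows. For simplicity
it assumes known-sigma z-intervals rather than the applet's formula, and
the population (mean 0, sigma 1), the sample size and the seed are
hypothetical choices.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage(num_intervals, n=20, mu=0.0, sigma=1.0, level=0.95, seed=0):
    """Proportion of simulated known-sigma CIs that contain mu."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z_star * sigma / sqrt(n)
    hits = 0
    for _ in range(num_intervals):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        # The interval xbar +/- margin contains mu exactly when
        # xbar lies within margin of mu.
        if abs(xbar - mu) <= margin:
            hits += 1
    return hits / num_intervals


for num in (100, 1000, 20000):
    print(num, coverage(num))
```

The printed proportions should settle near 0.95 as the number of
intervals grows, although any single run will still show some scatter.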

(c)
Sampling 95% confidence intervals and recording the number of intervals
that contain the true parameter (X) corresponds to a binomial setting. For
this exercise, we sample 100 intervals. Therefore X follows B(100,0.95).
By the rules of the binomial distribution, EX = 100*0.95 = 95 and
sdX = sqrt(100*0.95*0.05) = 2.18.
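
The same arithmetic, written out as a short check (the variable names
are illustrative):

```python
from math import sqrt

n, p = 100, 0.95  # number of intervals and confidence level

mean_x = n * p                  # binomial mean n*p
sd_x = sqrt(n * p * (1 - p))    # binomial standard deviation

print(round(mean_x, 2), round(sd_x, 2))
```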

We run 30 replications of the experiment, giving observations X1,...,X30.
These observations are independent and all follow B(100,0.95). The
average, Xmean, is an unbiased estimate of the true mean, and its
distribution concentrates around the true mean as the number of
replications grows. With 30 replications its standard deviation is
sdX/sqrt(30) = 2.18/sqrt(30) = 0.40.

For one run of the experiment (30 replications), the following results were
obtained (for convenience displayed using Minitab). The counts ("contain") for 
the 30 replications were typed into column 1.

 name c1 'contain'
 Stem-and-Leaf 'contain'.

Stem-and-leaf of contain   N = 30
1	90	0
2	91	0
4	92	00
6	93	00
8	94	00
10	95	00
(8)	96	00000000
12	97	000000000
3	98	000
Leaf Unit = 0.1

 Describe 'contain'.

Statistics
Variable   N  N*     Mean   SE Mean    StDev  Minimum  Q1  Median  Q3  Maximum
contain   30   0  95.4667  0.391676  2.14530       90  94      96  97       98

Comments:
---------
The sample mean is close to the population mean of 95, and the
sample standard deviation is also close to the population value of 2.18.
The distribution of hits appears clearly left-skewed, as one would
expect for a binomial distribution with p close to 1.
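
As a cross-check, the summary statistics can be recomputed from the
counts shown in the stem-and-leaf display. The sketch below rebuilds the
30 observations from those frequencies; the variable names are
illustrative.

```python
from math import sqrt
from statistics import mean, median, stdev

# Frequencies read off the stem-and-leaf display above.
counts = {90: 1, 91: 1, 92: 2, 93: 2, 94: 2, 95: 2, 96: 8, 97: 9, 98: 3}

# Expand the frequency table into the 30 individual observations.
contain = [x for x, freq in counts.items() for _ in range(freq)]

n = len(contain)
sample_mean = mean(contain)
sample_sd = stdev(contain)      # sample standard deviation (divisor n - 1)
se_mean = sample_sd / sqrt(n)   # standard error of the mean

print(n, round(sample_mean, 4), round(sample_sd, 5), round(se_mean, 6))
print(median(contain))
```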

Addition:
---------
In order to compare the stem and leaf plot with the theoretical
B(100,0.95) distribution, we compute the probabilities for counts in the
range 89-100.

 Name c2 "x"
 Set 'x'
   1( 89 : 100 / 1 )1
   End.
 Name c3 "px"
 PDF 'x' 'px';
 Binomial 100 .95.
 Name C4 'expected'
 Let 'expected' = 30*'px'
 Print 'x' 'px' 'expected'.

Data Display

Row    x      px      expected
1     89   0.007198   0.21595
2     90   0.016716   0.50148
3     91   0.034901   1.04704
4     92   0.064871   1.94613
5     93   0.106026   3.18077
6     94   0.150015   4.50045
7     95   0.180018   5.40053
8     96   0.178143   5.34428
9     97   0.139576   4.18727
10    98   0.081182   2.43545
11    99   0.031161   0.93482
12   100   0.005921   0.17762
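
The same probabilities and expected counts can be reproduced outside
Minitab. The following is a sketch using the binomial formula directly;
the helper name binomial_pmf is illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X following B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


replications = 30  # number of replications of the experiment

for x in range(89, 101):
    px = binomial_pmf(x, 100, 0.95)
    # Expected number of replications (out of 30) with exactly x hits.
    print(x, round(px, 6), round(replications * px, 5))
```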

Comments:
---------
The sample distribution has too many counts of 96 and 97
and too few counts of 93-95. Still, the sample statistics were close to the
true values. So the irregular shape of the sample distribution could
just reflect random fluctuations that would smooth out if more than 30
replications were done.
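
To make the comparison concrete, the sketch below lines up the observed
counts from the stem-and-leaf display with the expected counts from the
Data Display. The values are copied from the output above; the variable
names are illustrative.

```python
# Observed counts from the stem-and-leaf display (30 replications).
observed = {90: 1, 91: 1, 92: 2, 93: 2, 94: 2, 95: 2, 96: 8, 97: 9, 98: 3}

# Expected counts (30 * px) copied from the Data Display.
expected = {
    89: 0.21595, 90: 0.50148, 91: 1.04704, 92: 1.94613,
    93: 3.18077, 94: 4.50045, 95: 5.40053, 96: 5.34428,
    97: 4.18727, 98: 2.43545, 99: 0.93482, 100: 0.17762,
}

# Observed minus expected; values never observed count as 0.
differences = {x: observed.get(x, 0) - expected[x] for x in sorted(expected)}

for x, diff in differences.items():
    print(x, observed.get(x, 0), expected[x], round(diff, 2))
```

Positive differences mark values that occurred more often than expected,
negative differences values that occurred less often.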

You could also directly generate a plot of the binomial distribution 
using the Graph-Probability Distribution Plot menu:

 DPlot;
  Distribution;
    Binomial 100 .95.