Supplementary exercise 8.33 of IPS7e
------------------------------------

Planning of a study on dissatisfaction of customers. A simple random
sample from the population of customers where each is classified as
dissatisfied or not can be approximated by a binomial setting if the
sample size is a small fraction (at most 1/20) of the population. We
will assume this to be the case. Therefore the statistical model for the
number of dissatisfied customers in a sample of n customers is B(n,p).

The guessed proportion of dissatisfied customers is p=0.25. In order to
achieve a 95% CI with a margin of error of at most 0.02, we solve for n
in the classical (normal approximation) formula:

  0.02 >= zstar*sqrt(p*(1-p)/n) = 1.96*sqrt(0.25*0.75/n)
  => n >= (1.96/0.02)^2*0.25*0.75 = 1800.75, so n = 1801
     (or 1875 with 2 instead of 1.96).

This is the formula for one proportion, e.g. slide 8L-13.
We need more than 1800 subjects to achieve such a low margin of error.
(Such a sample size is rarely realistic. The practical message is therefore
that one needs to be content with a larger margin of error.)
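
The calculation above can be checked with a few lines of Python. This is
only a sketch of the normal-approximation formula; the function name and
its arguments are illustrative and not part of the exercise or of Minitab.

```python
import math

def sample_size_for_proportion(p_guess, margin, z_star=1.96):
    """Smallest n with z_star*sqrt(p_guess*(1-p_guess)/n) <= margin."""
    return math.ceil((z_star / margin) ** 2 * p_guess * (1 - p_guess))

n_196 = sample_size_for_proportion(0.25, 0.02)            # zstar = 1.96
n_2 = sample_size_for_proportion(0.25, 0.02, z_star=2.0)  # zstar = 2
print(n_196, n_2)  # 1801 1875
```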

Suppose that we observed a sample proportion of 0.15 among 1800 subjects.
The margin of error would then be
  1.96*sqrt(0.15*0.85/1800) = 0.0165
which is a little less than the required 0.02. This is because the
standard error decreases as p moves away from 0.5 (in this case, from
0.25 to 0.15).
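
The effect of the proportion on the margin of error can be illustrated
as follows. This is a sketch using the same normal approximation; the
function name is illustrative.

```python
import math

def margin_of_error(p_hat, n, z_star=1.96):
    """Normal-approximation margin of error for one proportion."""
    return z_star * math.sqrt(p_hat * (1 - p_hat) / n)

# Same n = 1800, proportion moving away from 0.5
for p_hat in (0.5, 0.25, 0.15):
    print(p_hat, round(margin_of_error(p_hat, 1800), 4))
```

The margin is largest at 0.5 and shrinks as the proportion becomes more
extreme, which is why 0.15 gives a smaller margin than the planning value
0.25.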

Note that since the focus here is on estimation rather than testing
(there is no obvious hypothesis to test), this approach to sample size
determination is more natural than a power calculation.

----

Minitab commands and listing:

MTB > SSCI; 
SUBC>   BProportion .25; 
SUBC>   Confidence 95.0; 
SUBC>   IType 0; 
SUBC>   MError 0.02.
Sample Size for Estimation 

Method

Parameter            Proportion
Distribution         Binomial
Proportion           0.25
Confidence level     95%
Confidence interval  Two-sided

Results

  Margin  Sample
of Error    Size
    0.02    1921

Comments:
---------
Minitab returns a larger required sample size (1921 instead of 1801).
This is because it is based on an exact confidence interval and on the
requirement that both margins of the CI are at most 0.02. The exact CI
is slightly conservative and is also not symmetric around 0.25: the
upper part of the CI is the larger one. Therefore the requirement that
each side of the CI is at most 0.02 is more stringent than a requirement
that the total CI length is at most 0.04. This difference in the
interpretation of a margin of error of at most 0.02 is the main reason
why the sample sizes come out different.

In the present situation with such a large n it is perfectly fine to use 
the normal approximation.


Minitab commands and listing for the additional question (p=0.15 observed
among 1800 subjects), again using the Power and Sample Size > Sample Size
for Estimation menu:

MTB > SSCI; 
SUBC>   BProportion .15; 
SUBC>   Confidence 95.0; 
SUBC>   IType 0; 
SUBC>   Sample 1800.
Sample Size for Estimation 

Method

Parameter            Proportion
Distribution         Binomial
Proportion           0.15
Confidence level     95%
Confidence interval  Two-sided

Results

Sample   Margin of
  Size       Error
  1800   0.0173423
  1800  -0.0161927
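
The two margins in the listing (one positive, one negative) are the
distances from 0.15 to the upper and lower limits of the interval. The
asymmetry can be explored with the following Python sketch. It assumes
that the exact interval is the Clopper-Pearson interval and that p=0.15
with n=1800 corresponds to x=270 dissatisfied customers; both are
assumptions, and the function names are illustrative. Only the standard
library is used.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p), with each term computed on the log scale."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)

def bisect_root(f, lo, hi, iterations=60):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def clopper_pearson(x, n, conf=0.95):
    """Exact (Clopper-Pearson) two-sided CI for a binomial proportion."""
    alpha = 1 - conf
    eps = 1e-12
    p_hat = x / n
    # Lower limit: P(X >= x | p) = alpha/2
    lower = 0.0 if x == 0 else bisect_root(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2, eps, p_hat)
    # Upper limit: P(X <= x | p) = alpha/2
    upper = 1.0 if x == n else bisect_root(
        lambda p: binom_cdf(x, n, p) - alpha / 2, p_hat, 1 - eps)
    return lower, upper

lower, upper = clopper_pearson(270, 1800)
print("upper margin:", upper - 0.15)
print("lower margin:", lower - 0.15)
```

If the assumptions hold, the printed margins should be close to the two
values in the listing, with the upper margin the larger of the two and
the normal-approximation value 0.0165 lying between them in absolute
value.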