Supplementary exercise 8.33 of IPS7e
------------------------------------

Planning of a study on dissatisfaction of customers.

A simple random sample from a population of customers, each classified as dissatisfied or not, can be approximated by a binomial setting if the sample size is a small fraction (at most 1/20) of the population. We will assume this to be the case. The statistical model for the number of dissatisfied customers in a sample of n customers is therefore B(n,p).

The guessed proportion of dissatisfied customers is p=0.25. To achieve a 95% CI with a margin of error of at most 0.02, we solve for n in the equation for the classical (normal) approximation:

0.02 >= zstar*sqrt(p*(1-p)/n) = 1.96*sqrt(0.25*0.75/n)
=> n >= (1.96/0.02)^2*0.25*0.75 = 1800.75, i.e. n = 1801 after rounding up
(or 1875 with 2 instead of 1.96).

This is the formula for one proportion, e.g. slide 8L-13.

We need more than 1800 subjects to achieve such a low margin of error. (Such a sample size is rarely realistic. The practical message is therefore that one needs to be content with a larger margin of error.)

Suppose that we observed a sample proportion of 0.15 among 1800 subjects. The margin of error would then be

1.96*sqrt(0.15*0.85/1800) = 0.0165

and therefore a little less than required. This is because the standard error decreases when p becomes more extreme (in this case, from 0.25 to 0.15).

Note that, because the focus in this situation is on estimation rather than testing (there is no obvious hypothesis to test), this approach to sample size determination is more natural than power calculations.

----

Minitab commands and listing:

MTB > SSCI;
SUBC>   BProportion .25;
SUBC>   Confidence 95.0;
SUBC>   IType 0;
SUBC>   MError 0.02.

Sample Size for Estimation

Method

Parameter             Proportion
Distribution          Binomial
Proportion            0.25
Confidence level      95%
Confidence interval   Two-sided

Results

  Margin   Sample
of Error     Size
    0.02     1921

Comments:
---------

It is seen that Minitab returns a larger required sample size.
This is because it is based on an exact confidence interval and on the requirement that both margins of the CI are at most 0.02. The exact CI is slightly conservative, but it is also not symmetrical around 0.25, and the upper part of the CI is the larger one. Therefore the requirement that both sides of the CI are at most 0.02 is more stringent than a requirement that the total CI length is at most 0.04. This difference in the interpretation of a margin of error of at most 0.02 is the main reason why the sample sizes come out different. In the present situation, with such a large n, it is perfectly fine to use the normal approximation.

Minitab commands and listing for the additional question, again using the menu Power and Sample Size - Sample Size for Estimation:

MTB > SSCI;
SUBC>   BProportion .15;
SUBC>   Confidence 95.0;
SUBC>   IType 0;
SUBC>   Sample 1800.

Sample Size for Estimation

Method

Parameter             Proportion
Distribution          Binomial
Proportion            0.15
Confidence level      95%
Confidence interval   Two-sided

Results

Sample    Margin of
  Size        Error
  1800    0.0173423
  1800   -0.0161927
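
As a cross-check outside Minitab, the calculations above can be sketched in Python using only the standard library. The function names (z_star, n_for_margin, normal_margin, exact_margins) are made up for this illustration. The exact interval is assumed here to be the Clopper-Pearson interval, which is an assumption about what Minitab computes; it is obtained by bisection on binomial tail probabilities, so the exact margins may differ slightly from the Minitab listing.

```python
import math
from statistics import NormalDist


def z_star(conf=0.95):
    """Two-sided standard normal critical value (about 1.96 for 95%)."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)


def n_for_margin(p, margin, conf=0.95):
    """Smallest n with z* * sqrt(p*(1-p)/n) <= margin (normal approximation)."""
    return math.ceil((z_star(conf) / margin) ** 2 * p * (1 - p))


def normal_margin(p, n, conf=0.95):
    """Margin of error z* * sqrt(p*(1-p)/n)."""
    return z_star(conf) * math.sqrt(p * (1 - p) / n)


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p) with 0 < p < 1, summing pmf terms via logs."""
    if x < 0:
        return 0.0
    if x >= n:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for k in range(x + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1)
                   - math.lgamma(n - k + 1) + k * log_p + (n - k) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def _bisect(f, target, increasing, iters=60):
    """Solve f(p) = target for p in (0, 1) when f is monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_margins(x, n, conf=0.95):
    """Assumed Clopper-Pearson interval for x successes in n trials, returned
    as (lower margin, upper margin) around the sample proportion x/n."""
    alpha = 1 - conf
    p_hat = x / n
    # Lower limit solves P(X >= x | p) = alpha/2; this tail increases with p.
    lower = 0.0 if x == 0 else _bisect(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2, increasing=True)
    # Upper limit solves P(X <= x | p) = alpha/2; this tail decreases with p.
    upper = 1.0 if x == n else _bisect(
        lambda p: binom_cdf(x, n, p), alpha / 2, increasing=False)
    return p_hat - lower, upper - p_hat


if __name__ == "__main__":
    print(n_for_margin(0.25, 0.02))             # 1801, as in the hand calculation
    print(round(normal_margin(0.15, 1800), 4))  # 0.0165
    print(exact_margins(270, 1800))             # compare with the Minitab margins
```

Here 270 = 0.15*1800 is the number of dissatisfied customers corresponding to a sample proportion of 0.15. The two exact margins should come out unequal, with the upper one larger, which is the asymmetry described in the comments above.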