Supplementary Exercise 8.103 of IPS7e
-------------------------------------

Sample size calculation to obtain a margin of error of 0.10.

Assume X ~ B(n,p1) and Y ~ B(n,p2). The classical, approximate 95% CI for the difference p1-p2 is

    Dp +- 1.96*SE(Dp)

where Dp = p1_hat - p2_hat, and

    SE(Dp) = sqrt(p1_hat*(1-p1_hat)/n + p2_hat*(1-p2_hat)/n).

SE(Dp) (and therefore also the margin of error) is largest when both p1_hat and p2_hat are equal to 0.5. We therefore get a conservative estimate of the required sample size if we set both p1_hat and p2_hat to 0.5. To get a margin of error of at most 0.10, we need to solve

    0.1 >= 1.96*sqrt(0.5*0.5/n + 0.5*0.5/n) = 1.96/sqrt(2n) ~= 2/sqrt(2n)

or 2n >= (2/0.1)^2 = 400, or n >= 200 (or n >= 193 if working with 1.96 rather than the rounded-off value of 2).

We need at least 200 subjects in each group to get a margin of error of at most 0.1. This estimate is conservative (larger than necessary) if the proportions in the two populations are far from 0.5.

From this calculation, we deduce that the general formula for a conservative sample size calculation, at a desired margin of error of m and a critical value zstar, for comparison of two independent proportions, is

    n >= (zstar/m)^2 / 2

where n is the number of subjects in each group. This formula is not in the VHM 801 textbooks.

The expression for n is seen to be twice the corresponding conservative expression for a single sample, n >= (zstar/m)^2 / 4. One valid approach to this (conservative) sample size calculation is therefore to double the value for n from a one-sample calculation (which can be done in Minitab). It is a conservative approach because in practice one would rarely expect both probabilities to be very close to 0.5.

If we instead used guessed values (p1 and p2) for the two proportions, the formula would become

    n >= (zstar/m)^2 * (p1*(1-p1) + p2*(1-p2))
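
As a quick numerical check of the formulas above, here is a minimal Python sketch. Python is not part of the original exercise, and the function name two_sample_n and its arguments are made up for illustration. It evaluates the guessed-value formula, which reduces to the conservative formula when both proportions are left at 0.5, and rounds up to the next whole subject.

```python
import math


def two_sample_n(m, zstar=1.96, p1=0.5, p2=0.5):
    """Per-group sample size for a CI on p1 - p2 with margin of error at most m.

    Hypothetical helper: evaluates n >= (zstar/m)^2 * (p1*(1-p1) + p2*(1-p2))
    and rounds up. The defaults p1 = p2 = 0.5 give the conservative
    formula n >= (zstar/m)^2 / 2.
    """
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((zstar / m) ** 2 * variance_sum)


# Conservative calculation from the exercise (both proportions at 0.5).
print(two_sample_n(0.10, zstar=2))     # 200, using the rounded-off value 2
print(two_sample_n(0.10, zstar=1.96))  # 193

# Guessed proportions; 0.2 and 0.3 are illustrative values, not from the exercise.
print(two_sample_n(0.10, zstar=1.96, p1=0.2, p2=0.3))  # 143
```

The guessed values 0.2 and 0.3 are chosen only to show how much smaller the required n becomes when the proportions are away from 0.5; they do not come from the exercise.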