Supplementary Exercise 5.59 of IPE7e
------------------------------------

Opinion poll on the ideal number of children. The focus is on the answer that two children is ideal: 49% of the n = 1006 adults interviewed gave this answer. The poll reported a margin of error of +-3%. The task is to compute the probability that the sample proportion falls within 0.49 +- 0.03 if the true proportion is 0.49.

As a justification of this calculation, note that the sample proportion falls within 0.03 of the true proportion exactly when the interval (estimate +- 0.03) includes the true value. We therefore expect the probability to be close to 0.95.

Let X denote the number of adults answering that two children is ideal. The natural model for X is B(1006, 0.49). Strictly speaking this is an approximation, because the adults are sampled without replacement from a finite population. It works well as long as the sample is less than 1/20 of the population, which should be no problem here.

The proportion of "positive" answers is p_hat = X/n. It does not follow a binomial distribution, but events expressed in terms of p_hat can be converted to events for X, so their probabilities can be calculated from the binomial distribution of X. These probabilities may be calculated exactly from the binomial, or by the normal approximation formulae (5L-7). Alternatively, one may approximate the distribution of p_hat directly by a normal distribution. This effectively corresponds to using the normal approximation of the binomial distribution without continuity correction.

A. Exact calculation from the binomial distribution using Minitab:

P(0.46 <= p_hat <= 0.52) = P(0.46*1006 <= X <= 0.52*1006)
                         = P(462.76 <= X <= 523.12)
                         = P(463 <= X <= 523)          (X is an integer)
                         = P(X <= 523) - P(X <= 462)
                         = 0.973026 - 0.027377
                         = 0.94565

    MTB > CDF 523;
    SUBC> Binomial 1006 .49.

    Cumulative Distribution Function

    Binomial with n = 1006 and p = 0.49

      x  P( X <= x )
    523     0.973026

    MTB > CDF 462;
    SUBC> Binomial 1006 .49.

    Cumulative Distribution Function

    Binomial with n = 1006 and p = 0.49

      x  P( X <= x )
    462    0.0273770

B. Calculation of the binomial probability by the normal approximation with continuity correction (5L-7):

P(463 <= X <= 523) ~= P(Z <= (523 + 0.5 - 1006*0.49)/sqrt(1006*0.49*0.51))
                      - P(Z <= (463 - 0.5 - 1006*0.49)/sqrt(1006*0.49*0.51))
                    = P(Z <= 1.9274) - P(Z <= -1.9198)
                    = 0.973035 - 0.0274416
                    = 0.9456

We should also check that the rule of thumb for its use (from 5L-7) is satisfied:

1006*0.49*0.51 = 251.4 >> 10

so the normal approximation is clearly justified.

    MTB > CDF 1.9274;
    SUBC> Normal 0.0 1.0.

    Cumulative Distribution Function

    Normal with mean = 0 and standard deviation = 1

         x  P( X <= x )
    1.9274     0.973035

    MTB > CDF -1.9198;
    SUBC> Normal 0.0 1.0.

    Cumulative Distribution Function

    Normal with mean = 0 and standard deviation = 1

          x  P( X <= x )
    -1.9198    0.0274416

C. Normal approximation for p_hat directly:

E(p_hat) = 0.49
sd(p_hat) = sqrt(0.49*0.51/1006) = 0.015761

and therefore (also using Minitab):

P(0.46 <= p_hat <= 0.52) = P((0.46 - 0.49)/0.015761 < Z < (0.52 - 0.49)/0.015761)
                         = P(-1.903431 < Z < 1.903431)
                         = 1 - 2*P(Z <= -1.903431)     (by symmetry)
                         = 1 - 2*0.0284922
                         = 0.94302

    MTB > CDF -1.903431;
    SUBC> Normal 0.0 1.0.

    Cumulative Distribution Function

    Normal with mean = 0 and standard deviation = 1

           x  P( X <= x )
    -1.90343    0.0284922

All three calculations give a probability slightly below 0.95 that the sample proportion lies within +-0.03 (3 percentage points) of the true proportion. The main reason it is not exactly 0.95 is that the reported margin of error, 0.03, has been rounded down from the value given by the formula (Chapter 8):

margin of error = 1.96*sqrt(0.49*0.51/1006) = 0.0309
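As a cross-check outside Minitab, here is a minimal Python sketch of calculations A, B and C. It uses only the standard library. The names `binom_cdf`, `prob_a`, `prob_b`, `prob_c` and so on are illustrative choices, not part of the exercise. The exact binomial sum is done in integer arithmetic, which avoids floating-point underflow in terms such as 0.51^1006. `statistics.NormalDist` stands in for Minitab's `CDF ...; Normal 0.0 1.0.` command. The printed values should agree with the Minitab figures to the displayed precision, but treat this as a sketch rather than the reference solution.

```python
from fractions import Fraction
from math import ceil, comb, floor, sqrt
from statistics import NormalDist

N = 1006                  # number of adults interviewed
P_NUM, P_DEN = 49, 100    # true proportion 0.49 as integers, for exact arithmetic
p = P_NUM / P_DEN         # same proportion as a float, for the normal approximations
Z = NormalDist()          # standard normal distribution


def binom_cdf(x: int) -> Fraction:
    """Exact P(X <= x) for X ~ B(N, 0.49), summed in integer arithmetic."""
    num = sum(comb(N, k) * P_NUM**k * (P_DEN - P_NUM) ** (N - k) for k in range(x + 1))
    return Fraction(num, P_DEN**N)


# A. Convert 0.46 <= p_hat <= 0.52 into integer bounds for X, then use the exact binomial.
lo = ceil(Fraction(46, 100) * N)    # 462.76 rounds up to 463
hi = floor(Fraction(52, 100) * N)   # 523.12 rounds down to 523
prob_a = float(binom_cdf(hi) - binom_cdf(lo - 1))

# B. Normal approximation to the binomial, with continuity correction (+-0.5).
mu = N * p                       # E(X)
sigma = sqrt(N * p * (1 - p))    # sd(X)
prob_b = Z.cdf((hi + 0.5 - mu) / sigma) - Z.cdf((lo - 0.5 - mu) / sigma)

# C. Normal approximation for p_hat itself (no continuity correction).
sd_phat = sqrt(p * (1 - p) / N)
prob_c = Z.cdf((0.52 - p) / sd_phat) - Z.cdf((0.46 - p) / sd_phat)

# Unrounded margin of error, and the probability it gives under approximation C.
margin = 1.96 * sd_phat
prob_unrounded = Z.cdf(margin / sd_phat) - Z.cdf(-margin / sd_phat)

print(f"A (exact binomial):        {prob_a:.5f}")
print(f"B (normal, continuity):    {prob_b:.5f}")
print(f"C (normal for p_hat):      {prob_c:.5f}")
print(f"margin of error {margin:.4f} gives probability {prob_unrounded:.4f}")
```

The last line shows the rounding explanation numerically. With the unrounded margin of 0.0309 instead of 0.03, approximation C returns essentially 0.95 by construction.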