Supplementary Exercises 6.38 and 6.39 of IPS7e
----------------------------------------------

Planning of a study on starting salaries for liberal arts major graduates.
The parameter of interest is the population mean, i.e. the mean salary in
the population of recent graduates.

The problem expects us to take the sigma=9000 estimated in a pilot study as
a known population value. Although this is difficult to justify, we will
first work under this assumption, and then discuss how to deal with an
unknown population sigma in the note below.

6.38:
-----
For known sigma, we use the direct formula for the sample size computed from
a desired margin of error (m=400), e.g. 8L-13:

   n >= (zstar*sigma/m)^2 = (1.96*9000/400)^2 = 1944.8 ~= 1945.

We need at least 1945 subjects to achieve that margin of error. That seems
quite infeasible, but it results from an unrealistic expectation for the
margin of error ($400) when the population standard deviation is much
higher ($9000).

If we use the approximate value zstar~=2 instead of 1.96, as in the
additional notes, the result becomes:

   n >= (2*9000/400)^2 = 2025.

6.39:
-----
If we allow for a larger margin of error, the required sample size drops
(because a smaller sample size implies a larger SE and margin of error).
We redo the calculation with m=800:

   n >= (zstar*sigma/m)^2 = (1.96*9000/800)^2 = 486.2 ~= 487.

Because the sample size (n) enters into the SE as the square root of n,
increasing the margin of error by a factor of 2 (that is, to twice its
original size) lowers the required sample size by a factor of 2^2=4.

Added note (based on material in Chapter 7 and lecture 8)
----------
The above formula requires sigma to be known. In practice this is
unrealistic if the value for sigma is based on a pilot study only.
Therefore we would instead want to estimate sigma from the data when the
actual analysis is carried out.
Note that we still need a (guessed) value for sigma for the sample size
calculation, but this value will not be assumed to be the known standard
deviation when the data have been collected.

The only change in the formula is that zstar should be replaced by tstar
from a t-distribution with df=n-1. With df around 1944, tstar=1.96=zstar
when both are entered into the formula with two decimals, so the
calculation is essentially the same. Also for df=486, the difference
between tstar=1.965 and zstar=1.960 from N(0,1) is so small that one would
usually ignore it. Specifically, we can redo the calculation for m=800 with
the value 1.965 instead of 1.96:

   n >= (tstar*sigma/m)^2 = (1.965*9000/800)^2 = 488.7 ~= 489.

---

Minitab commands and listing:

MTB > SSCI;
SUBC>   NMean;
SUBC>   Sigma 9000 1;
SUBC>   Confidence 95.0;
SUBC>   IType 0;
SUBC>   MError 400.

Sample Size for Estimation

Method

Parameter            Mean
Distribution         Normal
Standard deviation   9000 (population value)
Confidence level     95%
Confidence interval  Two-sided

Results

  Margin  Sample
of Error    Size
     400    1945

MTB > SSCI;
SUBC>   NMean;
SUBC>   Sigma 9000 1;
SUBC>   Confidence 95.0;
SUBC>   IType 0;
SUBC>   MError 800.

Sample Size for Estimation

Method

Parameter            Mean
Distribution         Normal
Standard deviation   9000 (population value)
Confidence level     95%
Confidence interval  Two-sided

Results

  Margin  Sample
of Error    Size
     800     487

Extra Minitab listings for analysis without assuming standard deviation known:
------------------------------------------------------------------------------

MTB > SSCI;
SUBC>   NMean;
SUBC>   Sigma 9000;
SUBC>   Confidence 95.0;
SUBC>   IType 0;
SUBC>   MError 400.

Sample Size for Estimation

Method

Parameter            Mean
Distribution         Normal
Standard deviation   9000 (estimate)
Confidence level     95%
Confidence interval  Two-sided

Results

  Margin  Sample
of Error    Size
     400    1948

MTB > SSCI;
SUBC>   NMean;
SUBC>   Sigma 9000;
SUBC>   Confidence 95.0;
SUBC>   IType 0;
SUBC>   MError 800.

Sample Size for Estimation

Method

Parameter            Mean
Distribution         Normal
Standard deviation   9000 (estimate)
Confidence level     95%
Confidence interval  Two-sided

Results

  Margin  Sample
of Error    Size
     800     489

MTB > InvCDF .025;
SUBC>   T 486.

Inverse Cumulative Distribution Function

Student's t distribution with 486 DF

P( X <= x )         x
      0.025  -1.96486
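
Cross-check in Python (not part of the original solution)
----------------------------------------------------------
The sample size calculations above can also be reproduced outside Minitab.
The following is a minimal sketch, not the method Minitab uses internally.
All function names (z_star, t_star, n_known_sigma, n_estimated_sigma) are
our own. To stay within the Python standard library, tstar is taken from a
two-term series expansion around the normal quantile, which is an
assumption that should be adequate for the large df in this problem; a
statistics package would give exact t quantiles. For unknown sigma, n
appears on both sides of the inequality (through df=n-1), so we step n
upward until the inequality holds.

```python
import math
from statistics import NormalDist


def z_star(conf=0.95):
    """Two-sided standard normal critical value (about 1.96 for 95%)."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)


def t_star(df, conf=0.95):
    """Approximate two-sided t critical value for large df.

    Assumption: two-term expansion in 1/df around the normal quantile,
    used here only to avoid a dependency on a statistics package.
    """
    z = z_star(conf)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    return z + g1 / df + g2 / df**2


def n_known_sigma(sigma, m, crit):
    """Smallest integer n with n >= (crit*sigma/m)^2."""
    return math.ceil((crit * sigma / m) ** 2)


def n_estimated_sigma(sigma, m, conf=0.95):
    """Smallest n with n >= (tstar(n-1)*sigma/m)^2, found by stepping n up."""
    n = max(2, n_known_sigma(sigma, m, z_star(conf)))
    while (t_star(n - 1, conf) * sigma / m) ** 2 > n:
        n += 1
    return n


# Known sigma, with the rounded critical values used in the text.
print(n_known_sigma(9000, 400, 1.96))  # 6.38
print(n_known_sigma(9000, 800, 1.96))  # 6.39
print(n_known_sigma(9000, 400, 2))     # 6.38 with zstar ~= 2

# Sigma to be estimated from the data (t-based).
print(n_estimated_sigma(9000, 400))
print(n_estimated_sigma(9000, 800))
```

Under these assumptions the known-sigma calls give 1945, 487 and 2025, and
the t-based calls give 1948 and 489, in agreement with the Minitab listings
above.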