Supplementary Exercises 8.84 and 8.85 of IPS7e
----------------------------------------------

n=100 employees were asked whether work stress has a negative impact on
personal life. X=number of those employees answering yes; we observe X=68.
Assume X to follow B(100,p). (binomial setting)

We first compute the estimates:
sample proportion: p_hat = X/n = 0.68,
standard error of p_hat: sqrt(p_hat*(1-p_hat)/n) = 0.04665.

8.84:
-----
The condition for use of the classical (normal distribution approximation)
method for the confidence interval is clearly satisfied because the data
contain more than 15 positives and 15 negatives (68 and 32). For
illustration, the other methods for computing a confidence interval are
included as well, although in this situation the classical interval is
acceptable.

classical approximate 95% CI for p:
p_hat +- 1.96*0.04665 = 0.680 +- 0.0914 = (0.589,0.771)

plus four approximate method:
p_tilde = (X+2)/(n+4) = 70/104 = 0.673,
SE(p_tilde) = sqrt(p_tilde*(1-p_tilde)/(n+4)) = 0.045998
95% CI for p: p_tilde +- 1.96*0.045998 = 0.673 +- 0.0902 = (0.583,0.763)

"exact" 95% CI for p: (0.579,0.770), from Minitab

The CIs are similar but not exactly the same. The plus four CI is not
symmetrical around the estimate p_hat (it is symmetrical around p_tilde).
The "exact" interval is wider than the other two, by approximately 0.01,
reflecting that it is conservative.

Minitab commands for the confidence intervals:
---
POne 100 68;
  Confidence 95.0;
  Alternative 0;
  UseZ.

Sample   X    N  Sample p                95% CI
1       68  100  0.680000  (0.588572, 0.771428)

Using the normal approximation.

POne 104 70;
  Confidence 95.0;
  Alternative 0;
  UseZ.

Sample   X    N  Sample p                95% CI
1       70  104  0.673077  (0.582923, 0.763231)

Using the normal approximation.

MTB > POne 100 68;
SUBC> Confidence 95.0;
SUBC> Alternative 0.

Sample   X    N  Sample p                95% CI
1       68  100  0.680000  (0.579233, 0.769780)
---

8.85:
-----
The national survey had 75% of respondents answering yes.
Because that survey was large it may be acceptable to assume there is no
error associated with that estimate, and hence treat it as a fixed value.
This means we will be testing the hypotheses
H0: p=0.75 and Ha: p<>0.75
based on the single sample. If data from the large survey were available,
it would have been appropriate to use methods for two independent samples
(proportions).

The conditions for use of the classical z-test are met here because
100*0.75=75 >= 10, and 100*(1-0.75)=25 >= 10.
The calculation goes as follows:
z = (0.68-0.75) / sqrt(0.75*0.25/100) = -1.62
P = 2*P(Z>1.62) = 0.106.
We therefore conclude that there is not sufficient evidence to reject the
null hypothesis, and the proportion of stressed employees at the restaurant
could very well match the nationwide level.

For illustration purposes, we also compute P-values for an exact test based
on the binomial distribution. This method is generally preferable and truly
exact, but unless the tested proportion equals 0.5 there is no uniform rule
for how to compute two-sided P-values.

The simplest rule is to double the one-sided P-value:
P = 2*P(X<=68) = 2*0.0693 = 0.139.

Minitab computes the 2-sided P-value by adding probabilities for outcomes
equally far from the hypothesized value on both sides:
P = P(X<=68) + P(X>=82) = 0.0693+0.0630 = 0.132.

Stata and R compute the 2-sided P-value by adding probabilities not larger
than that of the observed count on both sides:
P = P(X<=68) + P(X>=83) = 0.0693+0.0376 = 0.107.

It is seen that among the exact P-values, the method used in Stata and R
(which is considered more accurate) agrees well with the z-test, due to the
fairly large sample size.

Minitab commands for the tests:
---
POne 100 68;
  Test .75;
  Confidence 95.0;
  Alternative 0;
  UseZ.

Test of p = 0.75 vs p not = 0.75

Sample   X    N  Sample p                95% CI  Z-Value  P-Value
1       68  100  0.680000  (0.588572, 0.771428)    -1.62    0.106

Using the normal approximation.

POne 100 68;
  Test .75;
  Confidence 95.0;
  Alternative 0.

Test of p = 0.75 vs p not = 0.75

                                                   Exact
Sample   X    N  Sample p                95% CI  P-Value
1       68  100  0.680000  (0.579233, 0.769780)    0.132
---
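
For readers without Minitab, the hand calculations in 8.84 and 8.85 can be
checked with a short script. The following is only a sketch in Python
(standard library only), not the Minitab procedure itself; the function
names are made up for this illustration. It uses 1.96 as the normal
critical value, as in the hand calculations, so the last digits can differ
slightly from the Minitab output. The "exact" confidence interval is not
reproduced here.

```python
from math import comb, erf, sqrt


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def binom_pmf(k, n, p):
    """P(X = k) for X following B(n, p)."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def wald_ci(x, n, z=1.96):
    """Classical interval: p_hat +- z * sqrt(p_hat*(1-p_hat)/n)."""
    p_hat = x / n
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def plus_four_ci(x, n, z=1.96):
    """Plus four interval: classical formula applied to X+2 and n+4."""
    return wald_ci(x + 2, n + 4, z)


def z_test_two_sided(x, n, p0):
    """z statistic and two-sided P-value, standard error computed under H0."""
    z = (x / n - p0) / sqrt(p0 * (1.0 - p0) / n)
    return z, 2.0 * (1.0 - normal_cdf(abs(z)))


def exact_two_sided(x, n, p0):
    """Two-sided exact binomial P-values under the three rules in 8.85.

    Returns (doubled, equal_distance, small_probabilities).
    """
    pmf = [binom_pmf(k, n, p0) for k in range(n + 1)]
    lower = sum(pmf[: x + 1])  # P(X <= x)
    upper = sum(pmf[x:])       # P(X >= x)
    # Rule 1: double the smaller one-sided P-value (capped at 1).
    doubled = min(1.0, 2.0 * min(lower, upper))
    # Rule 2: outcomes at least as far from n*p0 as the observed count.
    dist = abs(x - n * p0)
    equal_distance = sum(
        q for k, q in enumerate(pmf) if abs(k - n * p0) >= dist - 1e-9
    )
    # Rule 3: outcomes whose probability does not exceed that of the
    # observed count (small relative tolerance guards against rounding).
    small = sum(q for q in pmf if q <= pmf[x] * (1.0 + 1e-7))
    return doubled, equal_distance, small


if __name__ == "__main__":
    print("classical CI:", wald_ci(68, 100))
    print("plus four CI:", plus_four_ci(68, 100))
    print("z-test (z, P):", z_test_two_sided(68, 100, 0.75))
    print("exact P-values:", exact_two_sided(68, 100, 0.75))
```

The three values returned by exact_two_sided(68, 100, 0.75) correspond, in
order, to the doubling rule, the rule described above for Minitab, and the
rule described above for Stata and R.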