Supplementary Exercises 7.91 and 7.93 of IPS7e
----------------------------------------------

Data: Voice onset time (VOT) for 6-year-old children and adults when
pronouncing the word "bees". The data include 10 children and 20 adults.

Model: the 2 samples are independent and each a simple random sample
(i.i.d. sample) from a distribution with unknown mean and standard
deviation (mu1 and sigma1 for the children, mu2 and sigma2 for the adults).

Estimates:
                     children    adults
  n                     10         20
  sample mean          -3.67     -23.17
  sample s             33.89      50.74
  standard error       10.72      11.35    (computed as s/sqrt(n))

7.91:
-----

(a) The SEs for children and adults are given above. All calculations in
this exercise, including the standard error for the difference between the
VOT means for children and adults, are done without assuming equal
variances; for a discussion of this choice see 7.93.

  SE(mean diff.) = sqrt(s1^2/n1 + s2^2/n2)
                 = sqrt(10.72^2 + 11.35^2)
                 = sqrt(243.58)
                 = 15.61

(The value 243.58 is obtained from the unrounded SEs.)

(b) The null hypothesis is H0: mu1=mu2. There is nothing in the statement
of the context of the data to suggest a one-sided alternative, so we use
Ha: mu1<>mu2. The calculation of the test statistic is done using Minitab.

MTB > TwoT 10 -3.67 33.89 20 -23.17 50.74;
SUBC>   Confidence 95.0;
SUBC>   Test 0.0;
SUBC>   Alternative 0.

Two-Sample T-Test and CI

Sample   N   Mean  StDev  SE Mean
1       10   -3.7   33.9       11
2       20  -23.2   50.7       11

Difference = mu (1) - mu (2)
Estimate for difference:  19.5
95% CI for difference:  (-12.6, 51.6)
T-Test of difference = 0 (vs not =): T-Value = 1.25  P-Value = 0.223  DF = 25

Comments:
---------
The P-value from a t-distribution with 25 df is 0.223, which is clearly
not significant. There is no evidence to indicate that the VOT means for
children and adults differ.

(c) The Minitab listing also includes a 95% confidence interval for the
mean difference:

  mu1-mu2: (-12.6, 51.6)

As stated in the question, we would know from the non-significant P-value
from (b) that the CI contains 0.
This is because the two-sided test at the 5% level rejects H0 exactly when
0 lies outside the 95% confidence interval, and we know the test is
non-significant.

7.93:
-----

Because the pooled variance t-test (based on assuming equal variances) is
not part of the VHM 801 course syllabus, we confine ourselves to showing
the Minitab listing and giving interpretations.

MTB > TwoT 10 -3.67 33.89 20 -23.17 50.74;
SUBC>   Confidence 95.0;
SUBC>   Test 0.0;
SUBC>   Alternative 0;
SUBC>   Pooled.

Two-Sample T-Test and CI

Sample   N   Mean  StDev  SE Mean
1       10   -3.7   33.9       11
2       20  -23.2   50.7       11

Difference = mu (1) - mu (2)
Estimate for difference:  19.5
95% CI for difference:  (-17.0, 56.0)
T-Test of difference = 0 (vs not =): T-Value = 1.09  P-Value = 0.283  DF = 28
Both use Pooled StDev = 46.0020

Comments:
---------
The listing gives the pooled standard deviation as s=46.002, which lies
between the two sample standard deviations and is clearly closest to the
standard deviation of the larger sample (the larger sample carries the
higher weight). From this value we can (re)compute the SE for the mean
difference:

  SE(mean diff.) = s*sqrt(1/n1 + 1/n2)
                 = 46.002*sqrt(1/10 + 1/20)
                 = 17.82

This value is somewhat larger than the 15.61 we computed in 7.91. The
reason is that the larger sample standard deviation came from the larger
sample: without pooling, that variance is divided by the larger n and
therefore has less impact on the SE (the effect would be the opposite if
the larger s came from the smaller sample). Because of the larger SE, the
test statistic drops to 1.09 (P-value 0.283), but this does not change the
conclusion.

The pooled df is 28 (=10+20-2) and thus a bit larger than the approximate
df of 25 used in Exercise 7.91, but the biggest difference between the two
procedures is the estimated standard error. With sample standard
deviations as different as 33.89 and 50.74, it seems intuitively clear
that the best estimate for the SE comes from the unpooled procedure.
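
Check of the calculations:
--------------------------
As a cross-check outside Minitab, the two procedures can be recomputed from the summary statistics alone. The following is a minimal Python sketch, not part of the original solution. The function names (two_sample_t, t_two_sided_p) are made up for this illustration, the P-value is obtained by numerically integrating the t density with Simpson's rule, and the unpooled df is truncated to an integer on the assumption that this is how the DF = 25 in the listing arises.

```python
import math


def t_two_sided_p(t, df, steps=2000):
    """Two-sided P-value for a t statistic, using Simpson's rule on the t density."""
    # Normalising constant of the t density with df degrees of freedom
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3  # P(0 < T < |t|)
    return 1 - 2 * central


def two_sample_t(n1, m1, s1, n2, m2, s2, pooled=False):
    """Two-sample t-test from summary statistics (hypothetical helper)."""
    diff = m1 - m2
    if pooled:
        # Equal-variance procedure: pooled s and df = n1 + n2 - 2
        df = n1 + n2 - 2
        sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
        se = sp * math.sqrt(1 / n1 + 1 / n2)
    else:
        # Unequal-variance procedure with the Satterthwaite approximation,
        # truncated to an integer (assumed to match the listing's DF = 25)
        v1, v2 = s1**2 / n1, s2**2 / n2
        se = math.sqrt(v1 + v2)
        df = math.floor((v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1)))
    t = diff / se
    return {"se": se, "t": t, "df": df, "p": t_two_sided_p(t, df)}


welch = two_sample_t(10, -3.67, 33.89, 20, -23.17, 50.74)
pooled = two_sample_t(10, -3.67, 33.89, 20, -23.17, 50.74, pooled=True)
for name, r in (("unpooled", welch), ("pooled", pooled)):
    print(f"{name:9s} SE={r['se']:.2f}  t={r['t']:.2f}  df={r['df']}  P={r['p']:.3f}")
```

If the sketch is right, the printed SE, t, df and P values should agree with the two Minitab listings to the precision shown there. Using only the standard library keeps the check self-contained; a statistics package would normally be preferred for the t distribution.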