Solution file for Problems 3.2, 4.2 and 5.1 (GO)
------------------------------------------------

Data: measurements of longevity (in days) of male fruit flies subjected to
5 different reproduction conditions. A total of 125 male fruit flies were
randomly allocated to 5 groups that differed in their exposure to females.
Group 1 had no exposure to females; Groups 2 and 4 were exposed daily to
1 and 8 pregnant (therefore unreceptive) females, respectively; Groups 3
and 5 were exposed daily to 1 and 8 virgin (therefore receptive) females,
respectively.

The data constitute 5 independent samples with a continuous outcome
(longevity was recorded in whole days, but this discretisation should not
be a serious problem because the data span a wide range of days), and the
model immediately suggested is a one-way ANOVA. The experiment constitutes
a completely randomized design with 5 groups.

Problem 3.2:
------------

We compute per-group descriptive summaries and run the one-way ANOVA,
including checks of the assumptions of normality and equal standard
deviations in the groups.

MTB > WOpen "h:\VHM\VHM802\Data_csv\ch03pr2.csv";
SUBC>   FType;
SUBC>     CSV;
SUBC>   DecSep;
SUBC>     Period;
SUBC>   Field;
SUBC>     Comma;
SUBC>   TDelimiter;
SUBC>     DoubleQuote.
Retrieving worksheet from file: 'h:\VHM\VHM802\Data_csv\ch03pr2.csv'
Worksheet was saved on 11/02/2011

MTB > GSummary 'longev';
SUBC>   By 'group'.

Results for group = 1
Summary for longev (group = 1)

Results for group = 2
Summary for longev (group = 2)

Results for group = 3
Summary for longev (group = 3)

Results for group = 4
Summary for longev (group = 4)

Results for group = 5
Summary for longev (group = 5)

MTB > Oneway 'longev' 'group';
SUBC>   GBoxplot;
SUBC>   GFourpack.

One-way ANOVA: longev versus group

Source   DF     SS    MS      F      P
group     4  11939  2985  13.61  0.000
Error   120  26314   219
Total   124  38253

S = 14.81   R-Sq = 31.21%   R-Sq(adj) = 28.92%

                         Individual 95% CIs For Mean Based on
                         Pooled StDev
Level   N   Mean  StDev  -------+---------+---------+---------+--
1      25  63.36  14.54                          (-----*-----)
2      25  63.56  16.45                           (-----*----)
3      25  64.80  15.65                            (-----*-----)
4      25  56.76  14.93                    (-----*-----)
5      25  38.72  12.10  (-----*-----)
                         -------+---------+---------+---------+--
                               40        50        60        70

Pooled StDev = 14.81

Boxplot of longev

Residual Plots for longev

Comments:
---------

We first note that no problems with the model assumptions could be found.
The within-group distributions look fairly normal (and all normality tests
are non-significant). The residual plots look very nice for a dataset of
this size, and the standard deviations are quite close.

The ANOVA table shows a strongly significant difference between groups,
despite a fairly low R^2 value. The estimated means and the graphical
representation of the confidence intervals suggest that group 5 differs
significantly from all other groups, which seem pretty close to each other.
The interval for group 5 does not overlap any of the other intervals, which
shows that t-tests unadjusted for multiple comparisons would all be
significant when comparing group 5 to the other groups. The intervals of
groups 1-3 overlap almost totally (each estimate lies inside the other
intervals), which shows that there is no significant difference between
these groups.

Problem 4.2:
------------

A set of orthogonal contrasts can be set up in many ways; the following
seemed the most natural to me based on the description of the groups. For
comparison, the strongest contrast in the data, between group 5 and the
others, is also included although it is not part of the orthogonal set.

Contrast    Interpretation        Coefficients
--------------------------------------------------
company     contact to females     4  -1  -1  -1  -1
receptive   pregnant vs virgins    0   1  -1   1  -1
# pregnant  1 vs 8 pregnant        0   1   0  -1   0
# virgins   1 vs 8 virgins         0   0   1   0  -1
group 5     group 5 vs others     -1  -1  -1  -1   4

The first four contrasts are pairwise orthogonal, because for any pair the
sum of products of coefficients is zero. For example, for contrasts 1 and 2:
   4*0 + (-1)*1 + (-1)*(-1) + (-1)*1 + (-1)*(-1) = 0
or for contrasts 2 and 3:
   0*0 + 1*1 + (-1)*0 + 1*(-1) + (-1)*0 = 0.

Contrast    Estimate    SE       SS   SS(%)      t   P(t)  F(Schef)  P(Schef)
-------------------------------------------------------------------------------
company         29.6  13.2   1095.2     9.2   2.23  0.027     1.249     0.294
receptive       16.8  5.92   1764.0    14.8   2.84  0.005     2.011     0.097
# pregnant       6.8  4.19    578.0     4.8   1.62  0.107     0.659     0.621
# virgins       26.1  4.19   8502.1    71.2   6.23  0.000     9.693     0.000
group 5        -93.6  13.2  10951.2    91.7  -7.07  0.000     12.49     0.000
-------------------------------------------------------------------------------
formulae:  SS = (estimate^2)/[(w_1^2+...+w_5^2)/25]
           SS (%) = 100*SS/SSTrT   (SSTrT=11939.28)
           t = Est/SE, with |t| = sqrt(SS/MSE)   (MSE=219.28)
           P(t) ~ t(120)
           F(Scheffe) = SS/4/MSE (or t^2/4)
           P(Scheffe) ~ F(4,120)

If the contrasts were pre-planned, the P-values from the t-tests could be
used without adjustment (unless there was concern about carrying out 4
tests). This would make 3 out of the first 4 contrasts significant. However,
it is clear from the results that these contrasts do not represent the real
effect well. The last contrast, developed by inspecting the data (means),
reflects the pattern in the data and is strongly significant even with the
Scheffe test. This is probably the best contrast representation of the
difference between groups.

Problem 5.1:
------------

We rerun the model as a General Linear Model to get easy access to
Bonferroni-corrected multiple comparisons.
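
Each pairwise comparison in this problem is itself a contrast with weights
+1 and -1, so the formulae listed under Problem 4.2 also give the
differences, standard errors and t-values of the pairwise output. The
following is a minimal Python sketch of that arithmetic, not part of the
Minitab session; the group means, group size and MSE are copied from the
output, and the function name contrast_stats is only illustrative.

```python
import math

# Group means (groups 1-5), group size and MSE, copied from the Minitab output.
MEANS = [63.36, 63.56, 64.80, 56.76, 38.72]
N_PER_GROUP = 25
MSE = 219.28


def contrast_stats(weights):
    """Estimate, SE, SS and t for a contrast with the given weights.

    Assumes equal group sizes, as in this experiment.
    """
    estimate = sum(w * m for w, m in zip(weights, MEANS))
    scale = sum(w * w for w in weights) / N_PER_GROUP  # (w_1^2+...+w_5^2)/25
    se = math.sqrt(MSE * scale)
    ss = estimate ** 2 / scale
    return {"estimate": estimate, "se": se, "ss": ss, "t": estimate / se}


# Pairwise comparison "group 5 minus group 1": weights -1 and +1.
pair_5_vs_1 = contrast_stats([-1, 0, 0, 0, 1])

# The "company" contrast from Problem 4.2, for comparison.
company = contrast_stats([4, -1, -1, -1, -1])

print(pair_5_vs_1)
print(company)
```

The P-values are left out on purpose: they require the t(120) distribution,
which is not in the Python standard library.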

MTB > GLM 'longev' = group;
SUBC>   Brief 2 ;
SUBC>   Pairwise group;
SUBC>   Bonferroni;
SUBC>   NoCI.

General Linear Model: longev versus group

Factor  Type   Levels  Values
group   fixed       5  1, 2, 3, 4, 5

Analysis of Variance for longev, using Adjusted SS for Tests

Source   DF   Seq SS   Adj SS  Adj MS      F      P
group     4  11939.3  11939.3  2984.8  13.61  0.000
Error   120  26313.5  26313.5   219.3
Total   124  38252.8

S = 14.8081   R-Sq = 31.21%   R-Sq(adj) = 28.92%

Unusual Observations for longev

Obs   longev      Fit  SE Fit  Residual  St Resid
 43  96.0000  63.5600  2.9616   32.4400      2.24 R
 49  96.0000  63.5600  2.9616   32.4400      2.24 R
 66  97.0000  64.8000  2.9616   32.2000      2.22 R
 76  21.0000  56.7600  2.9616  -35.7600     -2.46 R

R denotes an observation with a large standardized residual.

Grouping Information Using Bonferroni Method and 95.0% Confidence

group   N  Mean  Grouping
3      25  64.8  A
2      25  63.6  A
1      25  63.4  A
4      25  56.8  A
5      25  38.7    B

Means that do not share a letter are significantly different.

Bonferroni Simultaneous Tests
Response Variable longev
All Pairwise Comparisons among Levels of group

group = 1  subtracted from:

       Difference       SE of           Adjusted
group    of Means  Difference  T-Value   P-Value
2            0.20       4.188    0.048    1.0000
3            1.44       4.188    0.344    1.0000
4           -6.60       4.188   -1.576    1.0000
5          -24.64       4.188   -5.883    0.0000

group = 2  subtracted from:

       Difference       SE of           Adjusted
group    of Means  Difference  T-Value   P-Value
3            1.24       4.188    0.296    1.0000
4           -6.80       4.188   -1.624    1.0000
5          -24.84       4.188   -5.931    0.0000

group = 3  subtracted from:

       Difference       SE of           Adjusted
group    of Means  Difference  T-Value   P-Value
4           -8.04       4.188   -1.920    0.5728
5          -26.08       4.188   -6.227    0.0000

group = 4  subtracted from:

       Difference       SE of           Adjusted
group    of Means  Difference  T-Value   P-Value
5          -18.04       4.188   -4.307    0.0003

Comments for Bonferroni method:
-------------------------------

The results from the multiple comparisons are very clear: group 5 differs
significantly from all other groups, the differences among which on the
other hand are nowhere near significant.
Thus the letter coding (corresponding to underlining) would be

   5b  4a  1a  2a  3a

Holm method:
------------

We start by rearranging the list above in increasing order of P-values
(or equivalently from more to less extreme t-values):

         Difference       SE of           Adjusted
group      of Means  Difference  T-Value   P-Value
5 - 3        -26.08       4.188   -6.227    0.0000
5 - 2        -24.84       4.188   -5.931    0.0000
5 - 1        -24.64       4.188   -5.883    0.0000
5 - 4        -18.04       4.188   -4.307    0.0003
4 - 3         -8.04       4.188   -1.920    0.5728
4 - 2         -6.80       4.188   -1.624    1.0000
4 - 1         -6.60       4.188   -1.576    1.0000
3 - 1          1.44       4.188    0.344    1.0000
3 - 2          1.24       4.188    0.296    1.0000
2 - 1          0.20       4.188    0.048    1.0000

With a total of 10 comparisons, the Bonferroni-adjusted P-values above are
the original P-values multiplied by 10 (and capped at 1). The Holm method
retrieves the original P-values and then multiplies them by 10, 9, 8, ..., 1
from the top to the bottom row. The first four rows will still have
significant P-values with the Holm adjustment, because the Holm-adjusted
P-values are smaller than those listed above (except for the first row,
where the P-value is the same), and those were already significant with the
Bonferroni method. For row five we get the Holm-adjusted P-value as
(0.5728/10)*6 = 0.34. This means that the comparison in row 5 is
non-significant, and hence all subsequent rows are non-significant as well
by the Holm method. The significant comparisons are therefore the same as
for the Bonferroni method.
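
The Holm steps described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not Minitab output:
the input values are the Bonferroni-adjusted P-values of the first five
sorted rows, entries printed as 0.0000 are replaced by the upper bound
0.00005 (the largest value that rounds to 0.0000), and rows 6-10 are
omitted because their adjusted P-values were capped at 1, so the original
P-values cannot be retrieved from the listing. The function name
holm_step_down is hypothetical.

```python
M = 10        # total number of pairwise comparisons among 5 groups
ALPHA = 0.05  # family-wise significance level

# (comparison, Bonferroni-adjusted P) for the five most extreme rows.
# 0.00005 is an upper bound standing in for values printed as 0.0000.
bonferroni = [
    ("5 - 3", 0.00005),
    ("5 - 2", 0.00005),
    ("5 - 1", 0.00005),
    ("5 - 4", 0.0003),
    ("4 - 3", 0.5728),
]


def holm_step_down(rows, m, alpha):
    """Return (label, holm_p, significant) for rows sorted by increasing P."""
    results = []
    running_max = 0.0
    rejecting = True
    for rank, (label, p_bonf) in enumerate(rows):
        raw_p = p_bonf / m                        # undo the Bonferroni factor
        holm_p = min(1.0, raw_p * (m - rank))     # multipliers 10, 9, 8, ...
        running_max = max(running_max, holm_p)    # keep adjusted P monotone
        if running_max > alpha:
            rejecting = False                     # all later rows fail too
        results.append((label, running_max, rejecting))
    return results


holm = holm_step_down(bonferroni, M, ALPHA)
for label, p, significant in holm:
    print(f"{label}: Holm P = {p:.5f}, significant = {significant}")
```

Because the procedure stops at the first non-significant row, the omitted
rows 6-10 do not need to be evaluated once row 5 fails.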