Solution file for additional exercise 7.4
-----------------------------------------

Data: survival times of goldfish exposed to different concentrations of cyanide and oxygen, at different temperatures.

Notation: y_i = survival time (averaged over 10 fish) for solution i, i=1,...,45, or y_ijk = survival time (averaged over 10 fish) for the solution with oxygen level j and cyanide level k, at temperature i, where i=5,15,25; j=1.5,3,9; k=0.16,0.8,4,20,100.

The design is a complete 3-factor factorial (3 x 3 x 5 = 45 combinations) without replication, because the replicate goldfish are summarized into a single averaged response. In the absence of replication, and with no factor that seems less likely than the others to be involved in interactions, the most natural model takes the 3-factor interaction as the error term. This gives the statistical model

  y_i = mu + alpha_temp(i) + beta_oxygen(i) + gamma_cyanide(i) + (alpha beta)_temp*oxygen(i) + (alpha gamma)_temp*cyanide(i) + (beta gamma)_oxygen*cyanide(i) + eps_i,

or, depending on the chosen notation,

  y_ijk = mu + alpha_i + beta_j + gamma_k + (alpha beta)_ij + (alpha gamma)_ik + (beta gamma)_jk + eps_ijk.

MTB > WOpen "h:\VHM\VHM802\Data_csv\hs07_4.csv";
SUBC> FType;
SUBC> CSV;
SUBC> DecSep;
SUBC> Period;
SUBC> Field;
SUBC> Comma;
SUBC> TDelimiter;
SUBC> DoubleQuote.
Retrieving worksheet from file: 'h:\VHM\VHM802\Data_csv\hs07_4.csv'
Worksheet was saved on 17/02/2011
MTB > Name c5 "SRES1" c6 "TRES1"
MTB > GLM 'surv' = temp| oxygen| cyanide - temp* oxygen* cyanide;
SUBC> Brief 2 ;
SUBC> Means temp oxygen cyanide;
SUBC> SResiduals 'SRES1';
SUBC> TResiduals 'TRES1';
SUBC> GFourpack;
SUBC> RType 2 .
General Linear Model: surv versus temp, oxygen, cyanide

Factor   Type   Levels  Values
temp     fixed       3  5, 15, 25
oxygen   fixed       3  1.5, 3.0, 9.0
cyanide  fixed       5  0.16, 0.80, 4.00, 20.00, 100.00

Analysis of Variance for surv, using Adjusted SS for Tests

Source          DF    Seq SS    Adj SS   Adj MS       F      P
temp             2   571.161   571.161  285.581  441.51  0.000
oxygen           2    37.588    37.588   18.794   29.06  0.000
cyanide          4   555.455   555.455  138.864  214.68  0.000
temp*oxygen      4     0.971     0.971    0.243    0.38  0.823
temp*cyanide     8    36.859    36.859    4.607    7.12  0.000
oxygen*cyanide   8    52.645    52.645    6.581   10.17  0.000
Error           16    10.349    10.349    0.647
Total           44  1265.028

S = 0.804259   R-Sq = 99.18%   R-Sq(adj) = 97.75%

Unusual Observations for surv

Obs     surv      Fit  SE Fit  Residual  St Resid
 21  15.8000  16.8600  0.6456   -1.0600    -2.21 R
 26  20.7000  19.5200  0.6456    1.1800     2.46 R
 36  12.9000  11.8933  0.6456    1.0067     2.10 R
 41  14.2000  15.2933  0.6456   -1.0933    -2.28 R

R denotes an observation with a large standardized residual.

Least Squares Means for surv

temp       Mean  SE Mean
5        15.407   0.2077
15       10.400   0.2077
25        6.713   0.2077
oxygen
1.5       9.647   0.2077
3.0      11.007   0.2077
9.0      11.867   0.2077
cyanide
0.16     17.300   0.2681
0.80     11.400   0.2681
4.00      9.411   0.2681
20.00     9.044   0.2681
100.00    7.044   0.2681

Residual Plots for surv

MTB > GLM 'surv' = temp| oxygen| cyanide - temp* oxygen* cyanide;
SUBC> SMeans C4000;
SUBC> Brief 0;
SUBC> Interact 'temp' 'oxygen' 'cyanide'.
MTB > GFInt 'temp' 'oxygen' 'cyanide';
SUBC> Responses 'surv';
SUBC> FMeans C4000;
SUBC> Full.

Interaction Plot (fitted means) for surv

MTB > Erase C4000.
MTB > NormTest 'SRES1'.

Probability Plot of SRES1

The P-value for the Anderson-Darling test of normality is 0.014

Comments:
---------
The residual plot does not look good: the residuals are clearly less spread out for low fitted values, giving a horn shape that opens to the right. Nor does the normality of the residuals look good, and the Anderson-Darling test of normality gives P = 0.014, below 0.05.
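In this unreplicated design, the residuals examined above are exactly the estimated 3-factor interaction effects, y_ijk - ybar_ij. - ybar_i.k - ybar_.jk + ybar_i.. + ybar_.j. + ybar_..k - ybar_..., and they carry (3-1)(3-1)(5-1) = 16 degrees of freedom, the Error DF in the table. The pure-Python sketch below assumes the 45 responses are arranged as a nested list y[temp][oxygen][cyanide]; the function names are hypothetical and not part of Minitab, and the illustration uses a synthetic array, not the goldfish data.

```python
from itertools import product


def three_way_residuals(y):
    """Residuals of the model with all main effects and two-factor interactions
    for a balanced a x b x c layout with one observation per cell.

    y is a nested list indexed y[i][j][k]; returns a dict {(i, j, k): residual}.
    """
    a, b, c = len(y), len(y[0]), len(y[0][0])
    cells = list(product(range(a), range(b), range(c)))

    # Accumulate totals for the two-factor margins, one-factor margins and grand total.
    ij, ik, jk, i_tot, j_tot, k_tot = {}, {}, {}, {}, {}, {}
    grand_total = 0.0
    for i, j, k in cells:
        v = y[i][j][k]
        grand_total += v
        for margin, key in ((ij, (i, j)), (ik, (i, k)), (jk, (j, k)),
                            (i_tot, i), (j_tot, j), (k_tot, k)):
            margin[key] = margin.get(key, 0.0) + v

    # Residual = y_ijk - ybar_ij. - ybar_i.k - ybar_.jk
    #                  + ybar_i.. + ybar_.j. + ybar_..k - ybar_...
    return {
        (i, j, k): (y[i][j][k]
                    - ij[(i, j)] / c - ik[(i, k)] / b - jk[(j, k)] / a
                    + i_tot[i] / (b * c) + j_tot[j] / (a * c) + k_tot[k] / (a * b)
                    - grand_total / (a * b * c))
        for i, j, k in cells
    }


def error_df(a, b, c):
    """Error degrees of freedom when the 3-factor interaction is the error term."""
    return (a - 1) * (b - 1) * (c - 1)


# Synthetic 3 x 3 x 5 array built only from main effects and two-factor
# interactions: the model fits it exactly, so every residual is (numerically) zero.
y_two_way = [[[i * j + 2 * j * k - i * k + i + 3 for k in range(5)]
              for j in range(3)] for i in range(3)]
print(max(abs(r) for r in three_way_residuals(y_two_way).values()))
print(error_df(3, 3, 5))  # 16
```

Because the 45 residuals have only 16 degrees of freedom (they sum to zero over each index), they are necessarily correlated.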
Although the observations are already transformed, a further transformation is suggested. Below we use a Box-Cox analysis to determine a suitable one. Note also that the ANOVA table shows two strongly significant interactions (temp*cyanide and oxygen*cyanide); the only non-significant term is temp*oxygen (P = 0.823). No single residual is alarmingly large.

A Box-Cox analysis to get guidance on a useful transformation:

MTB > GReg 'surv' = temp |oxygen| cyanide- temp* oxygen* cyanide;
SUBC> Categorical 'temp' 'oxygen' 'cyanide';
SUBC> Constant;
SUBC> Confidence 95.0;
SUBC> Tolerance 1.0E-12;
SUBC> Coding -1;
SUBC> Boxcox;
SUBC> TEquation;
SUBC> TCoef;
SUBC> TSummary;
SUBC> TANOVA;
SUBC> TDiag 0.

General Regression Analysis: surv versus temp, oxygen, cyanide

Box-Cox transformation of the response with estimated lambda = 0.762848

The 95% CI for lambda is (0.344348, 1.12135)

Rounded lambda = 1 used in the regression analysis

Comments:
---------
The surprising result is that the Box-Cox analysis does not strongly suggest a transformation. The optimal value of lambda is around 0.75 (more precisely 0.76), but the CIs are wide and include 1 in both Minitab and Stata (and the likelihood-ratio test of lambda = 1 in Stata gives P = 0.206). One possible explanation of the apparent discrepancy with our residual analysis is that the residuals here are quite strongly correlated, because they correspond to the 3-factor interaction effects.

There are several possible routes for continuing the analysis. One is to stick with the untransformed survival values. Another is to use the optimal value of lambda (0.75). The third, which we follow here, is to square-root transform the survival values. This is hardly worse than using the untransformed data: 0.5 is about as far from the optimal lambda as 1 is (0.26 versus 0.24 away from 0.76), and both values lie inside the 95% CI. However, as shown below, the residuals then look markedly better and the ANOVA table allows for further model simplification.
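A minimal sketch of the Box-Cox family in Python (the helper name `box_cox` is hypothetical; Minitab and Stata estimate lambda internally). It shows why analysing square roots is the lambda = 0.5 member of the family: (y^0.5 - 1)/0.5 = 2*sqrt(y) - 2 is an increasing linear function of sqrt(y), so the ANOVA F-tests are the same whichever of the two is analysed. It also checks the two candidate values against the 95% CI printed by Minitab.

```python
import math


def box_cox(y, lam):
    """Box-Cox power transform (y**lam - 1) / lam, with log(y) as the lam = 0 limit."""
    if y <= 0:
        raise ValueError("the Box-Cox transform requires positive responses")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam


# Estimate and 95% CI for lambda as printed by Minitab for 'surv'.
lam_hat = 0.762848
ci_low, ci_high = 0.344348, 1.12135

for lam in (0.5, 1.0):
    print(f"lambda = {lam}: inside 95% CI: {ci_low <= lam <= ci_high}, "
          f"distance from estimate: {abs(lam - lam_hat):.3f}")

# lambda = 0.5 is an affine function of sqrt(y); lambda = 1 is an affine function of y.
for y in (4.0, 9.0, 16.0):
    assert math.isclose(box_cox(y, 0.5), 2 * math.sqrt(y) - 2)
    assert math.isclose(box_cox(y, 1.0), y - 1)
```

Any affine change of the response rescales all sums of squares by the same factor, which is why the square-root analysis and the lambda = 0.5 analysis give identical F and P values.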
MTB > Name C7 'rootsurv'
MTB > Let 'rootsurv' = sqrt(surv)
MTB > Name c8 "SRES2" c9 "TRES2"
MTB > GLM 'rootsurv' = temp| oxygen| cyanide - temp* oxygen* cyanide;
SUBC> Brief 2 ;
SUBC> Means temp oxygen cyanide;
SUBC> SResiduals 'SRES2';
SUBC> TResiduals 'TRES2';
SUBC> GFourpack;
SUBC> RType 2 .

General Linear Model: rootsurv versus temp, oxygen, cyanide

Factor   Type   Levels  Values
temp     fixed       3  5, 15, 25
oxygen   fixed       3  1.5, 3.0, 9.0
cyanide  fixed       5  0.16, 0.80, 4.00, 20.00, 100.00

Analysis of Variance for rootsurv, using Adjusted SS for Tests

Source          DF    Seq SS    Adj SS   Adj MS       F      P
temp             2  13.54216  13.54216  6.77108  404.53  0.000
oxygen           2   0.77829   0.77829  0.38915   23.25  0.000
cyanide          4  11.31549  11.31549  2.82887  169.01  0.000
temp*oxygen      4   0.08553   0.08553  0.02138    1.28  0.320
temp*cyanide     8   0.17842   0.17842  0.02230    1.33  0.297
oxygen*cyanide   8   0.82769   0.82769  0.10346    6.18  0.001
Error           16   0.26781   0.26781  0.01674
Total           44  26.99539

S = 0.129377   R-Sq = 99.01%   R-Sq(adj) = 97.27%

Unusual Observations for rootsurv

Obs  rootsurv      Fit   SE Fit  Residual  St Resid
 26   4.54973  4.38724  0.10386   0.16248     2.11 R
 36   3.59166  3.42637  0.10386   0.16529     2.14 R
 37   2.32379  2.49678  0.10386  -0.17299    -2.24 R

R denotes an observation with a large standardized residual.
Least Squares Means for rootsurv

temp          Mean  SE Mean
5            3.880  0.03340
15           3.183  0.03340
25           2.537  0.03340
oxygen
1.5          3.027  0.03340
3.0          3.226  0.03340
9.0          3.346  0.03340
cyanide
0.16         4.096  0.04313
0.80         3.323  0.04313
4.00         3.012  0.04313
20.00        2.956  0.04313
100.00       2.614  0.04313
oxygen*cyanide
1.5  0.16    3.605  0.07470
1.5  0.80    3.203  0.07470
1.5  4.00    2.929  0.07470
1.5  20.00   2.949  0.07470
1.5  100.00  2.451  0.07470
3.0  0.16    4.175  0.07470
3.0  0.80    3.235  0.07470
3.0  4.00    3.040  0.07470
3.0  20.00   2.995  0.07470
3.0  100.00  2.686  0.07470
9.0  0.16    4.508  0.07470
9.0  0.80    3.531  0.07470
9.0  4.00    3.065  0.07470
9.0  20.00   2.924  0.07470
9.0  100.00  2.703  0.07470

Residual Plots for rootsurv

MTB > GLM 'rootsurv' = temp| oxygen| cyanide - temp* oxygen* cyanide;
SUBC> SMeans C4000;
SUBC> Brief 0;
SUBC> Interact 'temp' 'oxygen' 'cyanide'.
MTB > GFMain 'temp' 'oxygen' 'cyanide';
SUBC> Responses 'rootsurv';
SUBC> FMeans C4000.

Main Effects Plot (fitted means) for rootsurv

MTB > GFInt 'temp' 'oxygen' 'cyanide';
SUBC> Responses 'rootsurv';
SUBC> FMeans C4000;
SUBC> Full.

Interaction Plot (fitted means) for rootsurv

MTB > Erase C4000.
MTB > NormTest 'SRES2'.

Probability Plot of SRES2

The P-value of the Anderson-Darling test of normality is 0.477.

Comments:
---------
The residuals for the square-root transformed data look better: there is little or no indication of a horn shape, and the points in the normal probability plot lie much closer to a straight line (with an Anderson-Darling P-value of 0.477, well above 0.10). For this exercise we will therefore analyse the square-root transformed values. As the original values were already log-transformed (perhaps before averaging; this is not clear from the text), it might, however, be that another transformation should have been applied to those original values.

The ANOVA table for the square-root transformed values allows for more simplification: the interaction temp*cyanide is now clearly non-significant (P = 0.297), as is temp*oxygen (P = 0.320), and only the interaction oxygen*cyanide remains significant at the 5% level (P = 0.001).
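Dropping the two non-significant interactions amounts to pooling their sums of squares and degrees of freedom into the error term. The arithmetic below is a sketch using the SS and DF from the rootsurv ANOVA table with all two-factor interactions; it gives the Error line (SS 0.5318 on 28 DF, MS 0.0190) that Minitab reports for the reduced model, together with a joint F statistic for the two dropped terms against the original error mean square.

```python
# (SS, DF) rows copied from the rootsurv ANOVA table with all two-factor interactions.
error_ss, error_df = 0.26781, 16
dropped = {"temp*oxygen": (0.08553, 4), "temp*cyanide": (0.17842, 8)}

# Pool the dropped interactions into the error term.
dropped_ss = sum(ss for ss, _ in dropped.values())
dropped_df = sum(df for _, df in dropped.values())
pooled_ss = error_ss + dropped_ss
pooled_df = error_df + dropped_df
pooled_ms = pooled_ss / pooled_df

# Joint F statistic for both dropped terms, tested against the original error MS.
f_joint = (dropped_ss / dropped_df) / (error_ss / error_df)

print(f"pooled error: SS = {pooled_ss:.4f}, DF = {pooled_df}, MS = {pooled_ms:.4f}")
print(f"joint F for dropped interactions = {f_joint:.2f} on ({dropped_df}, {error_df}) df")
```

The joint F (about 1.31 on 12 and 16 df) is the df-weighted average of the individual F values 1.28 and 1.33, so it tells the same story as the separate tests.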
Therefore, in addition to improved compliance with the model assumptions, we also get a simpler description of the data. The main effects and interaction plots are visually very informative. The temperature plot looks almost perfectly linear, which suggests examining this further by fitting temp as a covariate (continuous predictor). Before doing that, we drop the non-significant interactions (to avoid nuisance effects from assuming linearity in these interactions).

MTB > GLM 'rootsurv' = temp oxygen| cyanide;
SUBC> Brief 2 .

General Linear Model: rootsurv versus temp, oxygen, cyanide

Factor   Type   Levels  Values
temp     fixed       3  5, 15, 25
oxygen   fixed       3  1.5, 3.0, 9.0
cyanide  fixed       5  0.16, 0.80, 4.00, 20.00, 100.00

Analysis of Variance for rootsurv, using Adjusted SS for Tests

Source          DF   Seq SS   Adj SS  Adj MS       F      P
temp             2  13.5422  13.5422  6.7711  356.53  0.000
oxygen           2   0.7783   0.7783  0.3891   20.49  0.000
cyanide          4  11.3155  11.3155  2.8289  148.96  0.000
oxygen*cyanide   8   0.8277   0.8277  0.1035    5.45  0.000
Error           28   0.5318   0.5318  0.0190
Total           44  26.9954

S = 0.137809   R-Sq = 98.03%   R-Sq(adj) = 96.90%

Unusual Observations for rootsurv

Obs  rootsurv      Fit   SE Fit  Residual  St Resid
 15   3.14643  3.38338  0.08470  -0.23696    -2.18 R
 37   2.32379  2.57165  0.08470  -0.24786    -2.28 R
 45   2.28035  2.03996  0.08470   0.24039     2.21 R

R denotes an observation with a large standardized residual.

MTB > GLM 'rootsurv' = temp oxygen| cyanide;
SUBC> Covariates 'temp';
SUBC> Brief 2 .
General Linear Model: rootsurv versus oxygen, cyanide

Factor   Type   Levels  Values
oxygen   fixed       3  1.5, 3.0, 9.0
cyanide  fixed       5  0.16, 0.80, 4.00, 20.00, 100.00

Analysis of Variance for rootsurv, using Adjusted SS for Tests

Source          DF   Seq SS   Adj SS   Adj MS       F      P
temp             1  13.5359  13.5359  13.5359  729.65  0.000
oxygen           2   0.7783   0.7783   0.3891   20.98  0.000
cyanide          4  11.3155  11.3155   2.8289  152.49  0.000
oxygen*cyanide   8   0.8277   0.8277   0.1035    5.58  0.000
Error           29   0.5380   0.5380   0.0186
Total           44  26.9954

S = 0.136203   R-Sq = 98.01%   R-Sq(adj) = 96.98%

Term           Coef   SE Coef       T      P
Constant    4.20758   0.04247   99.08  0.000
temp      -0.067171  0.002487  -27.01  0.000

Unusual Observations for rootsurv

Obs  rootsurv      Fit   SE Fit  Residual  St Resid
 15   3.14643  3.37507  0.08247  -0.22864    -2.11 R
 37   2.32379  2.56334  0.08247  -0.23955    -2.21 R
 45   2.28035  2.03164  0.08247   0.24871     2.29 R

R denotes an observation with a large standardized residual.

Means for Covariates

Covariate   Mean  StDev
temp       15.00  8.257

Least Squares Means for rootsurv

oxygen        Mean  SE Mean
1.5          3.027  0.03517
3.0          3.226  0.03517
9.0          3.346  0.03517
cyanide
0.16         4.096  0.04540
0.80         3.323  0.04540
4.00         3.012  0.04540
20.00        2.956  0.04540
100.00       2.614  0.04540
oxygen*cyanide
1.5  0.16    3.605  0.07864
1.5  0.80    3.203  0.07864
1.5  4.00    2.929  0.07864
1.5  20.00   2.949  0.07864
1.5  100.00  2.451  0.07864
3.0  0.16    4.175  0.07864
3.0  0.80    3.235  0.07864
3.0  4.00    3.040  0.07864
3.0  20.00   2.995  0.07864
3.0  100.00  2.686  0.07864
9.0  0.16    4.508  0.07864
9.0  0.80    3.531  0.07864
9.0  4.00    3.065  0.07864
9.0  20.00   2.924  0.07864
9.0  100.00  2.703  0.07864

MTB > GLM 'rootsurv' = temp oxygen| cyanide;
SUBC> Covariates 'temp';
SUBC> SMeans C4000;
SUBC> Brief 0;
SUBC> Interact 'oxygen' 'cyanide'.
MTB > GFInt 'oxygen' 'cyanide';
SUBC> Responses 'rootsurv';
SUBC> FMeans C4000;
SUBC> Full.

Interaction Plot (fitted means) for rootsurv

MTB > Erase C4000.

Comments:
---------
The F-test for linearity of the effect of temperature compares the error SS of this model with that of the previous model (temp as a factor): F = [(0.5380 - 0.5318)/(29 - 28)] / 0.0190 = 0.33, which is far from significant in F(1,28).
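The near-linearity of temperature can also be seen directly from the temperature LS means of rootsurv (3.880, 3.183, 2.537, each based on 15 observations). For three equally spaced levels, orthogonal linear and quadratic contrasts split the 2-df temp sum of squares: the quadratic contrast is the 1-df departure from linearity tested above, and the linear contrast gives the slope. A sketch; because the printed means are rounded to three decimals, the results agree with the tables (13.5359, 13.5422 - 13.5359 = 0.0063, and -0.067171) only approximately.

```python
# Temperature LS means of rootsurv (balanced design, so these equal the raw means).
temps = [5, 15, 25]
means = [3.880, 3.183, 2.537]
n_per_level = 15  # 3 oxygen levels x 5 cyanide levels at each temperature

# Orthogonal polynomial contrasts for three equally spaced levels.
linear = (-1, 0, 1)
quadratic = (1, -2, 1)


def contrast_ss(coefs, group_means, n):
    """Sum of squares of a contrast among balanced group means: n * L**2 / sum(c**2)."""
    value = sum(c * m for c, m in zip(coefs, group_means))
    return n * value ** 2 / sum(c * c for c in coefs)


ss_linear = contrast_ss(linear, means, n_per_level)        # 1 df: linear trend
ss_quadratic = contrast_ss(quadratic, means, n_per_level)  # 1 df: departure from linearity
# Least squares slope for equally spaced levels with equal group sizes.
slope = (means[2] - means[0]) / (temps[2] - temps[0])

print(f"linear SS = {ss_linear:.4f}, quadratic SS = {ss_quadratic:.4f}, slope = {slope:.5f}")
```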
The estimated regression coefficient shows that rootsurv decreases by 0.067 units per unit increase in temperature.

As the next step, we examine whether the effect of oxygen could be modelled as linear as well, as might seem plausible from the interaction plot. However, as the following analysis shows, the effect of oxygen does not seem linear when the actual concentrations (1.5, 3.0 and 9.0, which are not equally spaced) are taken into account.

MTB > GLM 'rootsurv' = temp oxygen| cyanide;
SUBC> Covariates 'temp' 'oxygen';
SUBC> Brief 2 .

General Linear Model: rootsurv versus cyanide

Factor   Type   Levels  Values
cyanide  fixed       5  0.16, 0.80, 4.00, 20.00, 100.00

Analysis of Variance for rootsurv, using Adjusted SS for Tests

Source          DF   Seq SS   Adj SS   Adj MS       F      P
temp             1  13.5359  13.5359  13.5359  517.50  0.000
oxygen           1   0.6153   0.6153   0.6153   23.52  0.000
cyanide          4  11.3155   2.0450   0.5113   19.55  0.000
cyanide*oxygen   4   0.6394   0.6394   0.1598    6.11  0.001
Error           34   0.8893   0.8893   0.0262
Total           44  26.9954

S = 0.161729   R-Sq = 96.71%   R-Sq(adj) = 95.74%

Term                Coef   SE Coef       T      P
Constant         4.04520   0.06053   66.83  0.000
temp           -0.067171  0.002953  -22.75  0.000
oxygen          0.036086  0.007440    4.85  0.000
oxygen*cyanide
  0.16           0.06573   0.01488    4.42  0.000
  0.80           0.00929   0.01488    0.62  0.537
  4.00          -0.02192   0.01488   -1.47  0.150
  20.00         -0.04187   0.01488   -2.81  0.008

Unusual Observations for rootsurv

Obs  rootsurv      Fit   SE Fit  Residual  St Resid
  6   4.95984  4.61517  0.06634   0.34467     2.34 R
 31   2.81069  3.11902  0.07918  -0.30833    -2.19 R
 36   3.59166  3.27174  0.06634   0.31992     2.17 R

R denotes an observation with a large standardized residual.

Comments:
---------
The F-test for linearity of the effect of oxygen is F = [(0.8893 - 0.5380)/(34 - 29)] / 0.0186 = 3.78. The numerator has 34 - 29 = 5 degrees of freedom (treating oxygen as linear removes 1 df from the oxygen main effect and 4 df from the oxygen*cyanide interaction), so the reference distribution is F(5,29), in which F = 3.78 corresponds to P < 0.01.
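Both linearity tests are extra-sum-of-squares F-tests between nested models, with numerator df equal to the difference in error df (29 - 28 = 1 for temp, 34 - 29 = 5 for oxygen). The sketch below reproduces the F statistics from the printed error SS and computes upper-tail P-values using a standard continued-fraction evaluation of the regularized incomplete beta function (the modified Lentz method, as in Numerical Recipes). All function names are hypothetical and not part of Minitab; the code uses only the Python standard library, and with SciPy available, `scipy.stats.f.sf` would give the same tail probabilities.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction where it converges quickly, else the symmetry relation.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f, df1, df2):
    """Upper-tail probability P(F > f) for the F(df1, df2) distribution."""
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def extra_ss_f(sse_reduced, df_reduced, sse_full, df_full):
    """Extra-sum-of-squares F statistic for nested models; returns (F, df1, df2)."""
    df1 = df_reduced - df_full
    f = ((sse_reduced - sse_full) / df1) / (sse_full / df_full)
    return f, df1, df_full


# Error SS and DF copied from the Minitab output (reduced model first).
tests = {
    "temp linear (vs temp as factor)": (0.5380, 29, 0.5318, 28),
    "oxygen linear (vs oxygen as factor)": (0.8893, 34, 0.5380, 29),
}
for label, args in tests.items():
    f, df1, df2 = extra_ss_f(*args)
    print(f"{label}: F = {f:.2f} on ({df1}, {df2}) df, P = {f_sf(f, df1, df2):.3f}")
```

Referring the oxygen statistic to F(1,29) instead would give a tail probability of about 0.06, which shows how much the conclusion depends on using the correct numerator df.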
There is evidence of non-linearity in the effect of oxygen, and we take this as a reason not to explore linear effects of oxygen further. Consequently, the main effects of oxygen and cyanide, as well as their interaction, are best represented by the interaction plot, where each estimate has a standard error of 0.079. These estimates can be back-transformed by squaring them. Overall, higher oxygen is associated with increased survival, while (not surprisingly) survival decreases with increasing levels of cyanide.

If cyanide is kept as categorical, the interaction cyanide*oxygen is best represented by the interaction plot, a table of means with standard errors, and suitable pairwise comparisons among these means. Minitab offers Bonferroni-adjusted comparisons among all 3*5 = 15 means; this gives a very large listing and a very strong correction, because there are 15*14/2 = 105 comparisons. It is not clear that all these comparisons are of (equal) interest. Sometimes one would want to limit the comparisons within an interaction to those with one factor kept constant. In this case, this corresponds to:
- oxygen constant: 3*(5*4/2) = 30 comparisons
- cyanide constant: 5*(3*2/2) = 15 comparisons
These 45 comparisons can be picked out of the (full) listing, but the Bonferroni adjustment must then be redone with 45 instead of 105 comparisons. That is, the P-values based on the full adjustment must be multiplied by 45/105. This is exact only for adjusted P-values below 1: for a comparison whose adjusted P-value is shown as 1, the re-adjusted value is only known to be at least 45/105 (about 0.43).
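The comparison counts and the rescaling of Bonferroni-adjusted P-values can be sketched as follows (the helper `rescale_bonferroni` is hypothetical, not a Minitab function). It assumes the Bonferroni-adjusted P-value is min(1, m*p) for m comparisons, so rescaling from m = 105 to m = 45 is exact when the adjusted value is below 1 and only a lower bound when it has been truncated at 1.

```python
from math import comb

n_oxygen, n_cyanide = 3, 5

# All pairwise comparisons among the 3*5 = 15 oxygen*cyanide cell means.
all_pairs = comb(n_oxygen * n_cyanide, 2)
# Comparisons with one factor held constant.
oxygen_constant = n_oxygen * comb(n_cyanide, 2)    # cyanide levels compared within each oxygen level
cyanide_constant = n_cyanide * comb(n_oxygen, 2)   # oxygen levels compared within each cyanide level
restricted = oxygen_constant + cyanide_constant


def rescale_bonferroni(p_adj, m_old, m_new):
    """Convert a Bonferroni-adjusted P-value, min(1, m_old * p), to one adjusted for m_new tests.

    Exact when p_adj < 1; when p_adj == 1 the result is only a lower bound.
    """
    return min(1.0, p_adj * m_new / m_old)


print(all_pairs, oxygen_constant, cyanide_constant, restricted)  # 105 30 15 45
print(rescale_bonferroni(0.07, all_pairs, restricted))
```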