Solution file for additional exercise 10.4
------------------------------------------

(see Exercise 10.1 for discussion of model, design and notation)

MTB > WOpen "H:\VHM\VHM802\Data_csv\hs10_4.csv";
SUBC>   FType;
SUBC>     CSV;
SUBC>   DecSep;
SUBC>     Period;
SUBC>   Field;
SUBC>     Comma;
SUBC>   TDelimiter;
SUBC>     DoubleQuote.
Retrieving worksheet from file: ‘H:\VHM\VHM802\Data_csv\hs10_4.csv’
Worksheet was saved on 27/02/2012

MTB > GLM;
SUBC>   Response 'pH';
SUBC>   Nodefault;
SUBC>   Categorical 'strain' 'litter';
SUBC>   Nest litter(strain);
SUBC>   Random litter;
SUBC>   Terms strain litter;
SUBC>   Means strain;
SUBC>   TExpand;
SUBC>   TMethod;
SUBC>   TAnova;
SUBC>   TSummary;
SUBC>   TCoefficients;
SUBC>   TEquation;
SUBC>   TFactor;
SUBC>   TEMS;
SUBC>   TVariance;
SUBC>   TMeans;
SUBC>   TDiagnostics 0;
SUBC>   Rtype 2;
SUBC>   GFOURPACK.

General Linear Model: pH versus strain, litter

Method

Factor coding  (-1, 0, +1)

Factor Information

Factor          Type    Levels  Values
strain          Fixed        2  pHH, pHL
litter(strain)  Random      14  1(pHH), 2(pHH), 3(pHH), 4(pHH), 5(pHH), 6(pHH), 7(pHH),
                                1(pHL), 2(pHL), 3(pHL), 4(pHL), 5(pHL), 6(pHL), 7(pHL)

Analysis of Variance

Source          DF    Seq SS  Contribution    Adj SS    Adj MS  F-Value  P-Value
strain           1  0.006645         3.93%  0.006645  0.006645     1.26    0.283
litter(strain)  12  0.063093        37.31%  0.063093  0.005258     2.22    0.028
Error           42  0.099375        58.76%  0.099375  0.002366
Total           55  0.169113       100.00%

Model Summary

        S    R-sq  R-sq(adj)     PRESS  R-sq(pred)
0.0486423  41.24%     23.05%  0.176667       0.00%

Coefficients

Term               Coef  SE Coef  95% CI                T-Value  P-Value   VIF
Constant        7.46625  0.00650  ( 7.45313, 7.47937)   1148.64    0.000
strain
  pHH           0.01089  0.00650  (-0.00222, 0.02401)      1.68    0.101  1.00
litter(strain)
  1(pHH)        -0.0296   0.0225  ( -0.0751,  0.0158)     -1.32    0.195     *
  2(pHH)        -0.0021   0.0225  ( -0.0476,  0.0433)     -0.10    0.925     *
  3(pHH)         0.0554   0.0225  (  0.0099,  0.1008)      2.46    0.018     *
  4(pHH)        -0.0346   0.0225  ( -0.0801,  0.0108)     -1.54    0.131     *
  5(pHH)        -0.0221   0.0225  ( -0.0676,  0.0233)     -0.98    0.331     *
  6(pHH)         0.0029   0.0225  ( -0.0426,  0.0483)      0.13    0.900     *
  1(pHL)        -0.0279   0.0225  ( -0.0733,  0.0176)     -1.24    0.223     *
  2(pHL)        -0.0579   0.0225  ( -0.1033, -0.0124)     -2.57    0.014     *
  3(pHL)        -0.0179   0.0225  ( -0.0633,  0.0276)     -0.79    0.432     *
  4(pHL)         0.0271   0.0225  ( -0.0183,  0.0726)      1.21    0.235     *
  5(pHL)         0.0346   0.0225  ( -0.0108,  0.0801)      1.54    0.131     *
  6(pHL)        -0.0104   0.0225  ( -0.0558,  0.0351)     -0.46    0.648     *

Regression Equation

pH = 7.46625 + 0.01089 strain_pHH - 0.01089 strain_pHL
     - 0.0296 litter(strain)_1(pHH) - 0.0021 litter(strain)_2(pHH)
     + 0.0554 litter(strain)_3(pHH) - 0.0346 litter(strain)_4(pHH)
     - 0.0221 litter(strain)_5(pHH) + 0.0029 litter(strain)_6(pHH)
     + 0.0304 litter(strain)_7(pHH) - 0.0279 litter(strain)_1(pHL)
     - 0.0579 litter(strain)_2(pHL) - 0.0179 litter(strain)_3(pHL)
     + 0.0271 litter(strain)_4(pHL) + 0.0346 litter(strain)_5(pHL)
     - 0.0104 litter(strain)_6(pHL) + 0.0521 litter(strain)_7(pHL)

Equation treats random terms as though they are fixed.

Fits and Diagnostics for Unusual Observations

Obs      pH     Fit  SE Fit  95% CI              Resid  Std Resid  Del Resid    HI  Cook’s D
  2  7.3900  7.4750  0.0243  (7.4259, 7.5241)  -0.0850      -2.02      -2.10  0.25      0.10
 31  7.6300  7.5325  0.0243  (7.4834, 7.5816)   0.0975       2.31       2.45  0.25      0.13
 46  7.5500  7.4425  0.0243  (7.3934, 7.4916)   0.1075       2.55       2.74  0.25      0.16

Obs     DFITS
  2  -1.21122  R
 31   1.41350  R
 46   1.58364  R

R  Large residual

Expected Mean Squares, using Adjusted SS

                   Expected Mean Square
   Source          for Each Term
1  strain          (3) + 4.0000 (2) + Q[1]
2  litter(strain)  (3) + 4.0000 (2)
3  Error           (3)

Means

Term     Fitted Mean
strain
  pHH        7.47714
  pHL        7.45536

Variance Components, using Adjusted SS

Source           Variance  % of Total      StDev  % of Total
litter(strain)  0.0007229      23.40%  0.0268871      48.38%
Error           0.0023661      76.60%  0.0486423      87.52%
Total           0.0030890              0.0555787

Residual Plots for pH

Comments and answers to questions:
----------------------------------

The residuals of the error terms look okay, and the most extreme residual has a corresponding deletion residual of 2.74, which is no cause for concern in a dataset of this size. We discuss the residuals of litter random effects below.
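As a check on the Variance Components table above, the estimates can be recovered by hand from the mean squares together with the Expected Mean Squares table. The short Python sketch below is not part of the Minitab session; it uses only the mean squares printed in the ANOVA table, and the variable names are chosen here purely for illustration.

```python
import math

# Mean squares copied from the Analysis of Variance table above.
ms_litter = 0.005258    # Adj MS for litter(strain)
ms_error = 0.002366     # Adj MS for Error
mice_per_litter = 4     # coefficient of (2) in the Expected Mean Squares table

# Expected mean squares:
#   E[MS litter] = sigma^2 + 4 * sigma_B^2
#   E[MS error]  = sigma^2
# Solving these two equations gives the moment estimates.
var_error = ms_error
var_litter = (ms_litter - ms_error) / mice_per_litter
var_total = var_error + var_litter

# Share of the total variance attributable to litters, in percent.
pct_litter = 100 * var_litter / var_total

print(f"sigma^2   (error)   = {var_error:.7f}")
print(f"sigma^2_B (litters) = {var_litter:.7f}")
print(f"total variance      = {var_total:.7f}")
print(f"litter share        = {pct_litter:.1f}%")
print(f"total StDev         = {math.sqrt(var_total):.5f}")
```

Because the inputs are the rounded mean squares from the printout, the results agree with Minitab's table only up to rounding in the last digits.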
The mean strain levels are given above as least squares means. Their standard errors can be computed by the usual formula, except that MS(Litt) is used instead of MSE (each strain mean is based on 7 litters of 4 mice, i.e. 28 observations):

   SE = sqrt(MS(Litt)/28) = sqrt(0.005258/28) = 0.014

The estimated variance components are listed above as well:

   sigma^2   (error):   0.00237
   sigma^2_B (litters): 0.00072

There appears to be much more variation among mice (within litters) than between litters.

The ANOVA table shows weak effects of both litters and strains. In particular, the F-statistic for testing no strain effect is only 1.26, with a P-value of 0.28. Therefore, the breeding does not really seem to have been successful: the pHH strain does have a higher mean blood pH than the pHL strain, but the difference could very well be caused by random fluctuations. Note that we have no interest in dropping the litter effects from the model, regardless of their significance, because the litters are a substantial part of the study design and data structure.

Next we give commands to compute residuals at the litter level, or estimated random litter effects. Essentially, we aggregate the data within litters, and analyze the litter means.

MTB > Name c5 "ByVar1" c6 "ByVar2" c7 "Mean1"
MTB > Statistics 'pH';
SUBC>   By 'strain' 'litter';
SUBC>   GValues 'ByVar1'-'ByVar2';
SUBC>   Mean 'Mean1'.

MTB > GLM;
SUBC>   Response 'Mean1';
SUBC>   Nodefault;
SUBC>   Categorical 'ByVar1';
SUBC>   Terms ByVar1;
SUBC>   Means ByVar1;
SUBC>   TExpand;
SUBC>   TMethod;
SUBC>   TAnova;
SUBC>   TSummary;
SUBC>   TCoefficients;
SUBC>   TEquation;
SUBC>   TFactor;
SUBC>   TMeans;
SUBC>   TDiagnostics 0;
SUBC>   Rtype 2;
SUBC>   GFOURPACK.
General Linear Model: Mean1 versus ByVar1

Factor Information

Factor  Type   Levels  Values
ByVar1  Fixed       2  pHH, pHL

Analysis of Variance

Source  DF    Seq SS  Contribution    Adj SS    Adj MS  F-Value  P-Value
ByVar1   1  0.001661         9.53%  0.001661  0.001661     1.26    0.283
Error   12  0.015773        90.47%  0.015773  0.001314
Total   13  0.017434       100.00%

Model Summary

        S   R-sq  R-sq(adj)      PRESS  R-sq(pred)
0.0362551  9.53%      1.99%  0.0214691       0.00%

Coefficients

Term         Coef  SE Coef  95% CI                T-Value  P-Value   VIF
Constant  7.46625  0.00969  ( 7.44514, 7.48736)    770.54    0.000
ByVar1
  pHH     0.01089  0.00969  (-0.01022, 0.03200)      1.12    0.283  1.00

Regression Equation

Mean1 = 7.46625 + 0.01089 ByVar1_pHH - 0.01089 ByVar1_pHL

Means

          Fitted
Term        Mean  SE Mean
ByVar1
  pHH     7.4771   0.0137
  pHL     7.4554   0.0137

Residual Plots for Mean1

Comments:
---------

The ANOVA table gives the same F-test for strains as the full analysis above (and this should be so). Also, the standard errors of the strain least squares means are exactly the ones calculated above. The residuals in this analysis show no particular problems - neither different variation for the two strains, nor departures from normality.
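
The agreement between the two analyses can be checked by hand. Averaging 4 mice per litter divides both the strain and the litter mean squares by 4, so their ratio (the F-statistic) is unchanged, and the standard error of a strain mean comes out the same either way. The Python sketch below is only an illustration added to this solution, not Minitab output; it uses the rounded mean squares from the two ANOVA tables, and its variable names are hypothetical.

```python
import math

# Full (mouse-level) analysis: Adj MS values from the first ANOVA table.
ms_strain_full = 0.006645
ms_litter_full = 0.005258

# Litter-means analysis: Adj MS values from the second ANOVA table.
ms_strain_means = 0.001661
ms_error_means = 0.001314

mice_per_litter = 4
litters_per_strain = 7

# Averaging over 4 mice scales each mean square by 1/4.
ratio_strain = ms_strain_full / ms_strain_means
ratio_litter = ms_litter_full / ms_error_means

# F-test for strain: the denominator is the litter mean square in both cases.
f_full = ms_strain_full / ms_litter_full
f_means = ms_strain_means / ms_error_means

# Standard error of a strain mean, computed from each analysis.
se_full = math.sqrt(ms_litter_full / (mice_per_litter * litters_per_strain))
se_means = math.sqrt(ms_error_means / litters_per_strain)

print(f"mean square ratios: {ratio_strain:.3f}, {ratio_litter:.3f}")
print(f"F (full) = {f_full:.2f}, F (litter means) = {f_means:.2f}")
print(f"SE (full) = {se_full:.4f}, SE (litter means) = {se_means:.4f}")
```

Small discrepancies in the later decimals are due to rounding in the printed mean squares. The equivalence relies on the design being balanced (4 mice in every litter, 7 litters per strain).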