Solution file for GO Problem 16.3 --------------------------------- Data: grain yields on 24 plots laid out in a design involving 3 fields in each of two replicates (for a total of 6 fields) and within each field two east-west strips (for a total of 12 strips), and finally within each strip two plots. Randomization occurred at 3 levels: * pesticides (3) onto fields, within each replicate, * irrigation (2) onto strips, within each field, * variety (2) on to plots, within each strip. The design therefore is a split-split-plot design, with the following roles of units and experimental factors: * replicates = blocks, * fields = whole plots, * pesticide = whole-plot factor, * east-west strips = split plots, * irrigation = split-plot factor, * plots = split-split plots, * variety = split-split-plot factor. We could also say that the data structure is hierarchical with 3 levels: plots nested in (below) strips nested in (below) fields. Notation: y_i = yield for plot i, i=1,...,24, or y_ijkl = yield obtained for variety k, irrigation j and pesticide i in replicate (block) l, i=1,2,3; j=1,0; k=1,2, l=1,2. The statistical model should include all 3*2*2=12 treatment combinations split into main effects of the 3 factors, the 3 first-order interactions and the second interaction, a fixed (possibly random as this has little impact on the modelling) effect of blocks, and 3 random terms corresponding to the 3 levels of variation: fields, strips and plots (~ error terms): y_i = mu + alpha_pestic(i) + beta_irrig(i) + gamma_variety(i) + alpha_beta_pestic*irrig(i) + alpha_gamma_pestic*variety(i) + beta_gamma_irrig*variety(i) + alpha_beta_gamma_pestic*irrig*var(i) + delta_repl(i) + u_field(i) + v_strip(i) + eps_i, or y_ijkl = mu + alpha_i + beta_j + gamma_k + alpha_beta_ij + alpha_gamma_ik + beta_gamma_jk + alpha_beta_gamma_ijk + delta_l + u_il + v_ijl + eps_ijkl, depending on the chosen notation. The field random effects (u_il), strip random effects (v_ijl) and errors (eps_ijkl) errors are assumed independent and distributed as N(0,sigma_f^2), N(sigma_s^2) and N(0,sigma^2) distributions, respectively. Minitab does not allow analysis of this model exactly. The closest we can get is to include random effects for repl*irrig, say a w_repl*irrig(i) or w_jl term. This term will be split from the term for strips (pestic*irrig*repl) in the ANOVA table, and the inference will therefore be different for the irrig and pestic*irrig effects (that are both measured against the strip term. MTB > WOpen "H:\VHM\VHM802\Data_csv\ch16pr3.csv"; SUBC> FType; SUBC> CSV; SUBC> DecSep; SUBC> Period; SUBC> Field; SUBC> Comma; SUBC> TDelimiter; SUBC> DoubleQuote. Retrieving worksheet from file: ‘H:\VHM\VHM802\Data_csv\ch16pr3.csv’ Worksheet was saved on 21/03/2014 MTB > GLM; SUBC> Response 'yield'; SUBC> Nodefault; SUBC> Categorical 'pestic' 'irrig' 'variety' 'repl'; SUBC> Random repl; SUBC> Terms pestic irrig variety repl pestic*irrig pestic*variety pestic*repl & CONT> irrig*variety irrig*repl pestic*irrig*variety pestic*irrig*repl; SUBC> Means pestic irrig variety repl; SUBC> TExpand; SUBC> TMethod; SUBC> TAnova; SUBC> TSummary; SUBC> TCoefficients; SUBC> TEquation; SUBC> TFactor; SUBC> TEMS; SUBC> TVariance; SUBC> TMeans; SUBC> TDiagnostics 0; SUBC> Rtype 2; SUBC> GFOURPACK. General Linear Model: yield versus pestic, irrig, variety, repl Method Factor coding (-1, 0, +1) Factor Information Factor Type Levels Values pestic Fixed 3 1, 2, 3 irrig Fixed 2 0, 1 variety Fixed 2 1, 2 repl Random 2 1, 2 Analysis of Variance Source DF Seq SS Contribution Adj SS Adj MS F-Value P-Value pestic 2 256.276 43.56% 256.276 128.138 3.44 0.225 irrig 1 133.954 22.77% 133.954 133.954 10.26 0.193 variety 1 44.010 7.48% 44.010 44.010 29.06 0.002 repl 1 23.404 3.98% 23.404 23.404 0.50 0.539 x pestic*irrig 2 17.748 3.02% 17.748 8.874 2.57 0.280 pestic*variety 2 1.386 0.24% 1.386 0.693 0.46 0.653 pestic*repl 2 74.403 12.65% 74.403 37.201 10.76 0.085 irrig*variety 1 2.734 0.46% 2.734 2.734 1.80 0.228 irrig*repl 1 13.054 2.22% 13.054 13.054 3.77 0.192 pestic*irrig*variety 2 5.358 0.91% 5.358 2.679 1.77 0.249 pestic*irrig*repl 2 6.917 1.18% 6.917 3.459 2.28 0.183 Error 6 9.088 1.54% 9.088 1.515 Total 23 588.330 100.00% x Not an exact F-test. Model Summary S R-sq R-sq(adj) PRESS R-sq(pred) 1.23068 98.46% 94.08% 145.4 75.29% Coefficients Term Coef SE Coef 95% CI T-Value P-Value VIF Constant 57.496 0.251 (56.881, 58.111) 228.87 0.000 pestic 1 -4.621 0.355 (-5.490, -3.752) -13.01 0.000 1.33 2 2.254 0.355 ( 1.385, 3.123) 6.34 0.001 1.33 irrig 0 2.363 0.251 ( 1.748, 2.977) 9.40 0.000 1.00 variety 1 -1.354 0.251 (-1.969, -0.739) -5.39 0.002 1.00 repl 1 0.987 0.251 ( 0.373, 1.602) 3.93 0.008 * pestic*irrig 1 0 -0.688 0.355 (-1.557, 0.182) -1.94 0.101 1.33 2 0 1.213 0.355 ( 0.343, 2.082) 3.41 0.014 1.33 pestic*variety 1 1 0.304 0.355 (-0.565, 1.173) 0.86 0.425 1.33 2 1 -0.021 0.355 (-0.890, 0.848) -0.06 0.955 1.33 pestic*repl 1 1 2.363 0.355 ( 1.493, 3.232) 6.65 0.001 * 2 1 -1.863 0.355 (-2.732, -0.993) -5.24 0.002 * irrig*variety 0 1 -0.337 0.251 (-0.952, 0.277) -1.34 0.228 1.00 irrig*repl 0 1 0.738 0.251 ( 0.123, 1.352) 2.94 0.026 * pestic*irrig*variety 1 0 1 0.538 0.355 (-0.332, 1.407) 1.51 0.181 1.33 2 0 1 -0.613 0.355 (-1.482, 0.257) -1.72 0.135 1.33 pestic*irrig*repl 1 0 1 0.213 0.355 (-0.657, 1.082) 0.60 0.572 * 2 0 1 -0.737 0.355 (-1.607, 0.132) -2.08 0.083 * Regression Equation ... Expected Mean Squares, using Adjusted SS Source Expected Mean Square for Each Term 1 pestic (12) + 2.0000 (11) + 4.0000 (7) + Q[1, 5, 6, 10] 2 irrig (12) + 2.0000 (11) + 6.0000 (9) + Q[2, 5, 8, 10] 3 variety (12) + Q[3, 6, 8, 10] 4 repl (12) + 2.0000 (11) + 6.0000 (9) + 4.0000 (7) + 12.0000 (4) 5 pestic*irrig (12) + 2.0000 (11) + Q[5, 10] 6 pestic*variety (12) + Q[6, 10] 7 pestic*repl (12) + 2.0000 (11) + 4.0000 (7) 8 irrig*variety (12) + Q[8, 10] 9 irrig*repl (12) + 2.0000 (11) + 6.0000 (9) 10 pestic*irrig*variety (12) + Q[10] 11 pestic*irrig*repl (12) + 2.0000 (11) 12 Error (12) Means Term Fitted Mean pestic 1 52.8750 2 59.7500 3 59.8625 irrig 0 59.8583 1 55.1333 variety 1 56.1417 2 58.8500 repl 1 58.4833 2 56.5083 Variance Components, using Adjusted SS Source Variance % of Total StDev % of Total repl -1.94938* 0.00% 0.00000 0.00% pestic*repl 8.43563 67.37% 2.90441 82.08% irrig*repl 1.59917 12.77% 1.26458 35.74% pestic*irrig*repl 0.972083 7.76% 0.98594 27.86% Error 1.51458 12.10% 1.23068 34.78% Total 12.5215 3.53857 * Value is negative, and is estimated by zero. Residual Plots for yield Comments: --------- The normal plot looks fine, despite the residual plot showing some tendency of ``inverse cone''-shape (decreasing variance with increasing values). With only 6 DF for error, the residuals are however strongly correlated, and we should probably not interpret this strongly. No residuals are suspiciously large. A Box-Cox analysis of a fixed effects model (with fixed effect of repl, and hence of fields and strips) gives evidence of a need for transformation with lambda>1, see below. The residual plot after power transformation with lambda=5 no longer has the same shape, but the normal plot is less straight. In a practical situation one would need to consider the advantages and drawbacks of such a transformation, and in that case one should really carry out the Box-Cox analysis for a random effects model (details in VER Chapter 21). Here we confine ourselves to the model on original scale. MTB > GLM; SUBC> Response 'yield'; SUBC> Nodefault; SUBC> Categorical 'repl' 'pestic' 'irrig' 'variety'; SUBC> Terms repl pestic irrig variety repl*pestic repl*irrig pestic*irrig & CONT> pestic*variety irrig*variety repl*pestic*irrig pestic*irrig*variety; SUBC> Boxcox; SUBC> TMethod; SUBC> TAnova; SUBC> TSummary; SUBC> TCoefficients; SUBC> TEquation; SUBC> TFactor; SUBC> TDiagnostics 0; SUBC> Rtype 2; SUBC> GFOURPACK. General Linear Model: yield versus repl, pestic, irrig, variety Box-Cox transformation Rounded lambda 5 Estimated lambda 4.62399 95% CI for lambda (2.03149, 7.27549) As noted above, we continue our discussion of the results on original scale. The variance components are of quite different magnitude, with the error variance being the smallest (representing only 1.515/12.52=12% of the total variance (12.52) without including the repl variance). This means that all the effects involving variety will be estimated much better than the other effects. We postpone interpretation of the other variance components because better estimates will be computed below. In the ANOVA table, we focus on the effects involving variety because these are unaffected by the added model term. All interactions with variety are clearly non-significant, but there is a strong main effect of variety. The lsmeans show that variety 2 gives significantly higher yields than variety 1. Standard errors for varieties are difficult to compute, and it is preferable to use other software. We can, however, quite easily compute the SE for the difference between the two varieties because it is based on the MSE (the within-strip variance): SE(ymean_..1.-ymean_..2.) = sqrt(MSE*2/12) = sqrt(1.515/6) = 0.50. The denominator (12) comes from each of the two means involving an average of 12 observations. As part of our model checking and also in order to obtain accurate inference for the other terms in the model, we next analyze the strip means. Note that this is an ordinary split-plot model. MTB > Name c6 "ByVar1" c7 "ByVar2" c8 "ByVar3" c9 "Mean1" MTB > Statistics 'yield'; SUBC> By 'repl' 'pestic' 'irrig'; SUBC> GValues 'ByVar1'-'ByVar3'; SUBC> Mean 'Mean1'. MTB > name ByVar1 'repl1' MTB > name ByVar2 'pestic1' MTB > name ByVar3 'irrig1' MTB > GLM; SUBC> Response 'Mean1'; SUBC> Nodefault; SUBC> Categorical 'repl1' 'pestic1' 'irrig1'; SUBC> Random repl1; SUBC> Terms repl1 pestic1 irrig1 repl1*pestic1 pestic1*irrig1; SUBC> Means pestic1 irrig1; SUBC> TExpand; SUBC> TMethod; SUBC> TAnova; SUBC> TSummary; SUBC> TCoefficients; SUBC> TEquation; SUBC> TFactor; SUBC> TEMS; SUBC> TVariance; SUBC> TMeans; SUBC> TDiagnostics 0; SUBC> Rtype 2; SUBC> GFOURPACK. General Linear Model: Mean1 versus repl1, pestic1, irrig1 Factor Information Factor Type Levels Values repl1 Random 2 1, 2 pestic1 Fixed 3 1, 2, 3 irrig1 Fixed 2 0, 1 Analysis of Variance Source DF Seq SS Contribution Adj SS Adj MS F-Value P-Value repl1 1 11.702 4.45% 11.702 11.702 0.63 0.511 pestic1 2 128.138 48.74% 128.138 64.069 3.44 0.225 irrig1 1 66.977 25.48% 66.977 66.977 20.12 0.021 repl1*pestic1 2 37.201 14.15% 37.201 18.601 5.59 0.097 pestic1*irrig1 2 8.874 3.38% 8.874 4.437 1.33 0.385 Error 3 9.986 3.80% 9.986 3.329 Total 11 262.877 100.00% S R-sq R-sq(adj) PRESS R-sq(pred) 1.82443 96.20% 86.07% 159.77 39.22% Coefficients Term Coef SE Coef 95% CI T-Value P-Value VIF Constant 57.496 0.527 (55.820, 59.172) 109.17 0.000 repl1 1 0.988 0.527 (-0.689, 2.664) 1.87 0.157 * pestic1 1 -4.621 0.745 (-6.991, -2.250) -6.20 0.008 1.33 2 2.254 0.745 (-0.116, 4.625) 3.03 0.056 1.33 irrig1 0 2.363 0.527 ( 0.686, 4.039) 4.49 0.021 1.00 repl1*pestic1 1 1 2.362 0.745 (-0.008, 4.733) 3.17 0.050 * 1 2 -1.863 0.745 (-4.233, 0.508) -2.50 0.088 * pestic1*irrig1 1 0 -0.687 0.745 (-3.058, 1.683) -0.92 0.424 1.33 2 0 1.213 0.745 (-1.158, 3.583) 1.63 0.202 1.33 Regression Equation Mean1 = 57.496 + 0.988 repl1_1 - 0.988 repl1_2 - 4.621 pestic1_1 + 2.254 pestic1_2 + 2.367 pestic1_3 + 2.363 irrig1_0 - 2.363 irrig1_1 + 2.362 repl1*pestic1_1 1 - 1.863 repl1*pestic1_1 2 - 0.500 repl1*pestic1_1 3 - 2.362 repl1*pestic1_2 1 + 1.863 repl1*pestic1_2 2 + 0.500 repl1*pestic1_2 3 - 0.687 pestic1*irrig1_1 0 + 0.687 pestic1*irrig1_1 1 + 1.213 pestic1*irrig1_2 0 - 1.213 pestic1*irrig1_2 1 - 0.525 pestic1*irrig1_3 0 + 0.525 pestic1*irrig1_3 1 Equation treats random terms as though they are fixed. Expected Mean Squares, using Adjusted SS Source Expected Mean Square for Each Term 1 repl1 (6) + 2.0000 (4) + 6.0000 (1) 2 pestic1 (6) + 2.0000 (4) + Q[2, 5] 3 irrig1 (6) + Q[3, 5] 4 repl1*pestic1 (6) + 2.0000 (4) 5 pestic1*irrig1 (6) + Q[5] 6 Error (6) Means Term Fitted Mean pestic1 1 52.8750 2 59.7500 3 59.8625 irrig1 0 59.8583 1 55.1333 Variance Components, using Adjusted SS Source Variance % of Total StDev % of Total repl1 -1.14979* 0.00% 0.00000 0.00% repl1*pestic1 7.63604 69.64% 2.76334 83.45% Error 3.32854 30.36% 1.82443 55.10% Total 10.9646 3.31128 * Value is negative, and is estimated by zero. Residual Plots for Mean1 Comments: --------- We first comment on the residual plots. They look quite ok but also show sign of strong correlations between residuals, not surprising with DFE=3. The variance components for fields is now correctly estimated at 7.636. The error variance (3.329) can be used to recover the original variance component for strips (because the formulae for variances of sums and means of variance shows it to be equal to sigma_f^2+(sigma^2)/2), but we won't explore this technical detail. Realistically, if one wanted to get the correct variance component estimates, one would run the analysis in another analysis than Minitab. In any case, we see the field variance component is clearly the largest, and therefore the inference for pesticides (the whole-plot factor) is quite weak. The ANOVA tables gives the correct tests for all remaining terms. Among these, only the main effect for irrigation is significant, and the lsmeans show, somewhat surprisingly, the highest yields for the non-irrigated strips. It is not clear whether that was an expected result, but the difference between irrigated and non-irrigated strips is clearly visible in the raw data lising. As above, the standard errors for the irrigation means is difficult to compute (although here we can use the formula in the notes on linear mixed models), and we may prefer to use other statistical software to compute them. We can, however, as above compute the SE for the difference between the two irrigation levels because it is based on the MSE (the field variance) in the analysis for strip means: SE(ymean_.1..-ymean_.0..) = sqrt(MSE*2/6) = sqrt(3.329/3) = 1.05. The denominator (6) comes from each of the two means involving an average of 6 observations (strips). It is seen that this SE is much larger than the one compute for varieties above. Finally, the question asks for the SE for comparison of two pesticides. The notes on linear mixed models explain that all inference about the whole-plot factor can be based on the whole-plot MS in the table. Therefore, SE(ymean_1...-ymean_2...) = sqrt(MSrepl1*pestic1*2/4) = sqrt(18.601/2) = 3.05. The denominator (4) comes from each of the two means involving an average of 4 observations (strips). Again, this SE is even larger than the two other SEs computed above. Alternatively, we could use the analysis for the field means. We should also check assumptions about the field random effects. This would involve averaging the data to fields and running a 2-way ANOVA corresponding to the block design for pesticide treatments applied in 2 blocks, and inspecting the residuals. The number of units is so small that it's almost impossible to detect model deviations. Indeed, the residual plots do not reveal any obvious problems. In summary, the analysis demonstrated significant effects of varieties and of the use of irrigation. The variability at the different levels implied that effects involving variety and to a lesser degree irrigation were estimated most precisely.