Solution file for GO Problem 16.3 --------------------------------- Data: grain yields on 24 plots laid out in a design involving 3 fields in each of two replicates (for a total of 6 fields) and within each field two east-west strips (for a total of 12 strips), and finally within each strip two plots. Randomization occurred at 3 levels: * pesticides (3) onto fields, within each replicate, * irrigation (2) onto strips, within each field, * variety (2) on to plots, within each strip. The design therefore is a split-split-plot design, with the following roles of units and experimental factors: * replicates = blocks, * fields = whole plots, * pesticide = whole-plot factor, * east-west strips = split plots, * irrigation = split-plot factor, * plots = split-split plots, * variety = split-split-plot factor. We could also say that the data structure is hierarchical with 3 levels: plots nested in (below) strips nested in (below) fields. Notation: y_i = yield for plot i, i=1,...,24, or y_ijkl = yield obtained for variety k, irrigation j and pesticide i in replicate (block) l, i=1,2,3; j=1,0; k=1,2, l=1,2. The statistical model should include all 3*2*2=12 treatment combinations split into main effects of the 3 factors, the 3 first-order interactions and the second interaction, a fixed (possibly random as this has little impact on the modelling), and 3 random terms corresponding to the 3 levels of variation: fields, strips and plots (~ error terms): y_i = mu + alpha_pestic(i) + beta_irrig(i) + gamma_variety(i) + alpha_beta_pestic*irrig(i) + alpha_gamma_pestic*variety(i) + beta_gamma_irrig*variety(i) + alpha_beta_gamma_pestic*irrig*var(i) + delta_repl(i) + u_field(i) + v_strip(i) + eps_i, or y_ijkl = mu + alpha_i + beta_j + gamma_k + alpha_beta_ij + alpha_gamma_ik + beta_gamma_jk + alpha_beta_gamma_ijk + delta_l + u_il + v_ijl + eps_ijkl, depending on the chosen notation. The field random effects (u_il), strip random effects (v_ijl) and errors (eps_ijkl) errors are assumed independent and distributed as N(0,sigma_f^2), N(sigma_s^2) and N(0,sigma^2) distributions, respectively. Minitab does not allow analysis of this model exactly. The closest we can get is to include random effects for repl*irrig, say a w_repl*irrig(i) or w_jl term. This term will be split from the term for strips (pestic*irrig*repl) in the ANOVA table, and the inference will therefore be different for the irrig and pestic*irrig effects (that are both measured against the strip term. MTB > WOpen "C:\DATA.AVC\teaching\vhm802\10P\ch16pr3.csv"; SUBC> FType; SUBC> CSV; SUBC> DecSep; SUBC> Period; SUBC> Field; SUBC> Comma; SUBC> TDelimiter; SUBC> DoubleQuote. Retrieving worksheet from file: 'C:\DATA.AVC\teaching\vhm802\10P\ch16pr3.csv' Worksheet was saved on 21/03/2014 MTB > GLM 'yield' = pestic| irrig| variety repl repl*pestic repl*pestic & CONT> *irrig repl*irrig; SUBC> Random 'repl'; SUBC> Brief 2 ; SUBC> EMS; SUBC> Means pestic irrig variety; SUBC> GFourpack; SUBC> RType 2 . General Linear Model: yield versus pestic, irrig, variety, repl Factor Type Levels Values pestic fixed 3 1, 2, 3 irrig fixed 2 0, 1 variety fixed 2 1, 2 repl random 2 1, 2 Analysis of Variance for yield, using Adjusted SS for Tests Source DF Seq SS Adj SS Adj MS F P pestic 2 256.276 256.276 128.138 3.44 0.225 irrig 1 133.954 133.954 133.954 10.26 0.193 variety 1 44.010 44.010 44.010 29.06 0.002 pestic*irrig 2 17.748 17.748 8.874 2.57 0.280 pestic*variety 2 1.386 1.386 0.693 0.46 0.653 irrig*variety 1 2.734 2.734 2.734 1.80 0.228 pestic*irrig*variety 2 5.358 5.358 2.679 1.77 0.249 repl 1 23.404 23.404 23.404 0.50 0.539 x pestic*repl 2 74.403 74.403 37.201 10.76 0.085 pestic*irrig*repl 2 6.917 6.917 3.459 2.28 0.183 irrig*repl 1 13.054 13.054 13.054 3.77 0.192 Error 6 9.087 9.087 1.515 Total 23 588.330 x Not an exact F-test. S = 1.23068 R-Sq = 98.46% R-Sq(adj) = 94.08% Expected Mean Squares, using Adjusted SS Source Expected Mean Square for Each Term 1 pestic (12) + 2.0000 (10) + 4.0000 (9) + Q[1, 4 , 5 , 7] 2 irrig (12) + 6.0000 (11) + 2.0000 (10) + Q[2, 4 , 6 , 7] 3 variety (12) + Q[3, 5 , 6 , 7] 4 pestic*irrig (12) + 2.0000 (10) + Q[4, 7] 5 pestic*variety (12) + Q[5, 7] 6 irrig*variety (12) + Q[6, 7] 7 pestic*irrig*variety (12) + Q[7] 8 repl (12) + 6.0000 (11) + 2.0000 (10) + 4.0000 (9) + 12.0000 (8) 9 pestic*repl (12) + 2.0000 (10) + 4.0000 (9) 10 pestic*irrig*repl (12) + 2.0000 (10) 11 irrig*repl (12) + 6.0000 (11) + 2.0000 (10) 12 Error (12) ... Variance Components, using Adjusted SS Estimated Source Value repl -1.9494 pestic*repl 8.4356 pestic*irrig*repl 0.9721 irrig*repl 1.5992 Error 1.5146 Least Squares Means for yield pestic Mean 1 52.88 2 59.75 3 59.86 irrig 0 59.86 1 55.13 variety 1 56.14 2 58.85 Residual Plots for yield Comments: --------- The normal plot looks fine, despite the residual plot showing some tendency of ``inverse cone''-shape (decreasing variance with increasing values). With only 6 DF for error, the residuals are however strongly correlated, and we should probably not interpret this strongly. No residuals are suspiciously large. A Box-Cox analysis of a fixed effects model (with fixed effects of fields and strips) does not indicate any need for transformation, as the confidence intervals for lambda is very wide (-0.765, *). The variance components are of quite different magnitude, with the error variance being the smallest (representing only 1.515/12.52=12% of the total variance (12.52) without including the repl variance). This means that the all effects involving variety will be estimated much better than the other effects. We postpone interpretation of the other variance components because better estimates will be computed below. In the ANOVA table, we focus on the effects involving variety because these are unaffected by the added model term. All interactions with variety are clearly non-significant, but there is a strong main effect of variety. The lsmeans show that variety 2 gives significantly higher yields than variety 1. Standard errors for varieties are difficult to compute, and it is preferable to use other software. We can, however, quite easily compute the SE for the difference between the two varieties because it is based on the MSE (the within-strip variance): SE(ymean_..1.-ymean_..2.) = sqrt(MSE*2/12) = sqrt(1.515/6) = 0.50. The denominator (12) stems from each of the two means involving an average of 12 observations. As part of our model checking and also in order to obtain accurate inference for the other terms in the model, we next analyze the strip means. Note that this is an ordinary split-plot model. MTB > Name c6 "ByVar1" c7 "ByVar2" c8 "ByVar3" c9 "Mean1" MTB > Statistics 'yield'; SUBC> By 'repl' 'pestic' 'irrig'; SUBC> GValues 'ByVar1'-'ByVar3'; SUBC> Mean 'Mean1'. MTB > name ByVar1 'repl1' MTB > name ByVar2 'pestic1' MTB > name ByVar3 'irrig1' MTB > GLM 'Mean1' = pestic1| irrig1 repl1| pestic1; SUBC> Random 'repl1'; SUBC> Brief 2 ; SUBC> EMS; SUBC> GFourpack; SUBC> RType 2 . General Linear Model: Mean1 versus pestic1, irrig1, repl1 Factor Type Levels Values pestic1 fixed 3 1, 2, 3 irrig1 fixed 2 0, 1 repl1 random 2 1, 2 Analysis of Variance for Mean1, using Adjusted SS for Tests Source DF Seq SS Adj SS Adj MS F P pestic1 2 128.138 128.138 64.069 3.44 0.225 irrig1 1 66.977 66.977 66.977 20.12 0.021 pestic1*irrig1 2 8.874 8.874 4.437 1.33 0.385 repl1 1 11.702 11.702 11.702 0.63 0.511 pestic1*repl1 2 37.201 37.201 18.601 5.59 0.097 Error 3 9.986 9.986 3.329 Total 11 262.877 S = 1.82443 R-Sq = 96.20% R-Sq(adj) = 86.07% Expected Mean Squares, using Adjusted SS Source Expected Mean Square for Each Term 1 pestic1 (6) + 2.0000 (5) + Q[1, 3] 2 irrig1 (6) + Q[2, 3] 3 pestic1*irrig1 (6) + Q[3] 4 repl1 (6) + 2.0000 (5) + 6.0000 (4) 5 pestic1*repl1 (6) + 2.0000 (5) 6 Error (6) ... Variance Components, using Adjusted SS Estimated Source Value repl1 -1.150 pestic1*repl1 7.636 Error 3.329 Least Squares Means for Mean1 pestic1 Mean 1 52.88 2 59.75 3 59.86 irrig1 0 59.86 1 55.13 Residual Plots for Mean1 Comments: --------- We first comment on the residual plots. They look quite ok but also show sign of strong correlations between residuals, not surprising with DFE=3. The variance components for fields and strips are now correctly estimated at 7.636 and 3.329, respectively. This means that the field variance accounts for 7.636/12.52=61% of the total unexplained variance, and therefore the inference for pesticides (the whole-plot factor) is quite weak. The ANOVA tables gives the correct tests for all remaining terms. Among these, only the main effect for irrigation is significant, and the lsmeans show, somewhat surprisingly, the highest yields for the non-irrigated strips. It is not clear whether that was an expected result, but the difference between irrigated and non-irrigated strips is clearly visible in the raw data lising. As above, the standard errors for the irrigation means is difficult to compute (although here we can use the formula in the notes on linear mixed models), and we may prefer to use other statistical software to compute them. We can, however, as above compute the SE for the difference between the two irrigation levels because it is based on the MSE (the field variance) in the analysis for strip means: SE(ymean_.1..-ymean_.0..) = sqrt(MSE*2/6) = sqrt(3.329/3) = 1.05. The denominator (6) stems from each of the two means involving an average of 6 observations (strips). It is seen that this SE is much larger than the one compute for varieties above. Finally, the question asks for the SE for comparison of two pesticides. The notes on linear mixed models explain that all inference about the whole-plot factor can be based on the whole-plot MS in the table. Therefore, SE(ymean_1...-ymean_2...) = sqrt(MSrepl1*pestic1*2/4) = sqrt(18.601/2) = 3.05. The denominator (4) stems from each of the two means involving an average of 4 observations (strips). Again, this SE is even larger than the two other SEs computed above. We should also check assumptions about the field random effects. This would involve averaging the data to fields and running a 2-way ANOVA corresponding to the block design for pesticide treatments applied in 2 blocks, and inspecting the residuals. The number of units is so small that it's almost impossible to detect model deviations. Indeed, the residual plots do not reveal any obvious problems. In summary, the analysis demonstrated significant effects of varieties and of the use of irrigation. The variability at the different levels implied that effects involving variety and to a lesser degree irrigation were estimated most precisely.