Solution file for GO Problem 16.3
---------------------------------

Data: grain yields on 24 plots laid out in a design involving 3 fields
in each of two replicates (for a total of 6 fields) and within each
field two east-west strips (for a total of 12 strips), and finally
within each strip two plots. Randomization occurred at 3 levels:
  * pesticides (3) onto fields, within each replicate,
  * irrigation (2) onto strips, within each field,
  * variety (2) on to plots, within each strip.
The design therefore is a split-split-plot design, with the following
roles of units and experimental factors:
  * replicates = blocks,
  * fields = whole plots,
  * pesticide = whole-plot factor,
  * east-west strips = split plots,
  * irrigation = split-plot factor,
  * plots = split-split plots,
  * variety = split-split-plot factor.
We could also say that the data structure is hierarchical with 3 levels:
plots nested in (below) strips nested in (below) fields.

Notation:
  y_i = yield for plot i, i=1,...,24,
or
  y_ijkl = yield obtained for variety k, irrigation j and pesticide i in replicate (block) l,
           i=1,2,3; j=1,0; k=1,2, l=1,2.
  
The statistical model should include all 3*2*2=12 treatment combinations
split into main effects of the 3 factors, the 3 first-order interactions
and the second interaction, a fixed (possibly random as this has little
impact on the modelling), and 3 random terms corresponding to the 3
levels of variation: fields, strips and plots (~ error terms):

y_i = mu + alpha_pestic(i) + beta_irrig(i) + gamma_variety(i) +
alpha_beta_pestic*irrig(i) + alpha_gamma_pestic*variety(i) +
beta_gamma_irrig*variety(i) + alpha_beta_gamma_pestic*irrig*var(i) +
delta_repl(i) + u_field(i) + v_strip(i) + eps_i,
  or
y_ijkl = mu + alpha_i + beta_j + gamma_k + alpha_beta_ij +
alpha_gamma_ik + beta_gamma_jk + alpha_beta_gamma_ijk + delta_l + 
u_il + v_ijl + eps_ijkl,

depending on the chosen notation. The field random effects (u_il), strip
random effects (v_ijl) and errors (eps_ijkl) errors are assumed
independent and distributed as N(0,sigma_f^2), N(sigma_s^2) and 
N(0,sigma^2) distributions, respectively.

Minitab does not allow analysis of this model exactly. The closest we
can get is to include random effects for repl*irrig, say a w_repl*irrig(i)
or w_jl term. This term will be split from the term for strips
(pestic*irrig*repl) in the ANOVA table, and the inference will therefore
be different for the irrig and pestic*irrig effects (that are both
measured against the strip term.

MTB > WOpen "C:\DATA.AVC\teaching\vhm802\10P\ch16pr3.csv";
SUBC>   FType;
SUBC>     CSV;
SUBC>   DecSep;
SUBC>     Period;
SUBC>   Field;
SUBC>     Comma;
SUBC>   TDelimiter;
SUBC>     DoubleQuote.
Retrieving worksheet from file:
'C:\DATA.AVC\teaching\vhm802\10P\ch16pr3.csv'
Worksheet was saved on 21/03/2014

MTB > GLM 'yield' = pestic| irrig| variety repl repl*pestic repl*pestic &
CONT>     *irrig repl*irrig;
SUBC>   Random 'repl';
SUBC>   Brief 2 ;
SUBC>   EMS;
SUBC>   Means pestic irrig variety;
SUBC>   GFourpack;
SUBC>   RType 2 .
General Linear Model: yield versus pestic, irrig, variety, repl 

Factor   Type    Levels  Values
pestic   fixed        3  1, 2, 3
irrig    fixed        2  0, 1
variety  fixed        2  1, 2
repl     random       2  1, 2

Analysis of Variance for yield, using Adjusted SS for Tests

Source                DF   Seq SS   Adj SS   Adj MS      F      P
pestic                 2  256.276  256.276  128.138   3.44  0.225
irrig                  1  133.954  133.954  133.954  10.26  0.193
variety                1   44.010   44.010   44.010  29.06  0.002
pestic*irrig           2   17.748   17.748    8.874   2.57  0.280
pestic*variety         2    1.386    1.386    0.693   0.46  0.653
irrig*variety          1    2.734    2.734    2.734   1.80  0.228
pestic*irrig*variety   2    5.358    5.358    2.679   1.77  0.249
repl                   1   23.404   23.404   23.404   0.50  0.539 x
pestic*repl            2   74.403   74.403   37.201  10.76  0.085
pestic*irrig*repl      2    6.917    6.917    3.459   2.28  0.183
irrig*repl             1   13.054   13.054   13.054   3.77  0.192
Error                  6    9.087    9.087    1.515
Total                 23  588.330

x Not an exact F-test.

S = 1.23068   R-Sq = 98.46%   R-Sq(adj) = 94.08%

Expected Mean Squares, using Adjusted SS

    Source                Expected Mean Square for Each Term
 1  pestic                (12) + 2.0000 (10) + 4.0000 (9) + Q[1, 4 , 5 , 7]
 2  irrig                 (12) + 6.0000 (11) + 2.0000 (10) + Q[2, 4 , 6 , 7]
 3  variety               (12) + Q[3, 5 , 6 , 7]
 4  pestic*irrig          (12) + 2.0000 (10) + Q[4, 7]
 5  pestic*variety        (12) + Q[5, 7]
 6  irrig*variety         (12) + Q[6, 7]
 7  pestic*irrig*variety  (12) + Q[7]
 8  repl                  (12) + 6.0000 (11) + 2.0000 (10) + 4.0000 (9) + 12.0000 (8)
 9  pestic*repl           (12) + 2.0000 (10) + 4.0000 (9)
10  pestic*irrig*repl     (12) + 2.0000 (10)
11  irrig*repl            (12) + 6.0000 (11) + 2.0000 (10)
12  Error                 (12)

...

Variance Components, using Adjusted SS

                   Estimated
Source                 Value
repl                 -1.9494
pestic*repl           8.4356
pestic*irrig*repl     0.9721
irrig*repl            1.5992
Error                 1.5146


Least Squares Means for yield

pestic    Mean
1        52.88
2        59.75
3        59.86
irrig
0        59.86
1        55.13
variety
1        56.14
2        58.85
 
Residual Plots for yield 

Comments:
---------
The normal plot looks fine, despite the residual plot showing some tendency of
``inverse cone''-shape (decreasing variance with increasing values).
With only 6 DF for error, the residuals are however strongly correlated,
and we should probably not interpret this strongly. No residuals are suspiciously 
large. A Box-Cox analysis of a fixed effects model (with fixed effects
of fields and strips) does not indicate any need for transformation, as
the confidence intervals for lambda is very wide (-0.765, *).

The variance components are of quite different magnitude, with the error
variance being the smallest (representing only 1.515/12.52=12% of the
total variance (12.52) without including the repl variance). This means
that the all effects involving variety will be estimated much better
than the other effects. We postpone interpretation of the other variance
components because better estimates will be computed below.

In the ANOVA table, we focus on the effects involving variety because
these are unaffected by the added model term. All interactions with
variety are clearly non-significant, but there is a strong main effect
of variety. The lsmeans show that variety 2 gives significantly higher
yields than variety 1. Standard errors for varieties are difficult to
compute, and it is preferable to use other software. We can, however,
quite easily compute the SE for the difference between the two varieties
because it is based on the MSE (the within-strip variance):
  SE(ymean_..1.-ymean_..2.) = sqrt(MSE*2/12) = sqrt(1.515/6) = 0.50.
The denominator (12) stems from each of the two means involving an
average of 12 observations.

As part of our model checking and also in order to obtain accurate
inference for the other terms in the model, we next analyze the strip
means. Note that this is an ordinary split-plot model.

MTB > Name c6 "ByVar1" c7 "ByVar2" c8 "ByVar3" c9 "Mean1"
MTB > Statistics 'yield';
SUBC>   By 'repl' 'pestic' 'irrig';
SUBC>   GValues 'ByVar1'-'ByVar3';
SUBC>   Mean 'Mean1'.
MTB > name ByVar1 'repl1'
MTB > name ByVar2 'pestic1'
MTB > name ByVar3 'irrig1'

MTB > GLM  'Mean1' = pestic1| irrig1 repl1| pestic1;
SUBC>   Random 'repl1';
SUBC>   Brief 2 ;
SUBC>   EMS;
SUBC>   GFourpack;
SUBC>   RType 2 .
General Linear Model: Mean1 versus pestic1, irrig1, repl1 

Factor   Type    Levels  Values
pestic1  fixed        3  1, 2, 3
irrig1   fixed        2  0, 1
repl1    random       2  1, 2

Analysis of Variance for Mean1, using Adjusted SS for Tests

Source          DF   Seq SS   Adj SS  Adj MS      F      P
pestic1          2  128.138  128.138  64.069   3.44  0.225
irrig1           1   66.977   66.977  66.977  20.12  0.021
pestic1*irrig1   2    8.874    8.874   4.437   1.33  0.385
repl1            1   11.702   11.702  11.702   0.63  0.511
pestic1*repl1    2   37.201   37.201  18.601   5.59  0.097
Error            3    9.986    9.986   3.329
Total           11  262.877

S = 1.82443   R-Sq = 96.20%   R-Sq(adj) = 86.07%

Expected Mean Squares, using Adjusted SS

   Source          Expected Mean Square for Each Term
1  pestic1         (6) + 2.0000 (5) + Q[1, 3]
2  irrig1          (6) + Q[2, 3]
3  pestic1*irrig1  (6) + Q[3]
4  repl1           (6) + 2.0000 (5) + 6.0000 (4)
5  pestic1*repl1   (6) + 2.0000 (5)
6  Error           (6)

...

Variance Components, using Adjusted SS

               Estimated
Source             Value
repl1             -1.150
pestic1*repl1      7.636
Error              3.329

Least Squares Means for Mean1

pestic1   Mean
1        52.88
2        59.75
3        59.86
irrig1
0        59.86
1        55.13 

Residual Plots for Mean1 

Comments:
---------
We first comment on the residual plots. They look quite ok but also show
sign of strong correlations between residuals, not surprising with
DFE=3.

The variance components for fields and strips are now correctly
estimated at 7.636 and 3.329, respectively. This means that the field
variance accounts for 7.636/12.52=61% of the total unexplained variance,
and therefore the inference for pesticides (the whole-plot factor) is
quite weak.

The ANOVA tables gives the correct tests for all remaining terms. Among
these, only the main effect for irrigation is significant, and the
lsmeans show, somewhat surprisingly, the highest yields for the non-irrigated 
strips. It is not clear whether that was an expected result, but the 
difference between irrigated and non-irrigated strips is clearly visible 
in the raw data lising. As above, the standard errors for the irrigation 
means is difficult to compute (although here we can use the formula in 
the notes on linear mixed models), and we may prefer to use other 
statistical software to compute them. We can, however, as above compute 
the SE for the difference between the two irrigation levels because it 
is based on the MSE (the field variance) in the analysis for strip
means:
  SE(ymean_.1..-ymean_.0..) = sqrt(MSE*2/6) = sqrt(3.329/3) = 1.05.
The denominator (6) stems from each of the two means involving an
average of 6 observations (strips). It is seen that this SE is much
larger than the one compute for varieties above.

Finally, the question asks for the SE for comparison of two pesticides.
The notes on linear mixed models explain that all inference about the
whole-plot factor can be based on the whole-plot MS in the table.
Therefore,
  SE(ymean_1...-ymean_2...) = sqrt(MSrepl1*pestic1*2/4) = sqrt(18.601/2) = 3.05.
The denominator (4) stems from each of the two means involving an
average of 4 observations (strips). Again, this SE is even larger than
the two other SEs computed above.

We should also check assumptions about the field random effects. This
would involve averaging the data to fields and running a 2-way ANOVA
corresponding to the block design for pesticide treatments applied in 2
blocks, and inspecting the residuals. The number of units is so small
that it's almost impossible to detect model deviations. Indeed, the
residual plots do not reveal any obvious problems.

In summary, the analysis demonstrated significant effects of varieties
and of the use of irrigation. The variability at the different levels
implied that effects involving variety and to a lesser degree irrigation
were estimated most precisely.