Solution file for GO Problem 16.3
---------------------------------

Data: grain yields on 24 plots laid out in a design involving 3 fields
in each of two replicates (for a total of 6 fields) and within each
field two east-west strips (for a total of 12 strips), and finally
within each strip two plots. Randomization occurred at 3 levels:
  * pesticides (3) onto fields, within each replicate,
  * irrigation (2) onto strips, within each field,
  * variety (2) on to plots, within each strip.
The design therefore is a split-split-plot design, with the following
roles of units and experimental factors:
  * replicates = blocks,
  * fields = whole plots,
  * pesticide = whole-plot factor,
  * east-west strips = split plots,
  * irrigation = split-plot factor,
  * plots = split-split plots,
  * variety = split-split-plot factor.
We could also say that the data structure is hierarchical with 3 levels:
plots nested in (below) strips nested in (below) fields.

Notation:
  y_i = yield for plot i, i=1,...,24,
or
  y_ijkl = yield obtained for variety k, irrigation j and pesticide i in replicate (block) l,
           i=1,2,3; j=1,0; k=1,2, l=1,2.
  
The statistical model should include all 3*2*2=12 treatment combinations
split into main effects of the 3 factors, the 3 first-order interactions
and the second interaction, a fixed (possibly random as this has little
impact on the modelling) effect of blocks, and 3 random terms corresponding 
to the 3 levels of variation: fields, strips and plots (~ error terms):

y_i = mu + alpha_pestic(i) + beta_irrig(i) + gamma_variety(i) +
alpha_beta_pestic*irrig(i) + alpha_gamma_pestic*variety(i) +
beta_gamma_irrig*variety(i) + alpha_beta_gamma_pestic*irrig*var(i) +
delta_repl(i) + u_field(i) + v_strip(i) + eps_i,
  or
y_ijkl = mu + alpha_i + beta_j + gamma_k + alpha_beta_ij +
alpha_gamma_ik + beta_gamma_jk + alpha_beta_gamma_ijk + delta_l + 
u_il + v_ijl + eps_ijkl,

depending on the chosen notation. The field random effects (u_il), strip
random effects (v_ijl) and errors (eps_ijkl) errors are assumed
independent and distributed as N(0,sigma_f^2), N(sigma_s^2) and 
N(0,sigma^2) distributions, respectively.

Minitab does not allow analysis of this model exactly. The closest we
can get is to include random effects for repl*irrig, say a w_repl*irrig(i)
or w_jl term. This term will be split from the term for strips
(pestic*irrig*repl) in the ANOVA table, and the inference will therefore
be different for the irrig and pestic*irrig effects (that are both
measured against the strip term.

MTB > WOpen "H:\VHM\VHM802\Data_csv\ch16pr3.csv";
SUBC>   FType;
SUBC>     CSV;
SUBC>   DecSep;
SUBC>     Period;
SUBC>   Field;
SUBC>     Comma;
SUBC>   TDelimiter;
SUBC>     DoubleQuote.
Retrieving worksheet from file: ‘H:\VHM\VHM802\Data_csv\ch16pr3.csv’
Worksheet was saved on 21/03/2014

MTB > GLM;
SUBC>   Response 'yield';
SUBC>   Nodefault;
SUBC>   Categorical 'pestic' 'irrig' 'variety' 'repl';
SUBC>   Random repl;
SUBC>   Terms pestic irrig variety repl pestic*irrig pestic*variety pestic*repl &
CONT>       irrig*variety irrig*repl pestic*irrig*variety pestic*irrig*repl;
SUBC>   Means pestic irrig variety repl;
SUBC>   TExpand;
SUBC>   TMethod;
SUBC>   TAnova;
SUBC>   TSummary;
SUBC>   TCoefficients;
SUBC>   TEquation;
SUBC>   TFactor;
SUBC>   TEMS;
SUBC>   TVariance;
SUBC>   TMeans;
SUBC>   TDiagnostics 0;
SUBC>   Rtype 2;
SUBC>  GFOURPACK.
General Linear Model: yield versus pestic, irrig, variety, repl 

Method
Factor coding  (-1, 0, +1)

Factor Information
Factor   Type    Levels  Values
pestic   Fixed        3  1, 2, 3
irrig    Fixed        2  0, 1
variety  Fixed        2  1, 2
repl     Random       2  1, 2

Analysis of Variance
Source                  DF   Seq SS  Contribution   Adj SS   Adj MS  F-Value  P-Value
  pestic                 2  256.276        43.56%  256.276  128.138     3.44    0.225
  irrig                  1  133.954        22.77%  133.954  133.954    10.26    0.193
  variety                1   44.010         7.48%   44.010   44.010    29.06    0.002
  repl                   1   23.404         3.98%   23.404   23.404     0.50    0.539  x
  pestic*irrig           2   17.748         3.02%   17.748    8.874     2.57    0.280
  pestic*variety         2    1.386         0.24%    1.386    0.693     0.46    0.653
  pestic*repl            2   74.403        12.65%   74.403   37.201    10.76    0.085
  irrig*variety          1    2.734         0.46%    2.734    2.734     1.80    0.228
  irrig*repl             1   13.054         2.22%   13.054   13.054     3.77    0.192
  pestic*irrig*variety   2    5.358         0.91%    5.358    2.679     1.77    0.249
  pestic*irrig*repl      2    6.917         1.18%    6.917    3.459     2.28    0.183
Error                    6    9.088         1.54%    9.088    1.515
Total                   23  588.330       100.00%
x Not an exact F-test.

Model Summary
      S    R-sq  R-sq(adj)  PRESS  R-sq(pred)
1.23068  98.46%     94.08%  145.4      75.29%

Coefficients
Term                    Coef  SE Coef       95% CI       T-Value  P-Value   VIF
Constant              57.496    0.251  (56.881, 58.111)   228.87    0.000
pestic
  1                   -4.621    0.355  (-5.490, -3.752)   -13.01    0.000  1.33
  2                    2.254    0.355  ( 1.385,  3.123)     6.34    0.001  1.33
irrig
  0                    2.363    0.251  ( 1.748,  2.977)     9.40    0.000  1.00
variety
  1                   -1.354    0.251  (-1.969, -0.739)    -5.39    0.002  1.00
repl
  1                    0.987    0.251  ( 0.373,  1.602)     3.93    0.008     *
pestic*irrig
  1 0                 -0.688    0.355  (-1.557,  0.182)    -1.94    0.101  1.33
  2 0                  1.213    0.355  ( 0.343,  2.082)     3.41    0.014  1.33
pestic*variety
  1 1                  0.304    0.355  (-0.565,  1.173)     0.86    0.425  1.33
  2 1                 -0.021    0.355  (-0.890,  0.848)    -0.06    0.955  1.33
pestic*repl
  1 1                  2.363    0.355  ( 1.493,  3.232)     6.65    0.001     *
  2 1                 -1.863    0.355  (-2.732, -0.993)    -5.24    0.002     *
irrig*variety
  0 1                 -0.337    0.251  (-0.952,  0.277)    -1.34    0.228  1.00
irrig*repl
  0 1                  0.738    0.251  ( 0.123,  1.352)     2.94    0.026     *
pestic*irrig*variety
  1 0 1                0.538    0.355  (-0.332,  1.407)     1.51    0.181  1.33
  2 0 1               -0.613    0.355  (-1.482,  0.257)    -1.72    0.135  1.33
pestic*irrig*repl
  1 0 1                0.213    0.355  (-0.657,  1.082)     0.60    0.572     *
  2 0 1               -0.737    0.355  (-1.607,  0.132)    -2.08    0.083     *

Regression Equation
...

Expected Mean Squares, using Adjusted SS
    Source                Expected Mean Square for Each Term
 1  pestic                (12) + 2.0000 (11) + 4.0000 (7) + Q[1, 5, 6, 10]
 2  irrig                 (12) + 2.0000 (11) + 6.0000 (9) + Q[2, 5, 8, 10]
 3  variety               (12) + Q[3, 6, 8, 10]
 4  repl                  (12) + 2.0000 (11) + 6.0000 (9) + 4.0000 (7) + 12.0000 (4)
 5  pestic*irrig          (12) + 2.0000 (11) + Q[5, 10]
 6  pestic*variety        (12) + Q[6, 10]
 7  pestic*repl           (12) + 2.0000 (11) + 4.0000 (7)
 8  irrig*variety         (12) + Q[8, 10]
 9  irrig*repl            (12) + 2.0000 (11) + 6.0000 (9)
10  pestic*irrig*variety  (12) + Q[10]
11  pestic*irrig*repl     (12) + 2.0000 (11)
12  Error                 (12)

Means
Term     Fitted Mean
pestic
  1          52.8750
  2          59.7500
  3          59.8625
irrig
  0          59.8583
  1          55.1333
variety
  1          56.1417
  2          58.8500
repl
  1          58.4833
  2          56.5083

Variance Components, using Adjusted SS
Source              Variance  % of Total    StDev  % of Total
repl               -1.94938*       0.00%  0.00000       0.00%
pestic*repl         8.43563       67.37%  2.90441      82.08%
irrig*repl          1.59917       12.77%  1.26458      35.74%
pestic*irrig*repl  0.972083        7.76%  0.98594      27.86%
Error               1.51458       12.10%  1.23068      34.78%
Total               12.5215               3.53857
* Value is negative, and is estimated by zero.
 
Residual Plots for yield 

Comments:
---------
The normal plot looks fine, despite the residual plot showing some tendency of
``inverse cone''-shape (decreasing variance with increasing values).
With only 6 DF for error, the residuals are however strongly correlated,
and we should probably not interpret this strongly. No residuals are suspiciously 
large. A Box-Cox analysis of a fixed effects model (with fixed effect
of repl, and hence of fields and strips) gives evidence of a need for 
transformation with lambda>1, see below. The residual plot after power
transformation with lambda=5 no longer has the same shape, but the
normal plot is less straight. In a practical situation one would need to
consider the advantages and drawbacks of such a transformation, and in
that case one should really carry out the Box-Cox analysis for a random
effects model (details in VER Chapter 21). Here we confine ourselves to the
model on original scale.

MTB > GLM;
SUBC>   Response 'yield';
SUBC>   Nodefault;
SUBC>   Categorical 'repl' 'pestic' 'irrig' 'variety';
SUBC>   Terms repl pestic irrig variety repl*pestic repl*irrig pestic*irrig &
CONT>       pestic*variety irrig*variety repl*pestic*irrig pestic*irrig*variety;
SUBC>   Boxcox;
SUBC>   TMethod;
SUBC>   TAnova;
SUBC>   TSummary;
SUBC>   TCoefficients;
SUBC>   TEquation;
SUBC>   TFactor;
SUBC>   TDiagnostics 0;
SUBC>   Rtype 2;
SUBC>  GFOURPACK.
General Linear Model: yield versus repl, pestic, irrig, variety 

Box-Cox transformation
Rounded lambda               5
Estimated lambda             4.62399
95% CI for lambda            (2.03149, 7.27549)


As noted above, we continue our discussion of the results on original
scale. The variance components are of quite different magnitude, with the 
error variance being the smallest (representing only 1.515/12.52=12% of the
total variance (12.52) without including the repl variance). This means
that all the effects involving variety will be estimated much better
than the other effects. We postpone interpretation of the other variance
components because better estimates will be computed below. 

In the ANOVA table, we focus on the effects involving variety because
these are unaffected by the added model term. All interactions with
variety are clearly non-significant, but there is a strong main effect
of variety. The lsmeans show that variety 2 gives significantly higher
yields than variety 1. Standard errors for varieties are difficult to
compute, and it is preferable to use other software. We can, however,
quite easily compute the SE for the difference between the two varieties
because it is based on the MSE (the within-strip variance):
  SE(ymean_..1.-ymean_..2.) = sqrt(MSE*2/12) = sqrt(1.515/6) = 0.50.
The denominator (12) comes from each of the two means involving an
average of 12 observations.

As part of our model checking and also in order to obtain accurate
inference for the other terms in the model, we next analyze the strip
means. Note that this is an ordinary split-plot model.

MTB > Name c6 "ByVar1" c7 "ByVar2" c8 "ByVar3" c9 "Mean1"
MTB > Statistics 'yield';
SUBC>   By 'repl' 'pestic' 'irrig';
SUBC>   GValues 'ByVar1'-'ByVar3';
SUBC>   Mean 'Mean1'.
MTB > name ByVar1 'repl1'
MTB > name ByVar2 'pestic1'
MTB > name ByVar3 'irrig1'

MTB > GLM;
SUBC>   Response 'Mean1';
SUBC>   Nodefault;
SUBC>   Categorical 'repl1' 'pestic1' 'irrig1';
SUBC>   Random repl1;
SUBC>   Terms repl1 pestic1 irrig1 repl1*pestic1 pestic1*irrig1;
SUBC>   Means pestic1 irrig1;
SUBC>   TExpand;
SUBC>   TMethod;
SUBC>   TAnova;
SUBC>   TSummary;
SUBC>   TCoefficients;
SUBC>   TEquation;
SUBC>   TFactor;
SUBC>   TEMS;
SUBC>   TVariance;
SUBC>   TMeans;
SUBC>   TDiagnostics 0;
SUBC>   Rtype 2;
SUBC>  GFOURPACK.
General Linear Model: Mean1 versus repl1, pestic1, irrig1 

Factor Information
Factor   Type    Levels  Values
repl1    Random       2  1, 2
pestic1  Fixed        3  1, 2, 3
irrig1   Fixed        2  0, 1

Analysis of Variance
Source            DF   Seq SS  Contribution   Adj SS  Adj MS  F-Value  P-Value
  repl1            1   11.702         4.45%   11.702  11.702     0.63    0.511
  pestic1          2  128.138        48.74%  128.138  64.069     3.44    0.225
  irrig1           1   66.977        25.48%   66.977  66.977    20.12    0.021
  repl1*pestic1    2   37.201        14.15%   37.201  18.601     5.59    0.097
  pestic1*irrig1   2    8.874         3.38%    8.874   4.437     1.33    0.385
Error              3    9.986         3.80%    9.986   3.329
Total             11  262.877       100.00%

      S    R-sq  R-sq(adj)   PRESS  R-sq(pred)
1.82443  96.20%     86.07%  159.77      39.22%

Coefficients
Term              Coef  SE Coef       95% CI       T-Value  P-Value   VIF
Constant        57.496    0.527  (55.820, 59.172)   109.17    0.000
repl1
  1              0.988    0.527  (-0.689,  2.664)     1.87    0.157     *
pestic1
  1             -4.621    0.745  (-6.991, -2.250)    -6.20    0.008  1.33
  2              2.254    0.745  (-0.116,  4.625)     3.03    0.056  1.33
irrig1
  0              2.363    0.527  ( 0.686,  4.039)     4.49    0.021  1.00
repl1*pestic1
  1 1            2.362    0.745  (-0.008,  4.733)     3.17    0.050     *
  1 2           -1.863    0.745  (-4.233,  0.508)    -2.50    0.088     *
pestic1*irrig1
  1 0           -0.687    0.745  (-3.058,  1.683)    -0.92    0.424  1.33
  2 0            1.213    0.745  (-1.158,  3.583)     1.63    0.202  1.33

Regression Equation
Mean1 = 57.496 + 0.988 repl1_1 - 0.988 repl1_2 - 4.621 pestic1_1 + 2.254 pestic1_2 + 2.367 pestic1_3
        + 2.363 irrig1_0 - 2.363 irrig1_1 + 2.362 repl1*pestic1_1 1 - 1.863 repl1*pestic1_1 2
        - 0.500 repl1*pestic1_1 3 - 2.362 repl1*pestic1_2 1 + 1.863 repl1*pestic1_2 2
        + 0.500 repl1*pestic1_2 3 - 0.687 pestic1*irrig1_1 0 + 0.687 pestic1*irrig1_1 1
        + 1.213 pestic1*irrig1_2 0 - 1.213 pestic1*irrig1_2 1 - 0.525 pestic1*irrig1_3 0
        + 0.525 pestic1*irrig1_3 1
Equation treats random terms as though they are fixed.

Expected Mean Squares, using Adjusted SS
   Source          Expected Mean Square for Each Term
1  repl1           (6) + 2.0000 (4) + 6.0000 (1)
2  pestic1         (6) + 2.0000 (4) + Q[2, 5]
3  irrig1          (6) + Q[3, 5]
4  repl1*pestic1   (6) + 2.0000 (4)
5  pestic1*irrig1  (6) + Q[5]
6  Error           (6)

Means
Term     Fitted Mean
pestic1
  1          52.8750
  2          59.7500
  3          59.8625
irrig1
  0          59.8583
  1          55.1333

Variance Components, using Adjusted SS
Source          Variance  % of Total    StDev  % of Total
repl1          -1.14979*       0.00%  0.00000       0.00%
repl1*pestic1   7.63604       69.64%  2.76334      83.45%
Error           3.32854       30.36%  1.82443      55.10%
Total           10.9646               3.31128
* Value is negative, and is estimated by zero.
 
Residual Plots for Mean1 

Comments:
---------
We first comment on the residual plots. They look quite ok but also show
sign of strong correlations between residuals, not surprising with
DFE=3.

The variance components for fields is now correctly estimated at 7.636.
The error variance (3.329) can be used to recover the original variance
component for strips (because the formulae for variances of sums and 
means of variance shows it to be equal to sigma_f^2+(sigma^2)/2), but
we won't explore this technical detail. Realistically, if one wanted to
get the correct variance component estimates, one would run the analysis
in another analysis than Minitab. In any case, we see the field
variance component is clearly the largest, and therefore the inference 
for pesticides (the whole-plot factor) is quite weak.

The ANOVA tables gives the correct tests for all remaining terms. Among
these, only the main effect for irrigation is significant, and the
lsmeans show, somewhat surprisingly, the highest yields for the non-irrigated 
strips. It is not clear whether that was an expected result, but the 
difference between irrigated and non-irrigated strips is clearly visible 
in the raw data lising. As above, the standard errors for the irrigation 
means is difficult to compute (although here we can use the formula in 
the notes on linear mixed models), and we may prefer to use other 
statistical software to compute them. We can, however, as above compute 
the SE for the difference between the two irrigation levels because it 
is based on the MSE (the field variance) in the analysis for strip
means:
  SE(ymean_.1..-ymean_.0..) = sqrt(MSE*2/6) = sqrt(3.329/3) = 1.05.
The denominator (6) comes from each of the two means involving an
average of 6 observations (strips). It is seen that this SE is much
larger than the one compute for varieties above.

Finally, the question asks for the SE for comparison of two pesticides.
The notes on linear mixed models explain that all inference about the
whole-plot factor can be based on the whole-plot MS in the table.
Therefore,
  SE(ymean_1...-ymean_2...) = sqrt(MSrepl1*pestic1*2/4) = sqrt(18.601/2) = 3.05.
The denominator (4) comes from each of the two means involving an
average of 4 observations (strips). Again, this SE is even larger than
the two other SEs computed above. Alternatively, we could use the
analysis for the field means.

We should also check assumptions about the field random effects. This
would involve averaging the data to fields and running a 2-way ANOVA
corresponding to the block design for pesticide treatments applied in 2
blocks, and inspecting the residuals. The number of units is so small
that it's almost impossible to detect model deviations. Indeed, the
residual plots do not reveal any obvious problems.

In summary, the analysis demonstrated significant effects of varieties
and of the use of irrigation. The variability at the different levels
implied that effects involving variety and to a lesser degree irrigation
were estimated most precisely.