Solution file for additional exercise 10.2
------------------------------------------
(see additional exercise 10.1 for discussion of model, design and notation)

MTB > WOpen "H:\VHM\VHM802\Data_csv\hs10_2.csv";
SUBC>   FType;
SUBC>     CSV;
SUBC>   DecSep;
SUBC>     Period;
SUBC>   Field;
SUBC>     Comma;
SUBC>   TDelimiter;
SUBC>     DoubleQuote.
Retrieving worksheet from file: ‘H:\VHM\VHM802\Data_csv\hs10_2.csv’
Worksheet was saved on 03/03/2011

MTB > GLM;
SUBC>   Response 'moisture';
SUBC>   Nodefault;
SUBC>   Categorical 'lot';
SUBC>   Random lot;
SUBC>   Terms lot;
SUBC>   TExpand;
SUBC>   TMethod;
SUBC>   TAnova;
SUBC>   TSummary;
SUBC>   TCoefficients;
SUBC>   TEquation;
SUBC>   TFactor;
SUBC>   TEMS;
SUBC>   TVariance;
SUBC>   TDiagnostics 0;
SUBC>   Rtype 2;
SUBC>  GFOURPACK.
General Linear Model: moisture versus lot 

Method
Factor coding  (-1, 0, +1)

Factor Information
Factor  Type    Levels  Values
lot     Random       3  1, 2, 3

Analysis of Variance
Source  DF   Seq SS  Contribution   Adj SS   Adj MS  F-Value  P-Value
  lot    2  12.8958        98.41%  12.8958  6.44792    92.80    0.002
Error    3   0.2084         1.59%   0.2084  0.06948
Total    5  13.1043       100.00%

Model Summary
       S    R-sq  R-sq(adj)   PRESS  R-sq(pred)
0.263597  98.41%     97.35%  0.8338      93.64%


Coefficients
Term        Coef  SE Coef       95% CI       T-Value  P-Value
Constant  36.898    0.108  (36.556, 37.241)   342.88    0.000
lot
  1        2.042    0.152  ( 1.557,  2.526)    13.42    0.001
  2       -1.333    0.152  (-1.818, -0.849)    -8.76    0.003

Regression Equation
moisture = 36.898 + 2.042 lot_1 - 1.333 lot_2 - 0.708 lot_3
Equation treats random terms as though they are fixed.

Expected Mean Squares, using Adjusted SS
           Expected Mean Square
   Source  for Each Term
1  lot     (2) + 2.0000 (1)
2  Error   (2)
...

Variance Components, using Adjusted SS
Source   Variance  % of Total    StDev  % of Total
lot       3.18922      97.87%  1.78584      98.93%
Error   0.0694833       2.13%  0.26360      14.60%
Total      3.2587              1.80519
 
Residual Plots for moisture 

MTB > Name c3 "ByVar1" c4 "Mean1"
MTB > Statistics 'moisture';
SUBC>   By 'lot';
SUBC>   GValues 'ByVar1';
SUBC>   Mean 'Mean1'.
MTB > PPlot 'Mean1';
SUBC>   Normal;
SUBC>   Symbol;
SUBC>   FitD;
SUBC>   Grid 2;
SUBC>   Grid 1;
SUBC>   MGrid 1.
Probability Plot of Mean1 
The P-value of the Anderson-Darling test of normality is 0.227.

MTB > Describe 'moisture';
SUBC>   By 'lot';
SUBC>   Mean;
SUBC>   StDeviation;
SUBC>   Minimum;
SUBC>   Maximum;
SUBC>   N.
Descriptive Statistics: moisture 

Variable  lot  N    Mean   StDev  Minimum  Maximum
moisture  1    2  38.940  0.0566   38.900   38.980
          2    2  35.565  0.0212   35.550   35.580
          3    2  36.190   0.453   35.870   36.510


Comments and answers to questions:
----------------------------------
The model assumes the same within-lot variability for all lots, but two
of the lots show very small variation between the two replicates whereas
the third shows much larger variation. With only two observations per
lot, the assumption cannot realistically be checked, so it is
essentially the analyst's responsibility to judge whether it is
reasonable. Likewise, the normality assumption for the random effects
can hardly be checked with only three lots (even though the procedure
was included above).
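
The unequal spread can be seen directly from the data. With N = 2 per
lot, the Minimum and Maximum in the Describe output are the two
replicates themselves, so the observations can be read off the listing.
The following is a minimal Python sketch (not part of the Minitab
session; the variable names are illustrative) that recomputes the
within-lot variances and the pooled value:

```python
from statistics import variance

# With N = 2 per lot, Minimum and Maximum from the Describe output
# are the two replicate observations.
moisture = {
    1: [38.90, 38.98],
    2: [35.55, 35.58],
    3: [35.87, 36.51],
}

# Sample variance within each lot (1 df per lot).
within_var = {lot: variance(y) for lot, y in moisture.items()}

# With equal replication, the pooled within-lot variance (the MSE of
# the ANOVA table) is the plain average of the three variances.
mse = sum(within_var.values()) / len(within_var)

for lot, v in within_var.items():
    share = v / sum(within_var.values())
    print(f"lot {lot}: variance = {v:.5f}, share of SS(Error) = {share:.1%}")
print(f"MSE = {mse:.5f}")
```

Lot 3 accounts for about 98% of the error sum of squares, which is the
imbalance described above.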

Variance component parameter estimates are given above:
  sigma^2  : 0.069
  sigma_A^2: 3.19
Obviously, there is much larger variation between lots than within lots
(and accordingly the F-statistic to test no differences between lots is
clearly significant, despite the small dataset).
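
These estimates follow from the Expected Mean Squares table by equating
observed and expected mean squares. As a hedged illustration (a Python
sketch outside the Minitab session, with the mean squares copied from
the ANOVA table above and illustrative variable names):

```python
# Mean squares copied from the Minitab ANOVA table.
ms_lot = 6.44792
ms_error = 0.06948
n_per_lot = 2  # replicates per lot; the coefficient of (1) in the EMS table

# Method-of-moments (ANOVA) estimates:
#   E[MS(Error)] = sigma^2
#   E[MS(Lot)]   = sigma^2 + n_per_lot * sigma_A^2
sigma2 = ms_error
sigma2_A = (ms_lot - ms_error) / n_per_lot

f_value = ms_lot / ms_error                  # F-statistic for lot
lot_share = sigma2_A / (sigma2_A + sigma2)   # fraction of total variance

print(f"sigma^2   = {sigma2:.3f}")
print(f"sigma_A^2 = {sigma2_A:.2f}")
print(f"F         = {f_value:.2f}")
print(f"lot share = {lot_share:.2%}")
```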

The remaining parameter of the model is the overall mean:
  mu: 36.90
The standard error must be computed using the MS(Lot) instead of MSE,
that is,
  SE(mean) = sqrt(MS(Lot)/6) = 1.04,
and the corresponding 95% confidence interval is
  mu: 36.90 +- t(2,.975)*1.04 = 36.90 +- 4.3027*1.04 = 36.90 +- 4.46.
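
The reason for using MS(Lot) is that, with a = 3 lots and n = 2
replicates per lot, the grand mean has variance
  Var(ybar) = sigma_A^2/a + sigma^2/(a*n) = (n*sigma_A^2 + sigma^2)/(a*n),
and the numerator is exactly the expected value of MS(Lot). The
arithmetic can be sketched in Python as follows (an illustration only;
the t quantile is the value quoted above rather than computed):

```python
from math import sqrt

mu_hat = 36.898   # estimate of mu (Constant in the Minitab output)
ms_lot = 6.44792  # MS(Lot) from the ANOVA table
n_total = 6       # a * n = 3 lots x 2 replicates
t_crit = 4.3027   # t(2, .975), as quoted in the text

# Standard error of the grand mean in the random effects model.
se_mu = sqrt(ms_lot / n_total)

# 95% confidence interval based on the 2 df of MS(Lot).
half_width = t_crit * se_mu
ci = (mu_hat - half_width, mu_hat + half_width)

print(f"SE(mean)   = {se_mu:.2f}")
print(f"half-width = {half_width:.2f}")
print(f"95% CI     = ({ci[0]:.2f}, {ci[1]:.2f})")
```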

A 1-way ANOVA with fixed effects would give the SAME ANOVA table, but
different mean parameters to estimate: one mean for each lot.
Parametrizing the model as: y_ij = mu + alpha_i + eps_ij, with the
(Minitab) restriction that alpha_1+alpha_2+alpha_3=0, gives the
following estimate and standard error for mu (in the Minitab listing
above):
  mu: 36.90, SE(mean) = sqrt(MSE/6) = 0.108.
The standard error is much smaller, because it takes into account only
the variation within lots, not the variation between lots. The parameter
mu has an interpretation as the overall mean FOR THESE 3 LOTS, whereas
in the random effects model it is the overall mean IN THE POPULATION OF
LOTS, from which the 3 lots are sampled. Due to the large variation
between lots, the latter is determined with much larger error than the
former.
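
To make the size of the difference concrete, the two standard errors can
be put side by side. This is a small Python sketch under the same
assumptions as the formulas above (mean squares copied from the ANOVA
table, illustrative variable names):

```python
from math import sqrt

ms_lot = 6.44792    # MS(Lot) from the ANOVA table
ms_error = 0.06948  # MSE from the ANOVA table
n_total = 6         # 3 lots x 2 replicates

se_fixed = sqrt(ms_error / n_total)  # mean of THESE 3 lots
se_random = sqrt(ms_lot / n_total)   # mean in the POPULATION of lots

# The ratio of the two standard errors is the square root of the
# F-statistic, since F = MS(Lot) / MSE.
ratio = se_random / se_fixed

print(f"SE fixed effects  = {se_fixed:.3f}")
print(f"SE random effects = {se_random:.3f}")
print(f"ratio             = {ratio:.2f}")
```

The random effects standard error is larger by a factor of sqrt(F), so
the more significant the lot effect, the bigger the gap between the two
interpretations of mu.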

Note that the Minitab output gives the wrong standard error and
confidence interval for mu (they correspond to the fixed effects model);
there is no way to get the correct values in the software.

Analyses by the mixed command in Stata, proc mixed in SAS, and the nlme
and lme4 packages in R give the correct estimate and SE for mu in the
random effects model. The Stata analysis uses a standard normal
distribution as the reference distribution for inference about mu,
whereas all the other methods use a more exact t-distribution with 2 df
or 3 df.