Solution file for additional exercise 10.2
------------------------------------------
(see additional exercise 10.1 for discussion of model, design and notation)

MTB > WOpen "h:\vhm\vhm802\data_csv\hs10_2.csv";
SUBC>   FType;
SUBC>     CSV;
SUBC>   DecSep;
SUBC>     Period;
SUBC>   Field;
SUBC>     Comma;
SUBC>   TDelimiter;
SUBC>     DoubleQuote.
Retrieving worksheet from file: 'h:\vhm\vhm802\data_csv\hs10_2.csv'
Worksheet was saved on 03/03/2011

MTB > GLM 'moisture' = lot;
SUBC>   Random 'lot';
SUBC>   Brief 3 ;
SUBC>   EMS;
SUBC>   GFourpack;
SUBC>   RType 2 .
General Linear Model: moisture versus lot 

Factor  Type    Levels  Values
lot     random       3  1, 2, 3

Analysis of Variance for moisture, using Adjusted SS for Tests

Source  DF   Seq SS   Adj SS  Adj MS      F      P
lot      2  12.8958  12.8958  6.4479  92.80  0.002
Error    3   0.2084   0.2084  0.0695
Total    5  13.1043

S = 0.263597   R-Sq = 98.41%   R-Sq(adj) = 97.35%

Term         Coef  SE Coef       T      P
Constant  36.8983   0.1076  342.88  0.000
lot
1          2.0417   0.1522   13.42  0.001
2         -1.3333   0.1522   -8.76  0.003

Expected Mean Squares, using Adjusted SS

           Expected Mean
           Square for Each
   Source  Term
1  lot     (2) + 2.0000 (1)
2  Error   (2)

...

Variance Components, using Adjusted SS

        Estimated
Source      Value
lot       3.18922
Error     0.06948
 
Residual Plots for moisture 

MTB > Name c3 "ByVar1" c4 "Mean1"
MTB > Statistics 'moisture';
SUBC>   By 'lot';
SUBC>   GValues 'ByVar1';
SUBC>   Mean 'Mean1'.
MTB > NormTest 'Mean1'.
Probability Plot of Mean1
The P-value of the Anderson-Darling test for normality is 0.227

MTB > Oneway 'moisture' 'lot'.
One-way ANOVA: moisture versus lot 

Source  DF       SS      MS      F      P
lot      2  12.8958  6.4479  92.80  0.002
Error    3   0.2085  0.0695
Total    5  13.1043

S = 0.2636   R-Sq = 98.41%   R-Sq(adj) = 97.35%

                         Individual 95% CIs For Mean Based on
                         Pooled StDev
Level  N    Mean  StDev  ---------+---------+---------+---------+
1      2  38.940  0.057                               (---*----)
2      2  35.565  0.021  (----*----)
3      2  36.190  0.453        (----*----)
                         ---------+---------+---------+---------+
                               36.0      37.2      38.4      39.6
Pooled StDev = 0.264


Comments and answers to questions:
----------------------------------
The model assumes the same within-lot variability for all lots, but two
of the lots show very little variation between their two replicates
(StDev 0.057 and 0.021), whereas the third shows much more (StDev 0.453).
With only two observations per lot, the assumption cannot be checked in
any meaningful way, so it is essentially the analyst's responsibility to
judge whether it is reasonable. Likewise, the normality assumption for
the random effects can hardly be checked realistically with only three
lot means (even though the procedure was included above).
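
As a rough illustration of how unevenly the lots contribute to the error
estimate, the sketch below pools the three per-lot standard deviations
printed in the Oneway output. It assumes only those rounded values (the
raw data are not reproduced here), so the result agrees with the MSE only
up to rounding.

```python
import math

# Per-lot standard deviations copied from the Oneway output (n = 2 per lot).
lot_sd = {1: 0.057, 2: 0.021, 3: 0.453}

# With equal group sizes, the pooled variance is the plain average of the
# per-lot variances; it estimates sigma^2 (the MSE of the ANOVA table).
lot_var = {lot: sd ** 2 for lot, sd in lot_sd.items()}
pooled_var = sum(lot_var.values()) / len(lot_var)
pooled_sd = math.sqrt(pooled_var)

# Share of the within-lot sum of squares contributed by each lot
# (with n = 2, each lot's sum of squares equals its variance).
share = {lot: v / sum(lot_var.values()) for lot, v in lot_var.items()}

print(f"pooled variance = {pooled_var:.4f}, pooled SD = {pooled_sd:.3f}")
for lot, s in share.items():
    print(f"lot {lot}: {100 * s:.1f}% of the within-lot sum of squares")
```

Nearly all of the error sum of squares comes from lot 3, which is why the
equal-variance assumption rests on the analyst's judgement here.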

Variance component parameter estimates are given above:
  sigma^2  : 0.069
  sigma_A^2: 3.19
The variation between lots is much larger than the variation within lots
(3.19 versus 0.069); accordingly, the F-test of no differences between
lots is clearly significant (F = 92.80, P = 0.002), despite the small
dataset.
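
The variance component estimates follow from the Expected Mean Squares
table by equating observed and expected mean squares. A minimal sketch,
using the mean squares copied from the ANOVA table (variable names are
illustrative only):

```python
# Mean squares copied from the GLM ANOVA table.
ms_lot = 6.4479
ms_error = 0.0695
n_per_lot = 2  # coefficient of (1) in the Expected Mean Squares table

# EMS: E[MS(lot)] = sigma^2 + 2*sigma_A^2 and E[MSE] = sigma^2,
# so the method-of-moments estimates are:
sigma2 = ms_error
sigma2_A = (ms_lot - ms_error) / n_per_lot

# Fraction of the total variance that lies between lots.
between_share = sigma2_A / (sigma2_A + sigma2)

print(f"sigma^2   = {sigma2:.4f}")
print(f"sigma_A^2 = {sigma2_A:.4f}")
print(f"between-lot share of total variance = {between_share:.3f}")
```

The small difference from the printed 3.18922 comes from using the
rounded mean squares of the table.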

The remaining parameter of the model is the overall mean:
  mu: 36.90
The standard error must be computed from MS(Lot) instead of MSE,
that is,
  SE(mean) = sqrt(MS(Lot)/6) = sqrt(6.4479/6) = 1.037,
and the corresponding 95% confidence interval is
  mu: 36.90 +- t(2,.975)*1.037 = 36.90 +- 4.3027*1.037 = 36.90 +- 4.46,
that is, (32.44, 41.36).
Note that the Minitab output gives the wrong standard error (0.1076,
corresponding to a fixed effects model).
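
The reason MS(Lot) appears is that, in the balanced random effects model
with 3 lots and 2 replicates, Var(overall mean) = sigma_A^2/3 + sigma^2/6
= (2*sigma_A^2 + sigma^2)/6, and the numerator is exactly the expected
value of MS(Lot). The sketch below checks this numerically and reproduces
the interval; all inputs are copied from the output above, and the
t-quantile is the value quoted in the text.

```python
import math

# Values copied from the Minitab output and the text.
mu_hat = 36.8983      # Constant in the coefficient table
ms_lot = 6.4479       # MS for lot
sigma2_A = 3.18922    # variance component for lot
sigma2 = 0.06948      # variance component for error
n_lots, n_rep = 3, 2
t_crit = 4.3027       # t(2, .975) as quoted in the text

# Variance of the overall mean, written two equivalent ways.
var_from_components = sigma2_A / n_lots + sigma2 / (n_lots * n_rep)
var_from_ms = ms_lot / (n_lots * n_rep)

se_mu = math.sqrt(var_from_ms)
half_width = t_crit * se_mu
ci = (mu_hat - half_width, mu_hat + half_width)

print(f"Var(mean): {var_from_components:.4f} (components), {var_from_ms:.4f} (MS(Lot)/6)")
print(f"SE(mean) = {se_mu:.3f}")
print(f"95% CI for mu: ({ci[0]:.2f}, {ci[1]:.2f})")
```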

A 1-way ANOVA with fixed effects would give the SAME ANOVA table, but
different mean parameters to estimate: one mean for each lot.
Parametrizing the model as y_ij = mu + alpha_i + eps_ij, with the
(Minitab) restriction alpha_1 + alpha_2 + alpha_3 = 0, gives the
following estimate and standard error for mu:
  mu: 36.90, SE(mean) = sqrt(MSE/6) = sqrt(0.0695/6) = 0.108.
This standard error is much smaller because it takes into account only
the variation within lots, not the variation between lots. The parameter
mu has an interpretation as the overall mean FOR THESE 3 LOTS, whereas
in the random effects model it is the overall mean IN THE POPULATION OF
LOTS, from which the 3 lots are sampled. Due to the large variation
between lots, the latter is determined with much larger error than the
former.
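
To put a number on "much larger error", the two standard errors can be
compared directly. In this balanced design their ratio is
sqrt(MS(Lot)/MSE), i.e. the square root of the F-statistic. A small
sketch using the rounded mean squares from the ANOVA table:

```python
import math

# Mean squares and F-statistic copied from the ANOVA table.
ms_lot = 6.4479
ms_error = 0.0695
f_stat = 92.80
n_obs = 6

se_fixed = math.sqrt(ms_error / n_obs)   # mean of THESE 3 lots
se_random = math.sqrt(ms_lot / n_obs)    # mean of the POPULATION of lots
ratio = se_random / se_fixed

print(f"SE (fixed effects)  = {se_fixed:.3f}")
print(f"SE (random effects) = {se_random:.3f}")
print(f"ratio = {ratio:.2f}, sqrt(F) = {math.sqrt(f_stat):.2f}")
```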

Note that analyses by the xtmixed command in Stata and proc mixed in SAS give
the correct estimate and SE for mu in the random effects model. The
Stata analysis uses a standard normal distribution as the reference
distribution for inference about mu, whereas SAS uses a t-distribution
with 2 df; the latter is preferable.
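
The choice of reference distribution matters a great deal with only 2 df.
The sketch below compares the two critical values and the resulting
interval half-widths. For illustration it assumes the same standard error
sqrt(MS(Lot)/6) in both cases; the standard error actually reported by
each package may differ depending on the estimation method it uses. The
helper t2_quantile is a hypothetical name; it uses the closed-form
quantile of the t-distribution with 2 df.

```python
import math
from statistics import NormalDist


def t2_quantile(p):
    """Quantile of Student's t with 2 df (closed form, valid for 0 < p < 1)."""
    return (2 * p - 1) / math.sqrt(2 * p * (1 - p))


se_mu = math.sqrt(6.4479 / 6)          # assumed common SE, from MS(Lot)

z_crit = NormalDist().inv_cdf(0.975)   # standard normal reference
t_crit = t2_quantile(0.975)            # t reference with 2 df

z_half = z_crit * se_mu
t_half = t_crit * se_mu

print(f"normal: critical value {z_crit:.4f}, half-width {z_half:.2f}")
print(f"t(2)  : critical value {t_crit:.4f}, half-width {t_half:.2f}")
```

Under this assumption the normal-based interval is less than half as wide
as the t-based one, and so understates the uncertainty about mu.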