Solution file for additional exercise 10.4
------------------------------------------
(see Exercise 10.1 for discussion of model, design and notation)

MTB > WOpen "H:\VHM\VHM802\Data_csv\hs10_4.csv";
SUBC>   FType;
SUBC>     CSV;
SUBC>   DecSep;
SUBC>     Period;
SUBC>   Field;
SUBC>     Comma;
SUBC>   TDelimiter;
SUBC>     DoubleQuote.
Retrieving worksheet from file: ‘H:\VHM\VHM802\Data_csv\hs10_4.csv’
Worksheet was saved on 27/02/2012

MTB > GLM;
SUBC>   Response 'pH';
SUBC>   Nodefault;
SUBC>   Categorical 'strain' 'litter';
SUBC>   Nest litter(strain);
SUBC>   Random litter;
SUBC>   Terms strain litter;
SUBC>   Means strain;
SUBC>   TExpand;
SUBC>   TMethod;
SUBC>   TAnova;
SUBC>   TSummary;
SUBC>   TCoefficients;
SUBC>   TEquation;
SUBC>   TFactor;
SUBC>   TEMS;
SUBC>   TVariance;
SUBC>   TMeans;
SUBC>   TDiagnostics 0;
SUBC>   Rtype 2;
SUBC>  GFOURPACK.
General Linear Model: pH versus strain, litter 

Method
Factor coding  (-1, 0, +1)

Factor Information
Factor          Type    Levels  Values
strain          Fixed        2  pHH, pHL
litter(strain)  Random      14  1(pHH), 2(pHH), 3(pHH), 4(pHH), 5(pHH), 6(pHH), 7(pHH), 1(pHL),
                                2(pHL), 3(pHL), 4(pHL), 5(pHL), 6(pHL), 7(pHL)

Analysis of Variance
Source            DF    Seq SS  Contribution    Adj SS    Adj MS  F-Value  P-Value
  strain           1  0.006645         3.93%  0.006645  0.006645     1.26    0.283
  litter(strain)  12  0.063093        37.31%  0.063093  0.005258     2.22    0.028
Error             42  0.099375        58.76%  0.099375  0.002366
Total             55  0.169113       100.00%

Model Summary
        S    R-sq  R-sq(adj)     PRESS  R-sq(pred)
0.0486423  41.24%     23.05%  0.176667       0.00%

Coefficients
Term               Coef  SE Coef         95% CI        T-Value  P-Value   VIF
Constant        7.46625  0.00650  ( 7.45313, 7.47937)  1148.64    0.000
strain
  pHH           0.01089  0.00650  (-0.00222, 0.02401)     1.68    0.101  1.00
litter(strain)
  1(pHH)        -0.0296   0.0225  ( -0.0751,  0.0158)    -1.32    0.195     *
  2(pHH)        -0.0021   0.0225  ( -0.0476,  0.0433)    -0.10    0.925     *
  3(pHH)         0.0554   0.0225  (  0.0099,  0.1008)     2.46    0.018     *
  4(pHH)        -0.0346   0.0225  ( -0.0801,  0.0108)    -1.54    0.131     *
  5(pHH)        -0.0221   0.0225  ( -0.0676,  0.0233)    -0.98    0.331     *
  6(pHH)         0.0029   0.0225  ( -0.0426,  0.0483)     0.13    0.900     *
  1(pHL)        -0.0279   0.0225  ( -0.0733,  0.0176)    -1.24    0.223     *
  2(pHL)        -0.0579   0.0225  ( -0.1033, -0.0124)    -2.57    0.014     *
  3(pHL)        -0.0179   0.0225  ( -0.0633,  0.0276)    -0.79    0.432     *
  4(pHL)         0.0271   0.0225  ( -0.0183,  0.0726)     1.21    0.235     *
  5(pHL)         0.0346   0.0225  ( -0.0108,  0.0801)     1.54    0.131     *
  6(pHL)        -0.0104   0.0225  ( -0.0558,  0.0351)    -0.46    0.648     *

Regression Equation
pH = 7.46625 + 0.01089 strain_pHH - 0.01089 strain_pHL - 0.0296 litter(strain)_1(pHH)
     - 0.0021 litter(strain)_2(pHH) + 0.0554 litter(strain)_3(pHH) - 0.0346 litter(strain)_4(pHH)
     - 0.0221 litter(strain)_5(pHH) + 0.0029 litter(strain)_6(pHH) + 0.0304 litter(strain)_7(pHH)
     - 0.0279 litter(strain)_1(pHL) - 0.0579 litter(strain)_2(pHL) - 0.0179 litter(strain)_3(pHL)
     + 0.0271 litter(strain)_4(pHL) + 0.0346 litter(strain)_5(pHL) - 0.0104 litter(strain)_6(pHL)
     + 0.0521 litter(strain)_7(pHL)
Equation treats random terms as though they are fixed.

Fits and Diagnostics for Unusual Observations
Obs      pH     Fit  SE Fit       95% CI         Resid  Std Resid  Del Resid    HI  Cook’s D
  2  7.3900  7.4750  0.0243  (7.4259, 7.5241)  -0.0850      -2.02      -2.10  0.25      0.10
 31  7.6300  7.5325  0.0243  (7.4834, 7.5816)   0.0975       2.31       2.45  0.25      0.13
 46  7.5500  7.4425  0.0243  (7.3934, 7.4916)   0.1075       2.55       2.74  0.25      0.16
Obs     DFITS
  2  -1.21122  R
 31   1.41350  R
 46   1.58364  R
R  Large residual

Expected Mean Squares, using Adjusted SS
                   Expected Mean Square
   Source          for Each Term
1  strain          (3) + 4.0000 (2) + Q[1]
2  litter(strain)  (3) + 4.0000 (2)
3  Error           (3)

Means
Term    Fitted Mean
strain
  pHH       7.47714
  pHL       7.45536

Variance Components, using Adjusted SS
Source           Variance  % of Total      StDev  % of Total
litter(strain)  0.0007229      23.40%  0.0268871      48.38%
Error           0.0023661      76.60%  0.0486423      87.52%
Total           0.0030890              0.0555787
 
Residual Plots for pH 

Comments and answers to questions:
----------------------------------
The residuals of the error terms look fine, and the most extreme
residual has a deletion residual of 2.74, which is no cause for concern
in a dataset of this size (56 observations). Residuals for the random
litter effects are discussed below.
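As a check on the Unusual Observations table, the standardized and deletion residuals can be recomputed from the raw residual, S and the leverage HI. This is a minimal Python sketch that assumes the standard identity t = r * sqrt((df_E - 1)/(df_E - r^2)) with df_E = 42 error degrees of freedom. The leverage of 0.25 reflects that each fitted value is the mean of the 4 mice in a litter. The function name is only illustrative.

```python
import math


def standardized_and_deleted(resid, s, leverage, df_error):
    """Return (standardized, deleted) residuals for one observation."""
    r = resid / (s * math.sqrt(1 - leverage))                # standardized residual
    t = r * math.sqrt((df_error - 1) / (df_error - r ** 2))  # deleted residual
    return r, t


S = 0.0486423    # S from the Model Summary
DF_ERROR = 42    # error degrees of freedom in the ANOVA table
LEVERAGE = 0.25  # HI: each fit is a mean of 4 mice, so h = 1/4

for obs, resid in [(2, -0.0850), (31, 0.0975), (46, 0.1075)]:
    r, t = standardized_and_deleted(resid, S, LEVERAGE, DF_ERROR)
    print(f"obs {obs}: std resid {r:.2f}, deleted resid {t:.2f}")
```

Run as is, this reproduces the Std Resid and Del Resid columns to two decimals.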

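The strain means are given above as least squares means. Their standard
errors can be computed by the usual formula, except that MS(Litt), the
mean square for litter(strain), replaces MSE because litters are the
experimental units for strain. With 28 mice (7 litters x 4) per strain:
SE = sqrt(MS(Litt)/28) = sqrt(0.005258/28) = 0.014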
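The same arithmetic in Python, as a small sketch with MS(Litt) copied from the ANOVA table and 28 = 7 litters x 4 mice per strain:

```python
import math

ms_litter = 0.005258   # Adj MS for litter(strain) in the ANOVA table
n_per_strain = 7 * 4   # 7 litters per strain x 4 mice per litter

se_mean = math.sqrt(ms_litter / n_per_strain)
print(f"SE of a strain mean: {se_mean:.4f}")
```

To four decimals this gives 0.0137, the same as the SE Mean reported in the litter-means analysis further down.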

The estimated variance components are listed above as well:
  sigma^2 (error):	0.00237
  sigma^2_B (litters):	0.00072
There is much more variation among mice within litters (about three
times as much) than between litters.
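The variance components follow from the Expected Mean Squares table: E[MS(Litt)] = sigma^2 + 4 sigma^2_B and E[MSE] = sigma^2, so sigma^2_B = (MS(Litt) - MSE)/4. Here is a small Python sketch of this method-of-moments calculation. It uses the adjusted sums of squares and degrees of freedom from the ANOVA table, and the variable names are only illustrative.

```python
# Mean squares as Adj SS / DF from the ANOVA table above.
ms_litter = 0.063093 / 12
ms_error = 0.099375 / 42
N_PER_LITTER = 4  # coefficient of (2) in the Expected Mean Squares table

sigma2 = ms_error                                 # within-litter (mouse) variance
sigma2_b = (ms_litter - ms_error) / N_PER_LITTER  # between-litter variance
# Note: this moment estimate can come out negative when MS(litter) < MSE.
total = sigma2 + sigma2_b

print(f"sigma^2   = {sigma2:.7f}")
print(f"sigma^2_B = {sigma2_b:.7f} ({100 * sigma2_b / total:.2f}% of total)")
print(f"sigma^2 / sigma^2_B = {sigma2 / sigma2_b:.2f}")
```

This reproduces 0.0023661, 0.0007229 and 23.40% from the Variance Components table, and the ratio of about 3.3 behind "about three times as much".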

The ANOVA table shows only a weak strain effect. The F-statistic for
testing no strain effect, which uses MS(Litt) rather than MSE as its
denominator, is only 1.26, with a P-value of 0.28. Therefore, the
breeding does not really seem to have been successful: the pHH strain
does have a higher mean blood pH than the pHL strain, but this difference
could very well be caused by random fluctuations. The litter effect is
somewhat stronger (F = 2.22, P = 0.028). In any case we have no interest
in dropping the litter effects from the model, whatever their
significance, because the litters are a substantial part of the study
design and data structure.
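From the Expected Mean Squares table, strain is tested against litter(strain), and litter(strain) is tested against Error. The Python sketch below recomputes both F-ratios from the adjusted sums of squares. It also shows that the strain F equals the square of a t statistic for the difference between the two strain means, with the SE of that difference based on MS(Litt). The strain means are copied from the Means table. With SciPy installed, `scipy.stats.f.sf(f_strain, 1, 12)` would give the upper-tail P-value. That call is left out here so the sketch needs no external packages.

```python
import math

# Sums of squares and degrees of freedom copied from the ANOVA table above.
ss_strain, df_strain = 0.006645, 1
ss_litter, df_litter = 0.063093, 12
ss_error, df_error = 0.099375, 42

ms_strain = ss_strain / df_strain
ms_litter = ss_litter / df_litter
ms_error = ss_error / df_error

# Denominators chosen according to the Expected Mean Squares table.
f_strain = ms_strain / ms_litter
f_litter = ms_litter / ms_error
print(f"F(strain) = {f_strain:.2f} on ({df_strain}, {df_litter}) df")
print(f"F(litter) = {f_litter:.2f} on ({df_litter}, {df_error}) df")

# The same strain test as a t statistic on the difference in strain means.
diff = 7.47714 - 7.45536
se_diff = math.sqrt(ms_litter * (1 / 28 + 1 / 28))
t_stat = diff / se_diff
print(f"t = {t_stat:.3f}, t^2 = {t_stat ** 2:.3f}")
```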

Next we give commands to compute residuals at the litter level, that is,
estimates of the random litter effects. Essentially, we aggregate the
data within litters and analyze the litter means.

MTB > Name c5 "ByVar1" c6 "ByVar2" c7 "Mean1"
MTB > Statistics 'pH';
SUBC>   By 'strain' 'litter';
SUBC>   GValues 'ByVar1'-'ByVar2';
SUBC>   Mean 'Mean1'.
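For readers working outside Minitab, the Statistics command above amounts to grouping by (strain, litter) and averaging. Below is a minimal Python sketch that uses only the standard library. The rows are hypothetical and are not the hs10_4 data, and the function name is illustrative. Grouping on the pair matters because litter numbers 1 to 7 are reused within each strain, which is exactly what the nesting litter(strain) expresses.

```python
from collections import defaultdict
from statistics import mean


def litter_means(rows):
    """Return {(strain, litter): mean pH} from (strain, litter, pH) rows."""
    groups = defaultdict(list)
    for strain, litter, ph in rows:
        groups[(strain, litter)].append(ph)  # litter labels only unique within strain
    return {key: mean(values) for key, values in sorted(groups.items())}


# Hypothetical rows for illustration only (not the hs10_4 data).
rows = [("pHH", 1, 7.40), ("pHH", 1, 7.44), ("pHH", 2, 7.50), ("pHH", 2, 7.46),
        ("pHL", 1, 7.42), ("pHL", 1, 7.48)]
for (strain, litter), m in litter_means(rows).items():
    print(strain, litter, round(m, 4))
```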

MTB > GLM;
SUBC>   Response 'Mean1';
SUBC>   Nodefault;
SUBC>   Categorical 'ByVar1';
SUBC>   Terms ByVar1;
SUBC>   Means ByVar1;
SUBC>   TExpand;
SUBC>   TMethod;
SUBC>   TAnova;
SUBC>   TSummary;
SUBC>   TCoefficients;
SUBC>   TEquation;
SUBC>   TFactor;
SUBC>   TMeans;
SUBC>   TDiagnostics 0;
SUBC>   Rtype 2;
SUBC>  GFOURPACK.
General Linear Model: Mean1 versus ByVar1 

Factor Information
Factor  Type   Levels  Values
ByVar1  Fixed       2  pHH, pHL

Analysis of Variance
Source    DF    Seq SS  Contribution    Adj SS    Adj MS  F-Value  P-Value
  ByVar1   1  0.001661         9.53%  0.001661  0.001661     1.26    0.283
Error     12  0.015773        90.47%  0.015773  0.001314
Total     13  0.017434       100.00%

Model Summary
        S   R-sq  R-sq(adj)      PRESS  R-sq(pred)
0.0362551  9.53%      1.99%  0.0214691       0.00%

Coefficients
Term         Coef  SE Coef         95% CI        T-Value  P-Value   VIF
Constant  7.46625  0.00969  ( 7.44514, 7.48736)   770.54    0.000
ByVar1
  pHH     0.01089  0.00969  (-0.01022, 0.03200)     1.12    0.283  1.00

Regression Equation
Mean1 = 7.46625 + 0.01089 ByVar1_pHH - 0.01089 ByVar1_pHL

Means
        Fitted
Term      Mean  SE Mean
ByVar1
  pHH   7.4771   0.0137
  pHL   7.4554   0.0137
 
Residual Plots for Mean1 

Comments:
---------
The ANOVA table gives the same F-test for strains as the full analysis
above, as it should: the design is balanced (7 litters per strain, 4 mice
per litter), so each sum of squares here equals 1/4 of the corresponding
one in the nested analysis. Also, the standard errors of the strain
least squares means (0.0137) are exactly the ones calculated above. The
residuals in this analysis show no particular problems: neither different
variation for the two strains nor departures from normality.
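The equality of the two strain F-tests can be checked numerically. The Python sketch below simulates hypothetical balanced data with 7 litters of 4 mice per strain. The standard deviations are loosely based on the estimates above, and the data are not the hs10_4 measurements. It then computes the strain F twice: once from the nested ANOVA, as MS(strain)/MS(litter(strain)), and once from a one-way ANOVA on the litter means. The function names are only illustrative. In a balanced design both sums of squares shrink by the same factor (the number of mice per litter), so the ratio is unchanged.

```python
import math
import random
from statistics import mean


def simulate(seed, n_litters=7, n_mice=4, sd_litter=0.027, sd_mouse=0.049):
    """Hypothetical balanced data: {strain: [[pH of each mouse] per litter]}."""
    rng = random.Random(seed)
    data = {}
    for strain in ("pHH", "pHL"):
        litters = []
        for _ in range(n_litters):
            u = rng.gauss(0, sd_litter)  # random litter effect shared by its mice
            litters.append([7.466 + u + rng.gauss(0, sd_mouse) for _ in range(n_mice)])
        data[strain] = litters
    return data


def nested_strain_f(data):
    """F for strain from the nested ANOVA: MS(strain) / MS(litter(strain))."""
    a = len(data)
    b = len(next(iter(data.values())))     # litters per strain
    n = len(next(iter(data.values()))[0])  # mice per litter
    strain_mean = {s: mean(y for lit in lits for y in lit) for s, lits in data.items()}
    grand = mean(strain_mean.values())     # valid because the design is balanced
    ss_strain = b * n * sum((m - grand) ** 2 for m in strain_mean.values())
    ss_litter = n * sum((mean(lit) - strain_mean[s]) ** 2
                        for s, lits in data.items() for lit in lits)
    return (ss_strain / (a - 1)) / (ss_litter / (a * (b - 1)))


def means_strain_f(data):
    """F for strain from a one-way ANOVA on the litter means alone."""
    means = {s: [mean(lit) for lit in lits] for s, lits in data.items()}
    a = len(means)
    b = len(next(iter(means.values())))
    grand = mean(m for ms in means.values() for m in ms)
    ss_between = b * sum((mean(ms) - grand) ** 2 for ms in means.values())
    ss_within = sum((m - mean(ms)) ** 2 for ms in means.values() for m in ms)
    return (ss_between / (a - 1)) / (ss_within / (a * (b - 1)))


data = simulate(seed=1)
f_full, f_means = nested_strain_f(data), means_strain_f(data)
print(f"nested F = {f_full:.6f}, litter-means F = {f_means:.6f}")
print("equal:", math.isclose(f_full, f_means, rel_tol=1e-9, abs_tol=1e-12))
```

With unequal litter sizes the two calculations would generally no longer agree, which is why the balance of this design matters.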