Solution file for Additional exercise 7.2
-----------------------------------------

Data: blood glucose levels of 3 rabbits, each measured on 3 different days after
injection with one of 3 different doses of insulin. Notation:
  y_i = blood glucose conc. (mg per 100 ml blood) for i'th sample, i=1,...,9,
or
  y_ijk = blood glucose conc. (mg per 100 ml blood) for rabbit j, after having received
          treatment i, on day k,
  i=A,B,C (0,1,2 doses); j=1,2,3; k=1,2,3.
  
The design is a 3x3 Latin square. As measurements with different treatments are
obtained from the same rabbits, the order of the treatments is an issue and is included
(as day) as a blocking factor in the design. This is a safeguard against spurious
treatment effects caused by day effects, but it does not take carry-over effects into
account very well (because only certain sequences of treatments occur). It is not
possible to do better in such a small design.

The statistical model is
    y_i = mu + alpha_trt(i) + beta_rabbit(i) + gamma_day(i) + eps_i,
or
    y_ijk = mu + alpha_i + beta_j + gamma_k + epsilon_ijk,
depending on the chosen notation.

MTB > WOpen "H:\VHM\VHM802\Data_csv\hs07_2.csv";
SUBC>   FType;
SUBC>     CSV;
SUBC>   DecSep;
SUBC>     Period;
SUBC>   Field;
SUBC>     Comma;
SUBC>   TDelimiter;
SUBC>     DoubleQuote.
Retrieving worksheet from file: ‘H:\VHM\VHM802\Data_csv\hs07_2.csv’
Worksheet was saved on 17/02/2011

MTB > GLM;
SUBC>   Response 'glucose';
SUBC>   Nodefault;
SUBC>   Categorical 'day' 'rabbit' 'insulin';
SUBC>   Terms day rabbit insulin;
SUBC>   Means day rabbit insulin;
SUBC>   TExpand;
SUBC>   TMethod;
SUBC>   TAnova;
SUBC>   TSummary;
SUBC>   TCoefficients;
SUBC>   TEquation;
SUBC>   TFactor;
SUBC>   TMeans;
SUBC>   TDiagnostics 0;
SUBC>   Rtype 2;
SUBC>  GFOURPACK.
General Linear Model: glucose versus day, rabbit, insulin 

Method
Factor coding  (-1, 0, +1)

Factor Information
Factor   Type   Levels  Values
day      Fixed       3  1, 2, 3
rabbit   Fixed       3  1, 2, 3
insulin  Fixed       3  A, B, C

Analysis of Variance
Source     DF  Seq SS  Contribution  Adj SS   Adj MS  F-Value  P-Value
  day       2   92.67        14.90%   92.67   46.333     5.56    0.152
  rabbit    2   96.00        15.43%   96.00   48.000     5.76    0.148
  insulin   2  416.67        66.99%  416.67  208.333    25.00    0.038
Error       2   16.67         2.68%   16.67    8.333
Total       8  622.00       100.00%

Model Summary
      S    R-sq  R-sq(adj)  PRESS  R-sq(pred)
2.88675  97.32%     89.28%  337.5      45.74%


Coefficients
Term        Coef  SE Coef       95% CI       T-Value  P-Value   VIF
Constant  46.000    0.962  (41.860, 50.140)    47.80    0.000
day
  1        -4.33     1.36  (-10.19,   1.52)    -3.18    0.086  1.33
  2         1.00     1.36  ( -4.86,   6.86)     0.73    0.539  1.33
rabbit
  1         0.00     1.36  ( -5.86,   5.86)     0.00    1.000  1.33
  2         4.00     1.36  ( -1.86,   9.86)     2.94    0.099  1.33
insulin
  A         8.33     1.36  (  2.48,  14.19)     6.12    0.026  1.33
  B        -0.00     1.36  ( -5.86,   5.86)    -0.00    1.000  1.33

Regression Equation
glucose = 46.000 - 4.33 day_1 + 1.00 day_2 + 3.33 day_3 + 0.00 rabbit_1 + 4.00 rabbit_2
          - 4.00 rabbit_3 + 8.33 insulin_A - 0.00 insulin_B - 8.33 insulin_C

Means
         Fitted
Term       Mean  SE Mean
day
  1       41.67     1.67
  2       47.00     1.67
  3       49.33     1.67
rabbit
  1       46.00     1.67
  2       50.00     1.67
  3       42.00     1.67
insulin
  A       54.33     1.67
  B       46.00     1.67
  C       37.67     1.67
 
Residual Plots for glucose 

Comments:
---------
The analysis shows a significant effect of treatments (P=0.038) and non-significant
effects of rabbits (P=0.148) and days (P=0.152). Note that the power of the F-tests
is rather low with only 2 degrees of freedom for error. Detecting a significant
treatment effect is therefore quite satisfactory, even though the P-value of 0.038
is not very small. The treatment means (54.33, 46.00 and 37.67 for A, B and C)
indicate that increasing doses of insulin decrease the blood glucose level.
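
As a check on the ANOVA table, the F-values and P-values can be recomputed by hand
from the printed sums of squares. The sketch below is not part of the Minitab
session; the function and variable names are illustrative. It relies on the fact
that with 2 numerator degrees of freedom the upper tail of the F-distribution has
the closed form (1 + 2*F/DFE)^(-DFE/2), so no statistical library is needed.

```python
# Recompute the F-tests of the Latin square ANOVA from the printed table.
# All numbers are copied from the Minitab output; names are illustrative.

def f_upper_tail_2df(f_value, df_error):
    """P(F > f_value) for an F-distribution with 2 numerator df.

    Closed form valid only for 2 numerator df:
    (1 + 2*f/df_error) ** (-df_error/2).
    """
    return (1.0 + 2.0 * f_value / df_error) ** (-df_error / 2.0)

# Adjusted sums of squares and degrees of freedom: term -> (SS, DF)
anova = {
    "day": (92.67, 2),
    "rabbit": (96.00, 2),
    "insulin": (416.67, 2),
}
ss_error, df_error = 16.67, 2
ms_error = ss_error / df_error  # error mean square

results = {}
for term, (ss, df) in anova.items():
    f_value = (ss / df) / ms_error
    results[term] = (f_value, f_upper_tail_2df(f_value, df_error))

for term, (f_value, p_value) in results.items():
    print(f"{term:8s} F = {f_value:6.2f}  P = {p_value:.3f}")
```

Up to rounding of the printed sums of squares, this should agree with the F-Value
and P-Value columns of the table above.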

One question is whether to refit the model without the non-significant factors.
Usually this is not of interest because the partial and sequential sums of
squares coincide (so that the sums of squares for the remaining terms would be
unchanged). However, with only 2 degrees of freedom for error, one potentially
important advantage could be to increase the degrees of freedom for error
(pooling). In this situation, I would not do that because the F-statistics for
days and rabbits are numerically large (5.56 and 5.76), and pooling would
therefore increase our estimate of the error variance considerably.
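
To put numbers on the pooling argument, the following sketch adds the sums of
squares and degrees of freedom for days and rabbits to the error term and
recomputes the error mean square and the F-value for insulin. It only uses values
from the ANOVA table; the variable names are illustrative, and no such refit was
run in Minitab here.

```python
# Effect of pooling the day and rabbit terms into the error term.
# All numbers are copied from the Latin square ANOVA table.
ss_error, df_error = 16.67, 2
ss_day, df_day = 92.67, 2
ss_rabbit, df_rabbit = 96.00, 2
ms_insulin = 208.333  # Adj MS for insulin

# Error mean square in the full Latin square model
ms_error_full = ss_error / df_error

# Pooled error: day and rabbit sums of squares are moved into the error term
ss_pooled = ss_error + ss_day + ss_rabbit
df_pooled = df_error + df_day + df_rabbit
ms_error_pooled = ss_pooled / df_pooled

f_full = ms_insulin / ms_error_full
f_pooled = ms_insulin / ms_error_pooled

print(f"full model:   MSE = {ms_error_full:6.2f} on {df_error} df, F = {f_full:5.2f}")
print(f"pooled model: MSE = {ms_error_pooled:6.2f} on {df_pooled} df, F = {f_pooled:5.2f}")
```

The error mean square grows roughly fourfold while the error degrees of freedom
go from 2 to 6, so the F-value for insulin drops accordingly.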

We carry out a test of linearity ("lack of fit") for number of doses in the
full Latin square model.

MTB > Code ("A") 1 ("B") 2 ("C") 3 'insulin';
SUBC>   TSummary;
SUBC>   After.
 
Code 

Summary

Original  Recoded   Number
   Value    Value  of Rows
       A        1        3
       B        2        3
       C        3        3


Source data column   insulin
Recoded data column  Coded insulin

MTB > GLM;
SUBC>   Response 'glucose';
SUBC>   Nodefault;
SUBC>   Continuous 'Coded insulin';
SUBC>   Categorical 'day' 'rabbit';
SUBC>   Unstandardized;
SUBC>   Terms C5 day rabbit;
SUBC>   Means day rabbit;
SUBC>   TExpand;
SUBC>   TMethod;
SUBC>   TAnova;
SUBC>   TSummary;
SUBC>   TCoefficients;
SUBC>   TEquation;
SUBC>   TFactor;
SUBC>   TMeans;
SUBC>   TDiagnostics 0;
SUBC>   Rtype 2;
SUBC>  GFOURPACK.
General Linear Model: glucose versus Coded insulin, day, rabbit 

Factor Information
Factor  Type   Levels  Values
day     Fixed       3  1, 2, 3
rabbit  Fixed       3  1, 2, 3

Analysis of Variance
Source           DF  Seq SS  Contribution  Adj SS   Adj MS  F-Value  P-Value
  Coded insulin   1  416.67        66.99%  416.67  416.667    75.00    0.003
  day             2   92.67        14.90%   92.67   46.333     8.34    0.060
  rabbit          2   96.00        15.43%   96.00   48.000     8.64    0.057
Error             3   16.67         2.68%   16.67    5.556
Total             8  622.00       100.00%

      S    R-sq  R-sq(adj)    PRESS  R-sq(pred)
2.35702  97.32%     92.85%  172.125      72.33%

Coefficients
Term             Coef  SE Coef        95% CI       T-Value  P-Value   VIF
Constant        62.67     2.08  (  56.05,  69.28)    30.15    0.000
Coded insulin  -8.333    0.962  (-11.396, -5.271)    -8.66    0.003  1.00
day
  1             -4.33     1.11  (  -7.87,  -0.80)    -3.90    0.030  1.33
  2              1.00     1.11  (  -2.54,   4.54)     0.90    0.434  1.33
rabbit
  1             -0.00     1.11  (  -3.54,   3.54)    -0.00    1.000  1.33
  2              4.00     1.11  (   0.46,   7.54)     3.60    0.037  1.33

Regression Equation
day  rabbit
1    1       glucose = 58.33 - 8.333 Coded insulin
1    2       glucose = 62.33 - 8.333 Coded insulin
1    3       glucose = 54.33 - 8.333 Coded insulin
2    1       glucose = 63.67 - 8.333 Coded insulin
2    2       glucose = 67.67 - 8.333 Coded insulin
2    3       glucose = 59.67 - 8.333 Coded insulin
3    1       glucose = 66.00 - 8.333 Coded insulin
3    2       glucose = 70.00 - 8.333 Coded insulin
3    3       glucose = 62.00 - 8.333 Coded insulin

Means
        Fitted
Term      Mean  SE Mean
day
  1      41.67     1.36
  2      47.00     1.36
  3      49.33     1.36
rabbit
  1      46.00     1.36
  2      50.00     1.36
  3      42.00     1.36

Covariate      Data Mean  StDev
Coded insulin      2.000  0.866
 
Residual Plots for glucose 


Comments:
---------
The (numerator of the) F-test statistic would now be obtained by subtracting the
SSEs and DFEs of the two models as usual. In this case, however, the SSEs are equal
(both 16.67), so the test statistic is zero! This is because the 3 treatment means
lie exactly on a straight line. Obviously, the linear model is well supported by
the data in this case. A 95% confidence interval for the impact of one added
insulin dose on the glucose level is given above, and could be computed manually as
   -8.33 +- 3.1824*0.9623 = -8.33 +- 3.06 = (-11.4,-5.3),
where 3.1824 is the 0.975 quantile of the t-distribution with 3 degrees of freedom.
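
The two hand calculations above can be sketched as follows. The values are copied
from the two Minitab fits; the t-quantile is hard-coded as in the text rather than
computed, and the variable names are illustrative.

```python
# Lack-of-fit F statistic and 95% CI for the insulin slope, by hand.
# Values are copied from the two Minitab fits.
sse_full, dfe_full = 16.67, 2      # insulin as a 3-level factor
sse_linear, dfe_linear = 16.67, 3  # insulin as a linear covariate

# Numerator: extra error SS per extra error df when linearity is imposed
numerator = (sse_linear - sse_full) / (dfe_linear - dfe_full)
# Denominator: error mean square of the full (factor) model
f_lack_of_fit = numerator / (sse_full / dfe_full)

# 95% confidence interval for the slope (change in glucose per added dose)
slope, se_slope = -8.333, 0.9623
t_crit = 3.1824  # 0.975 quantile of t with 3 df, hard-coded as in the text
half_width = t_crit * se_slope
ci = (slope - half_width, slope + half_width)

print(f"lack-of-fit F = {f_lack_of_fit:.2f}")
print(f"95% CI for slope: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval should match the one printed in the Coefficients table up to rounding.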