Solution file for additional exercise 10.10
-------------------------------------------
Data on antibiotic blood serum levels, measured in a pilot trial for
5 subjects at 1, 2, 3 and 6 hours after medication. Each subject went through two
measurement periods, each with a different drug, with a wash-out period in between.

- notation:
y_ijk = antibiotic level for subject i with drug j, measured at time k,
   i = 1,2,3,4,5 (subjects),
   j = 1,2 (drug: A, B),
   k = 1,2,3,4 (hours after medication: 1, 2, 3, 6),
- repeated measures data with 2 series of 4 measurements on each subject,
- the treatment factor (drug) varies within subjects, therefore the
design does not have split-plot character (no whole-plot factor),
- may at first sight be viewed as a block design with
  * drugs & time = treatment factors,
  * subjects = blocks,
however this leaves out one important effect in the model: the subject*drug
interaction, corresponding to measurement periods for each of the
subjects; in fact, the repeated measures are taken over drug*subject
units,
- model:
y_ijk = mu + A_i + beta_j + AB_ij + gamma_k + (beta gamma)_jk + eps_ijk,
      where the A_i are i.i.d. N(0,sigma_A^2),
      the AB_ij are i.i.d. N(0,sigma_AB^2),
      and the eps_ijk are i.i.d. N(0,sigma^2), all mutually independent;
subject effects are taken as random here because the variation between
subjects may be of interest.
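Assuming the random terms A_i, AB_ij and eps_ijk are mutually independent, the variances and covariances of the observations follow directly from the model. The block below spells out the correlation structure these random effects imply; it is a derivation from the model as stated, not additional output.

```latex
\begin{aligned}
\operatorname{Var}(y_{ijk}) &= \sigma_A^2 + \sigma_{AB}^2 + \sigma^2, \\
\operatorname{Cov}(y_{ijk},\, y_{ijk'}) &= \sigma_A^2 + \sigma_{AB}^2, && k \neq k' \text{ (same period)}, \\
\operatorname{Cov}(y_{ijk},\, y_{ij'k'}) &= \sigma_A^2, && j \neq j' \text{ (same subject, other drug)}, \\
\operatorname{Cov}(y_{ijk},\, y_{i'j'k'}) &= 0, && i \neq i', \\
\operatorname{Corr}(y_{ijk},\, y_{ijk'}) &= \frac{\sigma_A^2 + \sigma_{AB}^2}{\sigma_A^2 + \sigma_{AB}^2 + \sigma^2}, && k \neq k'.
\end{aligned}
```

Any two measurements in the same period therefore have the same correlation, whatever their distance in time (compound symmetry within periods).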

Answers to questions:
- experimental design: repeated measures with treatments within
subjects; may be viewed as a block design with subject*drug combinations
as blocks (which, however, ignores the ordering of the measurements over time),
- effects of interest: drug, drug*time, drug*subject,
- experimental unit for drug treatment: the measurement period, i.e. the
series of measurements for one subject with one drug (NOT the subject,
because drugs are compared within subjects),
- single time point: two-way layout with drugs as treatments and subjects
as blocks (2*5 design, one observation per cell).
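The degrees of freedom in the ANOVA tables of this exercise follow from the 5*2*4 layout. The short Python sketch below (the function name `anova_df` is ours, for illustration only) reproduces them for the full data and after deleting one or two single observations, assuming every subject*drug and drug*time cell keeps at least one observation so that no model term loses degrees of freedom.

```python
# Degrees of freedom for the terms of  y = drug|time subject subject*drug
# in the 5 subjects x 2 drugs x 4 times layout of this exercise.
N_SUBJECTS, N_DRUGS, N_TIMES = 5, 2, 4


def anova_df(n_obs: int) -> dict:
    """Model, error and total DF, assuming no model term loses estimability
    (true when only a few single observations are deleted)."""
    df = {
        "drug": N_DRUGS - 1,
        "time": N_TIMES - 1,
        "drug*time": (N_DRUGS - 1) * (N_TIMES - 1),
        "subject": N_SUBJECTS - 1,
        "drug*subject": (N_DRUGS - 1) * (N_SUBJECTS - 1),
    }
    # Residual DF = total DF minus the 15 model DF.
    df["Error"] = (n_obs - 1) - sum(df.values())
    df["Total"] = n_obs - 1
    return df


# Full data, without obs. 37, without obs. 29 and 37.
for n in (40, 39, 38):
    print(n, anova_df(n))
```

Each deleted observation removes one residual DF and leaves the model DF unchanged, which matches the Error lines of 24, 23 and 22 DF in the three GLM runs.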

MTB > WOpen "h:\vhm\vhm802\data_csv\hs10_10.csv";
SUBC>   FType;
SUBC>     CSV;
SUBC>   DecSep;
SUBC>     Period;
SUBC>   Field;
SUBC>     Comma;
SUBC>   TDelimiter;
SUBC>     DoubleQuote.
Retrieving worksheet from file: 'h:\vhm\vhm802\data_csv\hs10_10.csv'
Worksheet was saved on 19/03/2011

MTB > Plot 'y'*'time';
SUBC>   Symbol 'subject';
SUBC>   Connect 'subject';
SUBC>   Panel 'drug'.
Scatterplot of y vs time 

MTB > Name c5 "SRES1" c6 "TRES1"
MTB > GLM  'y' = drug|time subject subject*drug;
SUBC>   Random 'subject';
SUBC>   Brief 2 ;
SUBC>   EMS;
SUBC>   SResiduals 'SRES1';
SUBC>   TResiduals 'TRES1';
SUBC>   GFourpack;
SUBC>   RType 2 .
General Linear Model: y versus drug, time, subject 

Factor   Type    Levels  Values
drug     fixed        2  A, B
time     fixed        4  1, 2, 3, 6
subject  random       5  1, 2, 3, 4, 5

Analysis of Variance for y, using Adjusted SS for Tests

Source        DF   Seq SS  Adj SS  Adj MS      F      P
drug           1   0.0497  0.0497  0.0497   0.08  0.789
time           3   3.2716  3.2716  1.0905  10.85  0.000
drug*time      3   0.0988  0.0988  0.0329   0.33  0.805
subject        4   4.4351  4.4351  1.1088   1.82  0.288
drug*subject   4   2.4365  2.4365  0.6091   6.06  0.002
Error         24   2.4125  2.4125  0.1005
Total         39  12.7043

S = 0.317053   R-Sq = 81.01%   R-Sq(adj) = 69.14%

Unusual Observations for y

Obs        y      Fit   SE Fit  Residual  St Resid
 29  0.32000  0.95100  0.20052  -0.63100     -2.57 R
 30  2.12000  1.62500  0.20052   0.49500      2.02 R
 37  1.48000  0.59100  0.20052   0.88900      3.62 R

R denotes an observation with a large standardized residual.

Expected Mean Squares, using Adjusted SS

   Source        Expected Mean Square for Each Term
1  drug          (6) + 4.0000 (5) + Q[1, 3]
2  time          (6) + Q[2, 3]
3  drug*time     (6) + Q[3]
4  subject       (6) + 4.0000 (5) + 8.0000 (4)
5  drug*subject  (6) + 4.0000 (5)
6  Error         (6)

Error Terms for Tests, using Adjusted SS
                                     Synthesis
   Source        Error DF  Error MS  of Error MS
1  drug              4.00    0.6091  (5)
2  time             24.00    0.1005  (6)
3  drug*time        24.00    0.1005  (6)
4  subject           4.00    0.6091  (5)
5  drug*subject     24.00    0.1005  (6)
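The F-ratios in the first ANOVA table can be checked by hand. Each mean square is SS/DF, and each term is divided by the error term listed above: drug*subject for drug and subject, the residual for the other terms. A minimal Python sketch using the Adj SS values of the full-data table:

```python
# Adj SS and DF from the full-data ANOVA table.
ss = {"drug": 0.0497, "time": 3.2716, "drug*time": 0.0988,
      "subject": 4.4351, "drug*subject": 2.4365, "Error": 2.4125}
df = {"drug": 1, "time": 3, "drug*time": 3,
      "subject": 4, "drug*subject": 4, "Error": 24}
# Denominator for each test, as in "Error Terms for Tests".
error_term = {"drug": "drug*subject", "time": "Error", "drug*time": "Error",
              "subject": "drug*subject", "drug*subject": "Error"}

ms = {term: ss[term] / df[term] for term in ss}
f_stats = {term: ms[term] / ms[den] for term, den in error_term.items()}

for term, f in f_stats.items():
    den = error_term[term]
    print(f"{term:13s} F = {f:6.2f} on ({df[term]}, {df[den]}) DF")
```

Testing drug against drug*subject with only 4 DF, rather than against the residual with 24 DF, is what makes the period the experimental unit for the drug comparison.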

Variance Components, using Adjusted SS

              Estimated
Source            Value
subject         0.06246
drug*subject    0.12715
Error           0.10052
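The variance components are ANOVA (method-of-moments) estimates. The observed mean squares are set equal to their expected values from the EMS table, and the resulting equations are solved for the components. For the balanced full data the coefficients are 4 (measurements per subject*drug period) and 8 (measurements per subject). A sketch of that arithmetic in Python:

```python
# Mean squares from the full-data ANOVA table (SS / DF).
ms_subject = 4.4351 / 4
ms_drug_subject = 2.4365 / 4
ms_error = 2.4125 / 24

# From the EMS table:
#   E[MS drug*subject] = sigma^2 + 4 sigma_AB^2
#   E[MS subject]      = sigma^2 + 4 sigma_AB^2 + 8 sigma_A^2
sigma2 = ms_error
sigma2_AB = (ms_drug_subject - ms_error) / 4   # 4 time points per period
sigma2_A = (ms_subject - ms_drug_subject) / 8  # 2 drugs x 4 times per subject

print(f"subject      {sigma2_A:.5f}")
print(f"drug*subject {sigma2_AB:.5f}")
print(f"Error        {sigma2:.5f}")
```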
 

Comments:
---------
The residual plots show a very strong outlier: observation 37, which is
the first measurement for subject 5 with drug B. It is the highest value in
that series, in contrast to all other series, which peak later than 1 hour.
Also, the value is higher than all values for subject 5 with drug A. The
P-value computed from the deletion residual of 5.26 is about 0.001.
We decide to remove that observation and rerun the analysis.
Before continuing, we note that the ANOVA table shows no effect
whatsoever of drug or drug*time. The drug*subject effect is clearly
significant (P = 0.002), confirming the importance of the drug*subject
(period-to-period) variation in the data.
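Two numbers in this comment can be reproduced from the output. The deletion (externally studentized) residual follows from the standardized residual 3.62 of obs. 37 and the 24 error DF. The unadjusted two-sided P-value from a t-distribution with 23 DF is of the order 1e-5, so the quoted value of about 0.001 appears to correspond to a Bonferroni correction over the 40 observations; that is our reading, not something stated in the original. The sketch below integrates the t density numerically with Simpson's rule instead of calling a statistics library; the helper names are hypothetical.

```python
import math


def deletion_residual(std_resid: float, df_error: int) -> float:
    """Deletion residual from a standardized residual r:
    t = r * sqrt((df - 1) / (df - r^2)), with df = residual DF of the fit."""
    r = std_resid
    return r * math.sqrt((df_error - 1) / (df_error - r * r))


def t_upper_tail(t: float, nu: int, upper: float = 200.0, steps: int = 20000) -> float:
    """P(T > t) for Student's t with nu DF, by Simpson's rule on [t, upper].
    steps must be even; the mass beyond `upper` is negligible here."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))

    def density(x: float) -> float:
        return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

    h = (upper - t) / steps
    total = density(t) + density(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(t + i * h)
    return total * h / 3


t_del = deletion_residual(3.62, 24)          # obs. 37, 24 error DF
p_two_sided = 2 * t_upper_tail(t_del, 24 - 1)
p_bonferroni = min(1.0, 40 * p_two_sided)    # 40 observations tested
print(f"deletion residual {t_del:.2f}, P = {p_two_sided:.1e}, "
      f"Bonferroni P = {p_bonferroni:.4f}")
```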

MTB > Copy 'y' c7;
SUBC>   Varnames.
MTB > let c7(37)='*'
MTB > Name c8 "SRES2" c9 "TRES2"
MTB > GLM  'y_1' = drug|time subject subject*drug;
SUBC>   Random 'subject';
SUBC>   Brief 2 ;
SUBC>   EMS;
SUBC>   SResiduals 'SRES2';
SUBC>   TResiduals 'TRES2';
SUBC>   GFourpack;
SUBC>   RType 2 .
General Linear Model: y_1 versus drug, time, subject 

Factor   Type    Levels  Values
drug     fixed        2  A, B
time     fixed        4  1, 2, 3, 6
subject  random       5  1, 2, 3, 4, 5

Analysis of Variance for y_1, using Adjusted SS for Tests

Source        DF    Seq SS   Adj SS   Adj MS      F      P
drug           1   0.03096  0.00012  0.00012   0.00  0.989 x
time           3   3.51160  3.99752  1.33251  27.98  0.000
drug*time      3   0.18907  0.35886  0.11962   2.51  0.084
subject        4   5.28793  5.36567  1.34142   2.15  0.238
drug*subject   4   2.49481  2.49481  0.62370  13.10  0.000
Error         23   1.09534  1.09534  0.04762
Total         38  12.60971

x Not an exact F-test.

S = 0.218228   R-Sq = 91.31%   R-Sq(adj) = 85.65%

Unusual Observations for y_1

Obs      y_1      Fit   SE Fit  Residual  St Resid
 13  0.62000  0.14125  0.14434   0.47875      2.93 R
 29  0.32000  0.72875  0.14434  -0.40875     -2.50 R
 30  2.12000  1.69908  0.13874   0.42092      2.50 R

R denotes an observation with a large standardized residual.

Expected Mean Squares, using Adjusted SS

   Source        Expected Mean Square for Each Term
1  drug          (6) + 3.8400 (5) + Q[1, 3]
2  time          (6) + Q[2, 3]
3  drug*time     (6) + Q[3]
4  subject       (6) + 3.8571 (5) + 7.7143 (4)
5  drug*subject  (6) + 3.8571 (5)
6  Error         (6)

Error Terms for Tests, using Adjusted SS

   Source        Error DF  Error MS  Synthesis of Error MS
1  drug              4.00   0.62114  0.9956 (5) + 0.0044 (6)
2  time             23.00   0.04762  (6)
3  drug*time        23.00   0.04762  (6)
4  subject           4.00   0.62370  (5)
5  drug*subject     23.00   0.04762  (6)
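With one observation missing, the expected mean square of drug no longer matches that of drug*subject exactly. Minitab therefore synthesizes a denominator as a linear combination of MS(drug*subject) and MS(Error) (the 0.9956/0.0044 weights above) and marks the test as not exact. Our understanding is that the approximate error DF come from Satterthwaite's formula; under that assumption the Python sketch below (helper name hypothetical) reproduces the 0.62114 and 4.00 shown in the table.

```python
def satterthwaite(coefs, mean_squares, dfs):
    """Synthesized MS = sum_i c_i * MS_i and its Satterthwaite approximate DF."""
    terms = [c * ms for c, ms in zip(coefs, mean_squares)]
    ms_synth = sum(terms)
    df_approx = ms_synth ** 2 / sum(t * t / d for t, d in zip(terms, dfs))
    return ms_synth, df_approx


# Choose c so that c * E[MS drug*subject] + (1 - c) * E[MS Error]
# equals sigma^2 + 3.84 sigma_AB^2, the non-drug part of E[MS drug].
c = 3.8400 / 3.8571
ms_synth, df_synth = satterthwaite([c, 1 - c], [0.62370, 0.04762], [4, 23])
f_drug = 0.00012 / ms_synth  # Adj MS of drug over the synthesized error MS
print(f"error MS {ms_synth:.5f} on {df_synth:.2f} DF, F(drug) = {f_drug:.4f}")
```

Because almost all weight falls on MS(drug*subject), the approximate DF stay close to its 4 DF.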

Variance Components, using Adjusted SS

              Estimated
Source            Value
subject         0.09304
drug*subject    0.14935
Error           0.04762

MTB > NormTest 'SRES2'.
Probability Plot of SRES2 
The P-value for the W-test is 0.032.

Comments:
---------
The model without observation 37 no longer has any strongly outlying
residuals, but the residual plots look strange, possibly indicating a
right-skewed distribution (the normality test above gives P = 0.032).
Remarkably, the drug*time interaction is now close to significance, with
a P-value of 0.084. The effect lies in the comparison between drugs at
1 hour, where, after the removal of obs. 37, drug B lies lower than
drug A. Obviously, this conclusion must be taken with some reservation,
because it depends strongly on the removal of obs. 37.

For this model, one may try alternative correlation structures for the
repeated measures on the subjects (see the SAS and Stata analyses). The
results show that assuming equal correlation among all time points
(compound symmetry) is actually quite a good assumption for these data.
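Under the random-effects model, two measurements from the same period share A_i and AB_ij, so their correlation is the same for every pair of time points; this is the compound-symmetry structure mentioned above. As an illustration only (these are not the SAS or Stata results), the Python sketch below plugs in the Minitab variance components from the analysis without obs. 37:

```python
# Variance components from the GLM without obs. 37.
var_subject = 0.09304       # sigma_A^2
var_drug_subject = 0.14935  # sigma_AB^2
var_error = 0.04762         # sigma^2

total = var_subject + var_drug_subject + var_error
# Same subject, same drug (same period), different times:
rho_same_period = (var_subject + var_drug_subject) / total
# Same subject, different drugs (different periods):
rho_other_period = var_subject / total
print(f"same period: {rho_same_period:.3f}, different periods: {rho_other_period:.3f}")
```

The within-period correlation is high, and the correlation between periods of the same subject is much lower, which is why the drug*subject term cannot be left out of the model.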

As the residuals were not yet fully satisfactory, we look for a
suitable transformation. A Box-Cox analysis in Stata, based on the
fixed-effects version of the model, gives optimal powers around 0.5 (0.61
for the full data, 0.57 without observation 37). We therefore try a
square-root transformation. It turns out that for the square-root
transformed outcome there are two strong outliers: observations 29 and
37. A third Box-Cox analysis without both of these observations gives an
optimal power of 0.16, and no evidence against a log-transformation. To
complete the analysis, we therefore also try a log-transformation of the
outcome, excluding both of these extreme outliers.
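A Box-Cox analysis picks the power lambda that maximizes the profile log-likelihood of the transformed response, with the transform scaled by the geometric mean so that different powers are comparable. The Python sketch below illustrates the idea under simplifying assumptions: a hypothetical one-way mean structure stands in for the fixed-effects ANOVA model used in the exercise, the toy data are invented, and this is not Stata's own implementation. With multiplicative errors the grid search should favour a power near 0, i.e. the log-transformation.

```python
import math


def boxcox_profile_loglik(y, groups, lam):
    """Box-Cox profile log-likelihood (up to a constant) for power lam.
    Hypothetical mean model: one mean per group label (one-way layout)."""
    n = len(y)
    gm = math.exp(sum(math.log(v) for v in y) / n)  # geometric mean
    if abs(lam) < 1e-12:
        z = [gm * math.log(v) for v in y]
    else:
        z = [(v ** lam - 1) / (lam * gm ** (lam - 1)) for v in y]
    # Residual sum of squares around the group means.
    sums, counts = {}, {}
    for g, v in zip(groups, z):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    rss = sum((v - sums[g] / counts[g]) ** 2 for g, v in zip(groups, z))
    return -0.5 * n * math.log(rss / n)


def boxcox_grid(y, groups, lams):
    """Power on the grid with the largest profile log-likelihood."""
    return max(lams, key=lambda lam: boxcox_profile_loglik(y, groups, lam))


# Toy data (not from the exercise): multiplicative errors around group levels.
groups = [g for g in range(3) for _ in range(3)]
y = [math.exp(2 * g + e) for g in range(3) for e in (-0.2, 0.0, 0.2)]
grid = [i / 10 for i in range(-10, 21)]
print("best power on grid:", boxcox_grid(y, groups, grid))
```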

MTB > Name C5 'lny'
MTB > Let 'lny' = ln('y')
MTB > let c5(29)='*'
MTB > let c5(37)='*'
MTB > Name c6 "TRES1"
MTB > GLM  'lny' = drug| time subject subject*drug;
SUBC>   Random 'subject';
SUBC>   Brief 2 ;
SUBC>   EMS;
SUBC>   Means time;
SUBC>   TResiduals 'TRES1';
SUBC>   GFourpack;
SUBC>   RType 2 .
General Linear Model: lny versus drug, time, subject 

Factor   Type    Levels  Values
drug     fixed        2  A, B
time     fixed        4  1, 2, 3, 6
subject  random       5  1, 2, 3, 4, 5

Analysis of Variance for lny, using Adjusted SS for Tests

Source        DF   Seq SS   Adj SS   Adj MS      F      P
drug           1  0.00561  0.00446  0.00446   0.01  0.926 x
time           3  2.27575  2.38912  0.79637  47.45  0.000
drug*time      3  0.02591  0.04857  0.01619   0.96  0.427
subject        4  3.69145  3.62168  0.90542   1.93  0.270
drug*subject   4  1.87426  1.87426  0.46856  27.92  0.000
Error         22  0.36923  0.36923  0.01678
Total         37  8.24220

x Not an exact F-test.

S = 0.129550   R-Sq = 95.52%   R-Sq(adj) = 92.47%

Unusual Observations for lny

Obs        lny        Fit    SE Fit   Residual  St Resid
 13  -0.478036  -0.722643  0.091606   0.244607      2.67 R
 21  -0.430783  -0.237701  0.091606  -0.193082     -2.11 R

R denotes an observation with a large standardized residual.

Expected Mean Squares, using Adjusted SS

   Source        Expected Mean Square for Each Term
1  drug          (6) + 3.6000 (5) + Q[1, 3]
2  time          (6) + Q[2, 3]
3  drug*time     (6) + Q[3]
4  subject       (6) + 3.7143 (5) + 7.4286 (4)
5  drug*subject  (6) + 3.7143 (5)
6  Error         (6)

Error Terms for Tests, using Adjusted SS

   Source        Error DF  Error MS  Synthesis of Error MS
1  drug              4.01   0.45466  0.9692 (5) + 0.0308 (6)
2  time             22.00   0.01678  (6)
3  drug*time        22.00   0.01678  (6)
4  subject           4.00   0.46856  (5)
5  drug*subject     22.00   0.01678  (6)

Variance Components, using Adjusted SS

              Estimated
Source            Value
subject         0.05881
drug*subject    0.12163
Error           0.01678

Least Squares Means for lny

time     Mean
1     -0.1947
2      0.4026
3      0.1667
6     -0.1956
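Since the model is fitted to ln(y), exponentiating the least squares means gives geometric-mean levels on the original scale (medians if the log-scale errors are normal), not arithmetic means, and differences between times become ratios. A small Python sketch of this back-transformation:

```python
import math

# Least squares means of ln(y) by time (hours), averaged over drugs.
ls_means_log = {1: -0.1947, 2: 0.4026, 3: 0.1667, 6: -0.1956}

# Back-transformed levels on the original antibiotic scale.
geo_means = {t: math.exp(m) for t, m in ls_means_log.items()}
for t, g in geo_means.items():
    print(f"{t} h: {g:.3f}")

# A difference on the log scale is a ratio on the original scale.
ratio_2_vs_1 = math.exp(ls_means_log[2] - ls_means_log[1])
print(f"2 h vs 1 h: ratio {ratio_2_vs_1:.3f}")
```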


Comments:
---------
After removal of the two outlying observations the residuals look acceptable.
The second removed observation (obs. 29) was the first value for drug B of
subject 4. This value was unexpectedly low. Having now removed one
unexpectedly high and one unexpectedly low observation, we no longer find
any indication of a drug*time interaction (P = 0.427). We conclude that
there is no sign of a difference between the drugs.

Since the drug effects are non-significant, we need not worry about possible
violations of the model assumptions caused by the repeated measures over
time: if the data do not show sphericity, correcting for this would only
increase our P-values further. Analyses in SAS and Stata confirm that our
conclusions do not change when the repeated measures are taken into account.

Analyses at single time points (not surprisingly) show no difference
whatsoever between the drugs at any time.
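At a single time point the data form a 2*5 table with one observation per drug and subject, so the drug test in the two-way layout (drugs as treatments, subjects as blocks) is equivalent to a paired t-test, with F = t^2. The Python sketch below checks this identity on hypothetical numbers, not on the exercise data.

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Paired t statistic for drug A vs drug B on the same subjects."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))


def twoway_drug_f(a, b):
    """F for drug in a 2 x n two-way layout without replication; the error
    term is the drug*subject interaction with n - 1 DF."""
    n = len(a)
    grand = mean(a + b)
    drug_means = [mean(a), mean(b)]
    subj_means = [(x + y) / 2 for x, y in zip(a, b)]
    ss_drug = n * sum((m - grand) ** 2 for m in drug_means)  # 1 DF
    ss_error = sum((x - sm - drug_means[0] + grand) ** 2
                   + (y - sm - drug_means[1] + grand) ** 2
                   for x, y, sm in zip(a, b, subj_means))
    return ss_drug / (ss_error / (n - 1))


# Hypothetical values for one time point (not the exercise data).
drug_a = [1.1, 0.9, 1.4, 0.8, 1.2]
drug_b = [1.0, 1.0, 1.2, 0.9, 1.0]
t = paired_t(drug_a, drug_b)
f = twoway_drug_f(drug_a, drug_b)
print(f"t^2 = {t * t:.4f}, F = {f:.4f}")
```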

Addendum:
---------
The above analysis did not include checks of the distribution of the subject
random effects. Averaging the measurements within each subject*drug
combination and analyzing these 10 means in a two-way ANOVA (subject by
drug) shows that there are no problems with the subject random effects,
neither for the untransformed nor for the transformed data (results not shown).