Solution file for additional exercise 10.10
-------------------------------------------

Data on antibiotic blood serum levels which during a pilot trial were
measured for 5 subjects at 1, 2, 3 and 6 hours after medication. Each
subject went through two measurement periods, each with a different drug,
and with a wash-out period in between.

- notation: y_ijk = antibiotic level for subject i with drug j, measured
  at time k,
    i = 1,2,3,4,5 (subjects),
    j = 1,2 (drug: A, B),
    k = 1,2,3,4 (hours after medication: 1, 2, 3, 6),
- repeated measures data with 2 series of 4 measurements on each subject,
- the treatment factor (drug) varies within subjects, therefore the design
  does not have split-plot character (no whole-plot factor),
- may at first sight be viewed as a block design with
    * drugs & time = treatment factors,
    * subjects = blocks,
  however this leaves out one important effect in the model: the
  subject*drug interaction, corresponding to the measurement periods for
  each of the subjects; in fact, the repeated measures are taken over
  drug*subject units,
- model: y_ijk = mu + A_i + beta_j + AB_ij + gamma_k + (beta gamma)_jk + eps_ijk,
    where A_i's are assumed i.i.d. N(0,sigma_A^2),
    where AB_ij's are assumed i.i.d. N(0,sigma_AB^2),
    where eps_ijk's are assumed i.i.d. N(0,sigma^2),
  we take subject effects as random here because the variation between
  subjects may be of interest.

Answers to questions:
- experimental design: repeated measures with treatments within subjects,
  may be viewed as a block design with subject*drug combinations as blocks
  (which does however ignore the ordering of measurements over time),
- effects of interest: drug, drug*time, drug*subject,
- experimental unit for drug treatment: single measurement or period
  (NOT subject because drugs are compared within subjects),
- analysis at a single time point: 2-way layout with treatments and blocks
  (2*5 design).
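As a quick check on the model structure above, the degrees of freedom for each term follow directly from the numbers of subjects, drugs and time points. The following is a minimal Python sketch, not part of the original Minitab analysis; the function name and the dictionary layout are illustrative assumptions, and it assumes one observation per subject/drug/time cell.

```python
def degrees_of_freedom(n_subjects=5, n_drugs=2, n_times=4):
    """Degrees of freedom for each term of the model
    y = mu + subject + drug + subject*drug + time + drug*time + error,
    assuming exactly one observation per subject/drug/time cell."""
    n_total = n_subjects * n_drugs * n_times
    df = {
        "drug": n_drugs - 1,
        "time": n_times - 1,
        "drug*time": (n_drugs - 1) * (n_times - 1),
        "subject": n_subjects - 1,
        "drug*subject": (n_drugs - 1) * (n_subjects - 1),
    }
    model_df = sum(df.values())          # df used by all model terms
    df["Total"] = n_total - 1
    df["Error"] = df["Total"] - model_df  # what is left for the residual
    return df


if __name__ == "__main__":
    for term, value in degrees_of_freedom().items():
        print(f"{term:<13}{value:>3}")
```

With the complete data (40 observations) this leaves 24 error degrees of freedom out of 39 in total; each deleted observation reduces both numbers by one.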

MTB > WOpen "h:\vhm\vhm802\data_csv\hs10_10.csv";
SUBC>   FType;
SUBC>     CSV;
SUBC>   DecSep;
SUBC>     Period;
SUBC>   Field;
SUBC>     Comma;
SUBC>   TDelimiter;
SUBC>     DoubleQuote.

Retrieving worksheet from file: 'h:\vhm\vhm802\data_csv\hs10_10.csv'
Worksheet was saved on 19/03/2011

MTB > Plot 'y'*'time';
SUBC>   Symbol 'subject';
SUBC>   Connect 'subject';
SUBC>   Panel 'drug'.

Scatterplot of y vs time

MTB > Name c5 "SRES1" c6 "TRES1"
MTB > GLM 'y' = drug|time subject subject*drug;
SUBC>   Random 'subject';
SUBC>   Brief 2 ;
SUBC>   EMS;
SUBC>   SResiduals 'SRES1';
SUBC>   TResiduals 'TRES1';
SUBC>   GFourpack;
SUBC>   RType 2 .

General Linear Model: y versus drug, time, subject

Factor   Type    Levels  Values
drug     fixed        2  A, B
time     fixed        4  1, 2, 3, 6
subject  random       5  1, 2, 3, 4, 5

Analysis of Variance for y, using Adjusted SS for Tests

Source        DF   Seq SS   Adj SS  Adj MS      F      P
drug           1   0.0497   0.0497  0.0497   0.08  0.789
time           3   3.2716   3.2716  1.0905  10.85  0.000
drug*time      3   0.0988   0.0988  0.0329   0.33  0.805
subject        4   4.4351   4.4351  1.1088   1.82  0.288
drug*subject   4   2.4365   2.4365  0.6091   6.06  0.002
Error         24   2.4125   2.4125  0.1005
Total         39  12.7043

S = 0.317053   R-Sq = 81.01%   R-Sq(adj) = 69.14%

Unusual Observations for y

Obs        y      Fit   SE Fit  Residual  St Resid
 29  0.32000  0.95100  0.20052  -0.63100     -2.57 R
 30  2.12000  1.62500  0.20052   0.49500      2.02 R
 37  1.48000  0.59100  0.20052   0.88900      3.62 R

R denotes an observation with a large standardized residual.

Expected Mean Squares, using Adjusted SS

   Source        Expected Mean Square for Each Term
1  drug          (6) + 4.0000 (5) + Q[1, 3]
2  time          (6) + Q[2, 3]
3  drug*time     (6) + Q[3]
4  subject       (6) + 4.0000 (5) + 8.0000 (4)
5  drug*subject  (6) + 4.0000 (5)
6  Error         (6)

Error Terms for Tests, using Adjusted SS

                                     Synthesis
   Source        Error DF  Error MS  of Error MS
1  drug              4.00    0.6091  (5)
2  time             24.00    0.1005  (6)
3  drug*time        24.00    0.1005  (6)
4  subject           4.00    0.6091  (5)
5  drug*subject     24.00    0.1005  (6)

Variance Components, using Adjusted SS

              Estimated
Source            Value
subject         0.06246
drug*subject    0.12715
Error           0.10052

Comments:
---------
The residual plots show a very strong outlier: observation 37, which is
the first measurement (at 1 hour) for subject 5 with drug B. It is the
highest value in that series, in contrast to all other series, which peak
later than 1 hour. The value is also higher than all values for subject 5
with drug A. The P-value computed from the deletion residual of 5.26 is
about 0.001. We decide to remove that observation and rerun the analysis.

Before continuing, we note that the ANOVA table shows no effects
whatsoever of drug or drug*time. There is some effect of drug*subject,
indicating the importance of the drug*subject variation in the data.

MTB > Copy 'y' c7;
SUBC>   Varnames.
MTB > let c7(37)='*'
MTB > Name c8 "SRES2" c9 "TRES2"
MTB > GLM 'y_1' = drug|time subject subject*drug;
SUBC>   Random 'subject';
SUBC>   Brief 2 ;
SUBC>   EMS;
SUBC>   SResiduals 'SRES2';
SUBC>   TResiduals 'TRES2';
SUBC>   GFourpack;
SUBC>   RType 2 .

General Linear Model: y_1 versus drug, time, subject

Factor   Type    Levels  Values
drug     fixed        2  A, B
time     fixed        4  1, 2, 3, 6
subject  random       5  1, 2, 3, 4, 5

Analysis of Variance for y_1, using Adjusted SS for Tests

Source        DF    Seq SS   Adj SS   Adj MS      F      P
drug           1   0.03096  0.00012  0.00012   0.00  0.989 x
time           3   3.51160  3.99752  1.33251  27.98  0.000
drug*time      3   0.18907  0.35886  0.11962   2.51  0.084
subject        4   5.28793  5.36567  1.34142   2.15  0.238
drug*subject   4   2.49481  2.49481  0.62370  13.10  0.000
Error         23   1.09534  1.09534  0.04762
Total         38  12.60971

x Not an exact F-test.

S = 0.218228   R-Sq = 91.31%   R-Sq(adj) = 85.65%

Unusual Observations for y_1

Obs      y_1      Fit   SE Fit  Residual  St Resid
 13  0.62000  0.14125  0.14434   0.47875      2.93 R
 29  0.32000  0.72875  0.14434  -0.40875     -2.50 R
 30  2.12000  1.69908  0.13874   0.42092      2.50 R

R denotes an observation with a large standardized residual.

Expected Mean Squares, using Adjusted SS

   Source        Expected Mean Square for Each Term
1  drug          (6) + 3.8400 (5) + Q[1, 3]
2  time          (6) + Q[2, 3]
3  drug*time     (6) + Q[3]
4  subject       (6) + 3.8571 (5) + 7.7143 (4)
5  drug*subject  (6) + 3.8571 (5)
6  Error         (6)

Error Terms for Tests, using Adjusted SS

   Source        Error DF  Error MS  Synthesis of Error MS
1  drug              4.00   0.62114  0.9956 (5) + 0.0044 (6)
2  time             23.00   0.04762  (6)
3  drug*time        23.00   0.04762  (6)
4  subject           4.00   0.62370  (5)
5  drug*subject     23.00   0.04762  (6)

Variance Components, using Adjusted SS

              Estimated
Source            Value
subject         0.09304
drug*subject    0.14935
Error           0.04762

MTB > NormTest 'SRES2'.

Probability Plot of SRES2

The P-value for the W-test is 0.032.

Comments:
---------
The model without observation 37 no longer has any very large residuals,
but the residual plots look strange, possibly indicating a right-skewed
distribution.

Quite surprisingly, the drug*time effect is now close to significance,
with a P-value of 0.084. The effect lies in the comparison between drugs
at 1 hour, where now (after the removal of obs. 37) drug B lies lower than
drug A.
Obviously, this conclusion must be taken with some reservation, because it
depends strongly on the removal of obs. 37.

For this model, one may try alternative correlation structures for the
repeated measures on the subjects (see SAS and Stata analyses). The
results show that assuming equal correlation among all time points is
actually quite a good assumption for these data.

As the residuals were not yet fully satisfactory, we decide to look for a
suitable transformation. A Box-Cox analysis in Stata based on the fixed
effects version of the model gives optimal powers around 0.5 (0.61 for the
full data, 0.57 without observation 37). We therefore try a square-root
transformation. It turns out that for the square-root transformed outcome
there are two strong outliers: observations 29 and 37. A third Box-Cox
analysis without both of these observations gives an optimal power of
0.16, and no evidence against a log-transformation. To complete the
analysis, we also try a log-transformation of the outcome, without both of
these extreme outliers.

MTB > Name C5 'lny'
MTB > Let 'lny' = ln('y')
MTB > let c5(29)='*'
MTB > let c5(37)='*'
MTB > Name c6 "TRES1"
MTB > GLM 'lny' = drug| time subject subject*drug;
SUBC>   Random 'subject';
SUBC>   Brief 2 ;
SUBC>   EMS;
SUBC>   Means time;
SUBC>   TResiduals 'TRES1';
SUBC>   GFourpack;
SUBC>   RType 2 .

General Linear Model: lny versus drug, time, subject

Factor   Type    Levels  Values
drug     fixed        2  A, B
time     fixed        4  1, 2, 3, 6
subject  random       5  1, 2, 3, 4, 5

Analysis of Variance for lny, using Adjusted SS for Tests

Source        DF   Seq SS   Adj SS   Adj MS      F      P
drug           1  0.00561  0.00446  0.00446   0.01  0.926 x
time           3  2.27575  2.38912  0.79637  47.45  0.000
drug*time      3  0.02591  0.04857  0.01619   0.96  0.427
subject        4  3.69145  3.62168  0.90542   1.93  0.270
drug*subject   4  1.87426  1.87426  0.46856  27.92  0.000
Error         22  0.36923  0.36923  0.01678
Total         37  8.24220

x Not an exact F-test.

S = 0.129550   R-Sq = 95.52%   R-Sq(adj) = 92.47%

Unusual Observations for lny

Obs        lny        Fit    SE Fit   Residual  St Resid
 13  -0.478036  -0.722643  0.091606   0.244607      2.67 R
 21  -0.430783  -0.237701  0.091606  -0.193082     -2.11 R

R denotes an observation with a large standardized residual.

Expected Mean Squares, using Adjusted SS

   Source        Expected Mean Square for Each Term
1  drug          (6) + 3.6000 (5) + Q[1, 3]
2  time          (6) + Q[2, 3]
3  drug*time     (6) + Q[3]
4  subject       (6) + 3.7143 (5) + 7.4286 (4)
5  drug*subject  (6) + 3.7143 (5)
6  Error         (6)

Error Terms for Tests, using Adjusted SS

   Source        Error DF  Error MS  Synthesis of Error MS
1  drug              4.01   0.45466  0.9692 (5) + 0.0308 (6)
2  time             22.00   0.01678  (6)
3  drug*time        22.00   0.01678  (6)
4  subject           4.00   0.46856  (5)
5  drug*subject     22.00   0.01678  (6)

Variance Components, using Adjusted SS

              Estimated
Source            Value
subject         0.05881
drug*subject    0.12163
Error           0.01678

Least Squares Means for lny

time     Mean
1     -0.1947
2      0.4026
3      0.1667
6     -0.1956

Comments:
---------
After removal of two outlying observations the residuals look acceptable.
The second removed observation (obs. 29) was the first value for drug B of
subject 4. This value was unexpectedly low. Now that both one unexpectedly
high and one unexpectedly low observation have been removed, there is no
longer any indication of significance for the drug*time interaction. We
conclude that there is no sign of a difference between the drugs.

Because the drug effects are non-significant, we do not need to worry
about violations of the model assumptions for repeated measures over time:
correcting for such violations (if the data do not show sphericity) would
tend to increase our P-values even more. Analyses in SAS and Stata confirm
that our conclusions do not change when taking the repeated measures into
account.

Analysis at single time points shows (not surprisingly) no difference
between drugs at any time.

Addendum:
---------
The above analysis did not include checks of the distribution of subject
random effects.
Averaging over time for each subject and drug, and analyzing these means
in a two-way ANOVA, shows no problems with the subject random effects,
either for the untransformed or for the transformed data (results not
shown).
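
As a cross-check on the first analysis (untransformed y, all 40 observations), the F-ratios and variance components reported by Minitab can be recomputed by hand from the mean squares and the coefficients in the expected mean squares. The following is a minimal Python sketch, not part of the original analysis; the variable names are illustrative, and the inputs are the rounded values printed in the ANOVA table, so the results agree with the Minitab output only up to rounding.

```python
# Adjusted mean squares copied from the first ANOVA table (full data, y).
ms = {
    "drug": 0.0497,
    "subject": 1.1088,
    "drug*subject": 0.6091,
    "error": 0.1005,
}

# Expected mean squares for the balanced data:
#   drug*subject: sigma^2 + 4 sigma_AB^2
#   subject:      sigma^2 + 4 sigma_AB^2 + 8 sigma_A^2
# Solving these equations gives the method-of-moments variance components.
var_error = ms["error"]
var_drug_subject = (ms["drug*subject"] - ms["error"]) / 4.0
var_subject = (ms["subject"] - ms["drug*subject"]) / 8.0

# F-ratios: drug and subject are tested against drug*subject,
# drug*subject is tested against the residual error.
f_drug = ms["drug"] / ms["drug*subject"]
f_subject = ms["subject"] / ms["drug*subject"]
f_drug_subject = ms["drug*subject"] / ms["error"]

if __name__ == "__main__":
    print(f"sigma_A^2  (subject)      = {var_subject:.5f}")
    print(f"sigma_AB^2 (drug*subject) = {var_drug_subject:.5f}")
    print(f"sigma^2    (error)        = {var_error:.5f}")
    print(f"F drug = {f_drug:.2f}, F subject = {f_subject:.2f}, "
          f"F drug*subject = {f_drug_subject:.2f}")
```

The same arithmetic does not carry over exactly to the analyses with deleted observations: there the data are unbalanced, the coefficients become fractional (for example 3.8571 and 7.7143), and the test for drug uses a synthesized error term.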