Solution file for additional exercise 10.7 ------------------------------------------ Data on excretion of salsolinol measured over 4 consecutive days, for alcoholized persons, divided into two groups based on degree of addiction. Group 1 contains 6 persons, and group 2 contains 8 persons. Note that the datafile codes persons as 1-6 in group 1 and 1-8 in group 2; you might want to create unique person id's although that is not strictly required for analysis in Minitab. These are repeated measures or longitudinal data because measurements are taken on the same person (experimental unit), and in this case over time. Notation: y_ijk = salsolinol value for patient j in patient group i measured at day k, i = 1,2 (group), j = 1,..., n_i (patient; n_1=6, n_2=8), k = 1,2,3,4 (day), The first step is to construct profile plots over time for each person. MTB > WOpen "h:\vhm\vhm802\data_csv\hs10_7.csv"; SUBC> FType; SUBC> CSV; SUBC> DecSep; SUBC> Period; SUBC> Field; SUBC> Comma; SUBC> TDelimiter; SUBC> DoubleQuote. Retrieving worksheet from file: 'h:\vhm\vhm802\data_csv\hs10_7.csv' Worksheet was saved on 03/03/2011 MTB > Plot 'salsolinol'*'day'; SUBC> Symbol 'person'; SUBC> Connect 'person'; SUBC> Panel 'group'. Scatterplot of salsolinol vs day Comments: --------- The plot seems to indicate more variability for higher values of the outcome than for lower values. To confirm this first impression, we fit a split-plot model with persons as replicates within groups, and look at the residuals. MTB > GLM 'salsolinol' = group| day person(group); SUBC> Random 'person'; SUBC> Brief 2 ; SUBC> GFourpack; SUBC> RType 2 . General Linear Model: salsolinol versus group, day, person Factor Type Levels Values group fixed 2 1, 2 day fixed 4 1, 2, 3, 4 person(group) random 14 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 7, 8 Analysis of Variance for salsolinol, using Adjusted SS for Tests Source DF Seq SS Adj SS Adj MS F P group 1 6.565 6.565 6.565 1.95 0.188 day 3 14.888 11.723 3.908 0.85 0.476 group*day 3 8.029 8.029 2.676 0.58 0.631 person(group) 12 40.329 40.329 3.361 0.73 0.713 Error 36 165.692 165.692 4.603 Total 55 235.503 S = 2.14536 R-Sq = 29.64% R-Sq(adj) = 0.00% Unusual Observations for salsolinol Obs salsolinol Fit SE Fit Residual St Resid 22 6.34000 2.82417 1.31376 3.51583 2.07 R 45 7.80000 2.76438 1.25783 5.03563 2.90 R 56 8.10000 3.76812 1.25783 4.33188 2.49 R R denotes an observation with a large standardized residual. Residual Plots for salsolinol Comments: --------- The plot of residuals vs fitted values shows a clear cone or fan shape (increasing variation with increasing fitted values). Also, the residuals extend further the positive than the negative side, and the histogram indicates a right-skewed distribution. We try a log-transformation (below), and the residuals look much better. It may of interest to explore which transformations are best for these data, but we will here work only with the logs. The boxcox command in Stata and the General Regression menu (below) are only for fixed effects models, but may nevertheless give an indication of a suitable transformation. MTB > Name C5 'id' MTB > Let 'id' = group*10+person MTB > GReg 'salsolinol' = id day|group; SUBC> Categorical 'group' 'day' 'id'; SUBC> Constant; SUBC> Confidence 95.0; SUBC> Coding -1; SUBC> Boxcox; SUBC> TEquation; SUBC> TCoef; SUBC> TSummary; SUBC> TANOVA; SUBC> TDiag 0. General Regression Analysis: salsolinol versus group, day, id * NOTE * group cannot be estimated and has been removed. Box-Cox transformation of the response with rounded lambda = 0 The 95% CI for lambda is (-0.465, 0.155) Coefficients Term Coef SE Coef T P Constant 0.387338 0.124196 3.11875 0.004 id 11 -0.251417 0.447797 -0.56145 0.578 12 0.061027 0.447797 0.13628 0.892 13 0.058039 0.447797 0.12961 0.898 14 -0.440241 0.447797 -0.98313 0.332 15 -0.570161 0.447797 -1.27326 0.211 16 0.003744 0.447797 0.00836 0.993 21 -0.503960 0.447797 -1.12542 0.268 22 0.246892 0.447797 0.55135 0.585 23 0.800833 0.447797 1.78839 0.082 24 0.075232 0.447797 0.16801 0.868 25 -0.261206 0.447797 -0.58331 0.563 26 0.557598 0.447797 1.24520 0.221 27 0.466523 0.447797 1.04182 0.304 day 1 -0.360640 0.217344 -1.65931 0.106 2 -0.190104 0.217344 -0.87467 0.388 3 0.175521 0.217344 0.80757 0.425 day*group 1 1 0.053527 0.217344 0.24628 0.807 2 1 0.095315 0.217344 0.43855 0.664 3 1 -0.017394 0.217344 -0.08003 0.937 Summary of Model S = 0.929401 R-Sq = 31.54% R-Sq(adj) = -4.58% PRESS = 76.1600 R-Sq(pred) = -67.66% Analysis of Variance Source DF Seq SS Adj SS Adj MS F P Regression 19 14.3295 14.3295 0.75419 0.87312 0.614540 id 13 8.8277 8.8277 0.67906 0.78614 0.669221 day 3 5.0968 4.6327 1.54423 1.78774 0.166936 day*group 3 0.4050 0.4050 0.13500 0.15629 0.924949 Error 36 31.0963 31.0963 0.86379 Total 55 45.4258 Fits and Diagnostics for Unusual Observations for Transformed Response Obs ln(salsolinol) Fit SE Fit Residual St Resid 5 1.66771 0.141252 0.569139 1.52645 2.07750 R 22 1.84688 0.296293 0.569139 1.55059 2.11034 R 45 2.05412 0.530769 0.544910 1.52335 2.02331 R Fits for Unusual Observations for Original Response Obs salsolinol Fit 5 5.30 1.15171 R 22 6.34 1.34486 R 45 7.80 1.70024 R R denotes an observation with a large standardized residual. Comments: --------- In order to run the model in the General Regression menu, a variable holding unique id's of persons must be created. Furthermore, the id factor must precede the day|group term; this means that the main effect for group cannot be estimated (because it's contained in the id effect), but the model is still the same. The analysis supports a log transformation. The boxcox analysis in Stata gives an optimal theta=-0.16. MTB > Name C6 'lnsals' MTB > Let 'lnsals' = loge('salsolinol') MTB > Plot 'lnsals'*'day'; SUBC> Symbol 'person'; SUBC> Connect 'person'; SUBC> Panel 'group'. Scatterplot of lnsals vs day Comments: --------- The profiles now appear equally noisy across the range of lnsals values. There are no obvious person effects, and nor are average effects of group or day clearly seen. The results for the split-plot model are shown below; the model checks look much better now. As our first analysis, we analyse the data separately for each day. This requires the data structure to be changed from long to wide. MTB > Unstack ('lnsals' 'group'); SUBC> Subscripts 'day'; SUBC> NewWS; SUBC> VarNames. Results for: Worksheet 2 MTB > Oneway 'lnsals_1' 'group_1'. One-way ANOVA: lnsals_1 versus group_1 Source DF SS MS F P group_1 1 0.17 0.17 0.15 0.708 Error 12 14.22 1.18 Total 13 14.39 S = 1.089 R-Sq = 1.21% R-Sq(adj) = 0.00% Individual 95% CIs For Mean Based on Pooled StDev Level N Mean StDev --------+---------+---------+---------+- 1 6 -0.110 1.186 (---------------*---------------) 2 8 0.116 1.013 (-------------*-------------) --------+---------+---------+---------+- -0.60 0.00 0.60 1.20 Pooled StDev = 1.089 MTB > Oneway 'lnsals_2' 'group_1'. One-way ANOVA: lnsals_2 versus group_1 Source DF SS MS F P group_1 1 0.069 0.069 0.09 0.765 Error 12 8.792 0.733 Total 13 8.860 S = 0.8559 R-Sq = 0.78% R-Sq(adj) = 0.00% Individual 95% CIs For Mean Based on Pooled StDev Level N Mean StDev ------+---------+---------+---------+--- 1 6 0.1027 1.0462 (------------------*------------------) 2 8 0.2443 0.6886 (---------------*----------------) ------+---------+---------+---------+--- -0.40 0.00 0.40 0.80 Pooled StDev = 0.8559 MTB > Oneway 'lnsals_3' 'group_1'. One-way ANOVA: lnsals_3 versus group_1 Source DF SS MS F P group_1 1 0.462 0.462 0.76 0.399 Error 12 7.259 0.605 Total 13 7.721 S = 0.7778 R-Sq = 5.98% R-Sq(adj) = 0.00% Individual 95% CIs For Mean Based on Pooled StDev Level N Mean StDev -------+---------+---------+---------+-- 1 6 0.3556 0.7052 (-------------*-------------) 2 8 0.7226 0.8257 (-----------*-----------) -------+---------+---------+---------+-- 0.00 0.50 1.00 1.50 Pooled StDev = 0.7778 MTB > Oneway 'lnsals_4' 'group_1'. One-way ANOVA: lnsals_4 versus group_1 Source DF SS MS F P group_1 1 1.214 1.214 1.79 0.206 Error 12 8.140 0.678 Total 13 9.354 S = 0.8236 R-Sq = 12.98% R-Sq(adj) = 5.73% Individual 95% CIs For Mean Based on Pooled StDev Level N Mean StDev ------+---------+---------+---------+--- 1 6 0.4413 0.7889 (--------------*-------------) 2 8 1.0364 0.8475 (------------*-----------) ------+---------+---------+---------+--- 0.00 0.50 1.00 1.50 Pooled StDev = 0.8236 Comments: --------- The 4 separate analyses for each day show no significant differences between the two groups. Only the last one (day 4) gets even close to something interesting. There is no sign of problems with heterogeneous variances. Instead of the one-way ANOVA, we could also have used 2-sample t-tests. The profile plot (above) does not suggest any obvious response features. We therefore drop this part here. The split-plot model may be appropriate for the relatively short series of 4 measures per person. It takes * persons = whole plots, * groups = whole plot factor, * dayly measurements = sub-plots, * day = sub-plot factor, * no blocks - the persons are replications within groups. Another way of say this is that the data may be viewed as having hierarchical structure with measurements within persons. The statistical model is y_ijk = mu + alpha_i + (AB)_ij + gamma_k + (alpha gamma)_ik + eps_ijk, where AB_ij's are assumed i.i.d. N(0,sigma_AB^2), and where eps_ijk's are assumed i.i.d. N(0,sigma^2), Results for: hs10_7.csv MTB > Name c7 "SRES1" c7 "TRES1" MTB > GLM 'lnsals' = group| day person(group); SUBC> Random 'person'; SUBC> Brief 2 ; SUBC> EMS; SUBC> Means group| day; SUBC> SResiduals 'SRES1'; SUBC> TResiduals 'TRES1'; SUBC> GFourpack; SUBC> RType 2 . General Linear Model: lnsals versus group, day, person Factor Type Levels Values group fixed 2 1, 2 day fixed 4 1, 2, 3, 4 person(group) random 14 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 7, 8 Analysis of Variance for lnsals, using Adjusted SS for Tests Source DF Seq SS Adj SS Adj MS F P group 1 1.5136 1.5136 1.5136 2.48 0.141 day 3 5.0968 4.6327 1.5442 1.79 0.167 group*day 3 0.4050 0.4050 0.1350 0.16 0.925 person(group) 12 7.3142 7.3142 0.6095 0.71 0.736 Error 36 31.0963 31.0963 0.8638 Total 55 45.4258 S = 0.929401 R-Sq = 31.54% R-Sq(adj) = 0.00% Unusual Observations for lnsals Obs lnsals Fit SE Fit Residual St Resid 5 1.66771 0.14125 0.56914 1.52645 2.08 R 22 1.84688 0.29629 0.56914 1.55059 2.11 R 45 2.05412 0.53077 0.54491 1.52335 2.02 R R denotes an observation with a large standardized residual. Expected Mean Squares, using Adjusted SS Expected Mean Square for Source Each Term 1 group (5) + 4.0000 (4) + Q[1, 3] 2 day (5) + Q[2, 3] 3 group*day (5) + Q[3] 4 person(group) (5) + 4.0000 (4) 5 Error (5) ... Variance Components, using Adjusted SS Estimated Source Value person(group) -0.06357 Error 0.86379 Least Squares Means for lnsals group Mean 1 0.1975 2 0.5297 day 1 0.0030 2 0.1735 3 0.5391 4 0.7388 group*day 1 1 -0.1096 1 2 0.1027 1 3 0.3556 1 4 0.4413 2 1 0.1155 2 2 0.2443 2 3 0.7226 2 4 1.0364 Residual Plots for lnsals MTB > GLM 'lnsals' = group| day person(group); SUBC> Random 'person'; SUBC> SMeans C4000; SUBC> Brief 0; SUBC> Interact 'group' 'day'. MTB > GFInt 'group' 'day'; SUBC> Responses 'lnsals'; SUBC> FMeans C4000. Interaction Plot (fitted means) for lnsals MTB > Erase C4000. MTB > NormTest 'SRES1'. Probability Plot of SRES1 The P-value for the Anderson-Darling test of normality is 0.211. MTB > Name c9 "ByVar1" c10 "ByVar2" c11 "Mean1" MTB > Statistics 'lnsals'; SUBC> By 'group' 'person'; SUBC> NoEmpty; SUBC> GValues 'ByVar1'-'ByVar2'; SUBC> Mean 'Mean1'. MTB > Name c11 "RESI1" MTB > Oneway 'Mean1' 'ByVar1'; SUBC> Residuals 'RESI1'; SUBC> GFourpack. One-way ANOVA: Mean1 versus ByVar1 Source DF SS MS F P ByVar1 1 0.378 0.378 2.48 0.141 Error 12 1.829 0.152 Total 13 2.207 S = 0.3904 R-Sq = 17.15% R-Sq(adj) = 10.24% Individual 95% CIs For Mean Based on Pooled StDev Level N Mean StDev ------+---------+---------+---------+--- 1 6 0.1975 0.2731 (-------------*-------------) 2 8 0.5297 0.4560 (-----------*-----------) ------+---------+---------+---------+--- 0.00 0.25 0.50 0.75 Pooled StDev = 0.3904 Residual Plots for Mean1 MTB > NormTest 'RESI1'. Probability Plot of RESI1 The P-value for the Anderson-Darling test of normality is 0.661. Comments: --------- The residuals look nice both when plotted against the fitted values and in the normal plot, and there are no alarmingly extreme values. This is true both for the sub-plot and the whole plot residuals, even if the latter seem to have somewhat more variation in group 2 than in group 1. The ANOVA table shows most effects to be insignificant. The whole-plot variation seems to be very small (negative estimated variance component which should therefore be set to 0), but we keep the whole-plots in the model anyway. The group*day interaction seems to be very small - presumably a disappointment for the researchers, because it would seem to be the most interesting effect, depending though on the interpretation of days (days after some event?). There are weak but clearly insignificant effects of groups and days. The estimated means show group 2 to have higher values, and the values to be increasing with days. The interaction plot shows a roughly linear relationship with days, with almost parallel curves. To study this further, we fit a model with a linear day effect MTB > GLM 'lnsals' = group| day person(group); SUBC> Covariates 'day'; SUBC> Random 'person'; SUBC> Brief 2 ; SUBC> EMS. General Linear Model: lnsals versus group, person Factor Type Levels Values group fixed 2 1, 2 person(group) random 14 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 7, 8 Analysis of Variance for lnsals, using Adjusted SS for Tests Source DF Seq SS Adj SS Adj MS F P group 1 1.5136 0.0000 0.0000 0.00 0.998 x day 1 4.9850 4.5404 4.5404 5.80 0.021 group*day 1 0.3056 0.3056 0.3056 0.39 0.536 person(group) 12 7.3142 7.3142 0.6095 0.78 0.668 Error 40 31.3075 31.3075 0.7827 Total 55 45.4258 x Not an exact F-test. S = 0.884697 R-Sq = 31.08% R-Sq(adj) = 5.23% Term Coef SE Coef T P Constant -0.2797 0.2926 -0.96 0.345 day 0.2573 0.1068 2.41 0.021 day*group 1 -0.0668 0.1068 -0.62 0.536 Unusual Observations for lnsals Obs lnsals Fit SE Fit Residual St Resid 5 1.66771 0.16253 0.50435 1.50518 2.07 R 22 1.84688 0.29580 0.44966 1.55108 2.04 R 45 2.05412 0.45881 0.48959 1.59531 2.16 R R denotes an observation with a large standardized residual. Expected Mean Squares, using Adjusted SS Expected Mean Square Source for Each Term 1 group (5) + 0.6667 (4) + Q[1] 2 day (5) + Q[2, 3] 3 group*day (5) + Q[3] 4 person(group) (5) + 4.0000 (4) 5 Error (5) ... Variance Components, using Adjusted SS Estimated Source Value person(group) -0.04329 Error 0.78269 Comment: -------- In random effects models, two nested models cannot be compared as easily using an F-test as for usual linear models. If a statistical test is desired, one may use the more advanced theory for tests based on the likelihood function (mixed command in Stata or proc Mixed in SAS). In this case, however, it is quite clear that the linear day effect is very appropriate. It explains 4.985/5.097=98% of the variation between days, and most of the (very small) variation in day*group. There is no need of non-parallel slopes here, so we remove them from the model. MTB > GLM 'lnsals' = group day person(group); SUBC> Covariates 'day'; SUBC> Random 'person'; SUBC> Brief 2 ; SUBC> EMS; SUBC> Means group. General Linear Model: lnsals versus group, person Factor Type Levels Values group fixed 2 1, 2 person(group) random 14 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 7, 8 Analysis of Variance for lnsals, using Adjusted SS for Tests Source DF Seq SS Adj SS Adj MS F P group 1 1.5136 1.5136 1.5136 2.48 0.141 day 1 4.9850 4.9850 4.9850 6.47 0.015 person(group) 12 7.3142 7.3142 0.6095 0.79 0.657 Error 41 31.6132 31.6132 0.7711 Total 55 45.4258 S = 0.878096 R-Sq = 30.41% R-Sq(adj) = 6.64% Term Coef SE Coef T P Constant -0.3035 0.2879 -1.05 0.298 day 0.2669 0.1050 2.54 0.015 Unusual Observations for lnsals Obs lnsals Fit SE Fit Residual St Resid 5 1.66771 0.04808 0.46642 1.61963 2.18 R 22 1.84688 0.25765 0.44217 1.58923 2.09 R 45 2.05412 0.54465 0.46642 1.50948 2.03 R 56 2.09186 0.54472 0.46642 1.54714 2.08 R R denotes an observation with a large standardized residual. Expected Mean Squares, using Adjusted SS Expected Mean Square Source for Each Term 1 group (4) + 4.0000 (3) + Q[1] 2 day (4) + Q[2] 3 person(group) (4) + 4.0000 (3) 4 Error (4) ... Variance Components, using Adjusted SS Estimated Source Value person(group) -0.04039 Error 0.77105 Means for Covariates Covariate Mean StDev day 2.500 1.128 Least Squares Means for lnsals group Mean 1 0.1975 2 0.5297 Comments: --------- The resulting model has a significant regression coefficient for days: the ln-salsolinol levels increase by 0.27 units per day, or equivalently it can be said that the salsolinol levels increase by a factor of exp(0.27)=1.3 per day. There is perhaps an indication of higher levels in group 2 than in group 1, but it is non-significant (P=0.14). One assumption of the split-plot analysis that cannot easily be checked in Minitab is that the correlation within persons is independent of time. As the person-variability is estimated as very low, it means that observations on the same person are essentially independent. A repeated measures ANOVA or a full mixed model analysis is required to assess the assumption, see the SAS-program hs10_7.sas or the Stata do-file hs10_7.do.