Solution file for additional exercise 9.1 ------------------------------------------ Data: measurements of peak expiratory flow (PEF) of 13 children 8 hours after treatment with two different inhalation medicines. Notation: y_i = PEF for treatment episode i, i=1,...,26, or y_ijk = PEF for child j after treatment i in period j, i=F,S; j=1,...,13; k=1,2. (a) The design is a AB/BA cross-over design with 7 children for treatment sequence FS and 6 children for sequence SF. If no carry-over effect is assumed, the statistical model should include only treatment, period and subject effects: y_i = mu + alpha_tx(i) + beta_period(i) + gamma_child(i) + eps_i, or y_ijk = mu + alpha_i + beta_k + gamma_j + eps_ijk, depending on the chosen notation. The errors are assumed independent and distributed as a N(0,sigma^2) distribution. MTB > WOpen "H:\VHM\VHM802\Data_csv\hs09_1.csv"; SUBC> FType; SUBC> CSV; SUBC> DecSep; SUBC> Period; SUBC> Field; SUBC> Comma; SUBC> TDelimiter; SUBC> DoubleQuote. Retrieving worksheet from file: ‘H:\VHM\VHM802\Data_csv\hs09_1.csv’ Worksheet was saved on 08/03/2014 MTB > GLM; SUBC> Response 'pef'; SUBC> Nodefault; SUBC> Categorical 'id' 'period' 'tx'; SUBC> Terms id period tx; SUBC> Means period tx; SUBC> TExpand; SUBC> TMethod; SUBC> TAnova; SUBC> TSummary; SUBC> TCoefficients; SUBC> TEquation; SUBC> TFactor; SUBC> TMeans; SUBC> TDiagnostics 0; SUBC> Rtype 2; SUBC> GFOURPACK. General Linear Model: pef versus id, period, tx Method Factor coding (-1, 0, +1) Factor Information Factor Type Levels Values id Fixed 13 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14 period Fixed 2 1, 2 tx Fixed 2 F, S Analysis of Variance Source DF Seq SS Contribution Adj SS Adj MS F-Value P-Value id 12 115213 83.19% 115213 9601.1 12.79 0.000 period 1 985 0.71% 1632 1632.1 2.17 0.168 tx 1 14036 10.14% 14036 14035.9 18.70 0.001 Error 11 8254 5.96% 8254 750.4 Total 25 138488 100.00% S R-sq R-sq(adj) PRESS R-sq(pred) 27.3935 94.04% 86.45% 46516.8 66.41% Coefficients Term Coef SE Coef 95% CI T-Value P-Value VIF Constant 318.46 5.37 (306.64, 330.29) 59.28 0.000 id 1 -28.5 18.6 ( -69.4, 12.5) -1.53 0.154 1.85 2 59.0 18.6 ( 18.1, 100.0) 3.17 0.009 1.85 3 36.5 18.6 ( -4.4, 77.5) 1.96 0.075 1.85 4 -33.5 18.6 ( -74.4, 7.5) -1.80 0.100 1.85 5 76.5 18.6 ( 35.6, 117.5) 4.11 0.002 1.85 6 16.5 18.6 ( -24.4, 57.5) 0.89 0.393 1.85 7 81.5 18.6 ( 40.6, 122.5) 4.38 0.001 1.85 9 -13.5 18.6 ( -54.4, 27.5) -0.72 0.485 1.85 10 -88.5 18.6 (-129.4, -47.5) -4.75 0.001 1.85 11 46.5 18.6 ( 5.6, 87.5) 2.50 0.029 1.85 12 -18.5 18.6 ( -59.4, 22.5) -0.99 0.343 1.85 13 -163.5 18.6 (-204.4, -122.5) -8.78 0.000 1.85 period 1 -7.95 5.39 (-19.81, 3.91) -1.47 0.168 1.01 tx F 23.30 5.39 ( 11.44, 35.16) 4.32 0.001 1.01 Regression Equation pef = 318.46 - 28.5 id_1 + 59.0 id_2 + 36.5 id_3 - 33.5 id_4 + 76.5 id_5 + 16.5 id_6 + 81.5 id_7 - 13.5 id_9 - 88.5 id_10 + 46.5 id_11 - 18.5 id_12 - 163.5 id_13 + 29.0 id_14 - 7.95 period_1 + 7.95 period_2 + 23.30 tx_F - 23.30 tx_S Means Fitted Term Mean SE Mean period 1 310.52 7.61 2 326.41 7.61 tx F 341.77 7.61 S 295.16 7.61 Residual Plots for pef MTB > GLM; SUBC> Response 'pef'; SUBC> Nodefault; SUBC> Categorical 'id' 'period' 'tx'; SUBC> Terms id period tx; SUBC> Boxcox; SUBC> TExpand; SUBC> TMethod; SUBC> TAnova; SUBC> TSummary; SUBC> TCoefficients; SUBC> TEquation; SUBC> TFactor; SUBC> TDiagnostics 0; SUBC> Rtype 2. General Linear Model: pef versus id, period, tx Box-Cox transformation Rounded ? 2 Estimated ? 1.81928 95% CI for ? (1.19678, 2.69378) ... Comments: --------- The normal plot looks fine, but the residual plot shows some tendency of 'inverse cone'-shape (decreasing variance with increasing values). This is essentially due to one pair of large residuals (note that all residuals come in pairs with positive and negative signs but the same value) at the lower end of the predicted values. No residuals are suspiciously large. Box-Cox analysis in Minitab or Stata gives an optimal power of 1.82 and confidence intervals that do not include 1, and thus provides evidence in favour of a transformation. However, the tests of homoscedasticity (in Stata) are non-significant. Because power transformations with powers >1 are somewhat unusual and its justification in the present case is not intuitively obvious, we decide to work with the untransformed data. The ANOVA table shows a clear subject effect, no period effect and a clearly significant treatment effect (P<0.0005). Note that sequential and adjusted sum of squares are not the same, because the design is not balanced in the sequences (due to child no. 8 dropping out). The (least squares) means show that the new drug, formoterol, on the average increases the pef by 46 units. The Minitab commands to analyse the period differences 1-2 are given below. MTB > Unstack ('id' 'order' 'pef'); SUBC> Subscripts 'period'; SUBC> After; SUBC> VarNames. MTB > Name C12 'diffper' MTB > Let 'diffper' = 'pef_1'-'pef_2' MTB > TwoT 'diffper' 'order_1'; SUBC> Confidence 95.0; SUBC> Test 0.0; SUBC> Alternative 0; SUBC> Pooled. Two-Sample T-Test and CI: diffper, order_1 Two-sample T for diffper SE order_1 N Mean StDev Mean FS 7 30.7 33.0 12 SF 6 -62.5 44.7 18 Difference = µ (FS) - µ (SF) Estimate for difference: 93.2 95% CI for difference: (45.8, 140.7) T-Test of difference = 0 (vs ?): T-Value = 4.32 P-Value = 0.001 DF = 11 Both use Pooled StDev = 38.7403 Comments: --------- The two-sample t-test with equal variances does indeed reproduce the strong significance from above; note that t^2=4.32^2=18.7=F. (b) The model with subject random effects can be run in Minitab as long as the interaction period*tx is not included (it will simply be dropped if we try to include it). MTB > GLM; SUBC> Response 'pef'; SUBC> Nodefault; SUBC> Categorical 'id' 'period' 'tx'; SUBC> Random id; SUBC> Terms id period tx; SUBC> Means period tx; SUBC> TExpand; SUBC> TMethod; SUBC> TAnova; SUBC> TSummary; SUBC> TCoefficients; SUBC> TEquation; SUBC> TFactor; SUBC> TEMS; SUBC> TVariance; SUBC> TMeans; SUBC> TDiagnostics 0; SUBC> Rtype 2. General Linear Model: pef versus id, period, tx Factor Information Factor Type Levels Values id Random 13 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14 period Fixed 2 1, 2 tx Fixed 2 F, S Analysis of Variance Source DF Seq SS Contribution Adj SS Adj MS F-Value P-Value id 12 115213 83.19% 115213 9601.1 12.79 0.000 period 1 985 0.71% 1632 1632.1 2.17 0.168 tx 1 14036 10.14% 14036 14035.9 18.70 0.001 Error 11 8254 5.96% 8254 750.4 Total 25 138488 100.00% S R-sq R-sq(adj) PRESS R-sq(pred) 27.3935 94.04% 86.45% 46516.8 66.41% Coefficients Term Coef SE Coef 95% CI T-Value P-Value VIF Constant 318.46 5.37 (306.64, 330.29) 59.28 0.000 id 1 -28.5 18.6 ( -69.4, 12.5) -1.53 0.154 * 2 59.0 18.6 ( 18.1, 100.0) 3.17 0.009 * 3 36.5 18.6 ( -4.4, 77.5) 1.96 0.075 * 4 -33.5 18.6 ( -74.4, 7.5) -1.80 0.100 * 5 76.5 18.6 ( 35.6, 117.5) 4.11 0.002 * 6 16.5 18.6 ( -24.4, 57.5) 0.89 0.393 * 7 81.5 18.6 ( 40.6, 122.5) 4.38 0.001 * 9 -13.5 18.6 ( -54.4, 27.5) -0.72 0.485 * 10 -88.5 18.6 (-129.4, -47.5) -4.75 0.001 * 11 46.5 18.6 ( 5.6, 87.5) 2.50 0.029 * 12 -18.5 18.6 ( -59.4, 22.5) -0.99 0.343 * 13 -163.5 18.6 (-204.4, -122.5) -8.78 0.000 * period 1 -7.95 5.39 (-19.81, 3.91) -1.47 0.168 1.01 tx F 23.30 5.39 ( 11.44, 35.16) 4.32 0.001 1.01 Regression Equation pef = 318.46 - 28.5 id_1 + 59.0 id_2 + 36.5 id_3 - 33.5 id_4 + 76.5 id_5 + 16.5 id_6 + 81.5 id_7 - 13.5 id_9 - 88.5 id_10 + 46.5 id_11 - 18.5 id_12 - 163.5 id_13 + 29.0 id_14 - 7.95 period_1 + 7.95 period_2 + 23.30 tx_F - 23.30 tx_S Equation treats random terms as though they are fixed. Expected Mean Squares, using Adjusted SS Expected Mean Square Source for Each Term 1 id (4) + 2.0000 (1) 2 period (4) + Q[2] 3 tx (4) + Q[3] 4 Error (4) ... Means Term Fitted Mean period 1 310.515 2 326.408 tx F 341.765 S 295.158 Variance Components, using Adjusted SS Source Variance % of Total StDev % of Total id 4425.36 85.50% 66.5234 92.47% Error 750.406 14.50% 27.3935 38.08% Total 5175.76 71.9428 Comments: --------- The ANOVA table is unchanged from the fixed effects model. The estimated variance component for subjects is very large: 4425 compared to 750 for the within-subject variation, corresponding to 85% of the unexplained variation. This shows that the cross-over design is much more efficient than an ordinary completely randomised design would have been. Stata commands and output for part (b): . mixed pef i.Tx i.period || id:, reml Performing EM optimization: Performing gradient-based optimization: Iteration 0: log restricted-likelihood = -127.56466 Iteration 1: log restricted-likelihood = -127.56466 Computing standard errors: Mixed-effects REML regression Number of obs = 26 Group variable: id Number of groups = 13 Obs per group: min = 2 avg = 2.0 max = 2 Wald chi2(2) = 20.02 Log restricted-likelihood = -127.56466 Prob > chi2 = 0.0000 ------------------------------------------------------------------------------ pef | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- Tx | S | -46.60714 10.77656 -4.32 0.000 -67.72881 -25.48548 2.period | 15.89286 10.77656 1.47 0.140 -5.228807 37.01452 _cons | 333.8187 20.56391 16.23 0.000 293.5142 374.1232 ------------------------------------------------------------------------------ ------------------------------------------------------------------------------ Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval] -----------------------------+------------------------------------------------ id: Identity | var(_cons) | 4425.36 1966.341 1852.38 10572.24 -----------------------------+------------------------------------------------ var(Residual) | 750.4055 319.9739 325.3438 1730.81 ------------------------------------------------------------------------------ LR test vs. linear regression: chibar2(01) = 14.67 Prob >= chibar2 = 0.0001 . * added analysis with tx by period interaction . mixed pef i.Tx##i.period || id:, reml Performing EM optimization: Performing gradient-based optimization: Iteration 0: log restricted-likelihood = -122.26323 Iteration 1: log restricted-likelihood = -122.26323 Computing standard errors: Mixed-effects REML regression Number of obs = 26 Group variable: id Number of groups = 13 Obs per group: min = 2 avg = 2.0 max = 2 Wald chi2(3) = 20.05 Log restricted-likelihood = -122.26323 Prob > chi2 = 0.0002 ------------------------------------------------------------------------------ pef | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- Tx | S | -53.80952 41.62196 -1.29 0.196 -135.3871 27.76802 2.period | 8.690476 41.62196 0.21 0.835 -72.88707 90.26802 | Tx#period | S#2 | 14.40476 80.40531 0.18 0.858 -143.1867 171.9963 | _cons | 337.1429 28.27655 11.92 0.000 281.7218 392.5639 ------------------------------------------------------------------------------ ------------------------------------------------------------------------------ Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval] -----------------------------+------------------------------------------------ id: Identity | var(_cons) | 4846.539 2232.299 1965.039 11953.42 -----------------------------+------------------------------------------------ var(Residual) | 750.4055 319.9739 325.3438 1730.81 ------------------------------------------------------------------------------ LR test vs. linear regression: chibar2(01) = 15.24 Prob >= chibar2 = 0.0000 Comments: --------- The random effects model without the interaction gives similar but not identical results to the ANOVA-based analysis. The variance components are identical (as is true for all "nice" designs, when the likelihood-based analysis uses REML estimation). The tests for the fixed effects give slightly more significant P-values, due to the use of a standard normal reference distribution instead of the (exact) t-distribution. The treatment by period interaction shows absolutely no significance. Note however the changed results for the two main effects; these are no longer useful (the model should be refitted without the interaction before assessing the main effects). A treatment by period interaction would mean that the difference between treatments depends on the period. Such an effect would most likely be a carry-over effect (although there could also be genuinely different treatment effects in periods). For example, treatment A differs in periods 1 and 2 by being either the first treatment applied (AB) and by being the follow-up treatment after treatment B was applied (BA). In the simple AB/BA-design we cannot from the data alone distinguish between carry-over effects and a genuine treatment by period interaction. With a totally non-significant interaction, we conclude that no carry-over effects seem to be present in these data.