Solution file for additional exercise 9.1
------------------------------------------

Data: measurements of peak expiratory flow (PEF) of 13 children 8 hours after
treatment with two different inhalation medicines.
Notation:
  y_i = PEF for treatment episode i, i=1,...,26,
or
  y_ijk = PEF for child j after treatment i in period j,
  i=F,S; j=1,...,13; k=1,2.
  
(a)
The design is a AB/BA cross-over design with 7 children for treatment sequence
FS and 6 children for sequence SF. If no carry-over effect is assumed, the
statistical model should include only treatment, period and subject effects:
    y_i = mu + alpha_tx(i) + beta_period(i) + gamma_child(i) + eps_i,
or
    y_ijk = mu + alpha_i + beta_k + gamma_j + eps_ijk,
depending on the chosen notation. The errors are assumed independent and
distributed as a N(0,sigma^2) distribution.

MTB > WOpen "H:\VHM\VHM802\Data_csv\hs09_1.csv";
SUBC>   FType;
SUBC>     CSV;
SUBC>   DecSep;
SUBC>     Period;
SUBC>   Field;
SUBC>     Comma;
SUBC>   TDelimiter;
SUBC>     DoubleQuote.
Retrieving worksheet from file: ‘H:\VHM\VHM802\Data_csv\hs09_1.csv’
Worksheet was saved on 08/03/2014

MTB > GLM;
SUBC>   Response 'pef';
SUBC>   Nodefault;
SUBC>   Categorical 'id' 'period' 'tx';
SUBC>   Terms id period tx;
SUBC>   Means period tx;
SUBC>   TExpand;
SUBC>   TMethod;
SUBC>   TAnova;
SUBC>   TSummary;
SUBC>   TCoefficients;
SUBC>   TEquation;
SUBC>   TFactor;
SUBC>   TMeans;
SUBC>   TDiagnostics 0;
SUBC>   Rtype 2;
SUBC>  GFOURPACK.
General Linear Model: pef versus id, period, tx 

Method
Factor coding  (-1, 0, +1)

Factor Information
Factor  Type   Levels  Values
id      Fixed      13  1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14
period  Fixed       2  1, 2
tx      Fixed       2  F, S

Analysis of Variance
Source    DF  Seq SS  Contribution  Adj SS   Adj MS  F-Value  P-Value
  id      12  115213        83.19%  115213   9601.1    12.79    0.000
  period   1     985         0.71%    1632   1632.1     2.17    0.168
  tx       1   14036        10.14%   14036  14035.9    18.70    0.001
Error     11    8254         5.96%    8254    750.4
Total     25  138488       100.00%

      S    R-sq  R-sq(adj)    PRESS  R-sq(pred)
27.3935  94.04%     86.45%  46516.8      66.41%

Coefficients
Term        Coef  SE Coef       95% CI       T-Value  P-Value   VIF
Constant  318.46     5.37  (306.64, 330.29)    59.28    0.000
id
  1        -28.5     18.6  ( -69.4,   12.5)    -1.53    0.154  1.85
  2         59.0     18.6  (  18.1,  100.0)     3.17    0.009  1.85
  3         36.5     18.6  (  -4.4,   77.5)     1.96    0.075  1.85
  4        -33.5     18.6  ( -74.4,    7.5)    -1.80    0.100  1.85
  5         76.5     18.6  (  35.6,  117.5)     4.11    0.002  1.85
  6         16.5     18.6  ( -24.4,   57.5)     0.89    0.393  1.85
  7         81.5     18.6  (  40.6,  122.5)     4.38    0.001  1.85
  9        -13.5     18.6  ( -54.4,   27.5)    -0.72    0.485  1.85
  10       -88.5     18.6  (-129.4,  -47.5)    -4.75    0.001  1.85
  11        46.5     18.6  (   5.6,   87.5)     2.50    0.029  1.85
  12       -18.5     18.6  ( -59.4,   22.5)    -0.99    0.343  1.85
  13      -163.5     18.6  (-204.4, -122.5)    -8.78    0.000  1.85
period
  1        -7.95     5.39  (-19.81,   3.91)    -1.47    0.168  1.01
tx
  F        23.30     5.39  ( 11.44,  35.16)     4.32    0.001  1.01

Regression Equation
pef = 318.46 - 28.5 id_1 + 59.0 id_2 + 36.5 id_3 - 33.5 id_4 + 76.5 id_5 + 16.5 id_6 + 81.5 id_7
      - 13.5 id_9 - 88.5 id_10 + 46.5 id_11 - 18.5 id_12 - 163.5 id_13 + 29.0 id_14 - 7.95 period_1
      + 7.95 period_2 + 23.30 tx_F - 23.30 tx_S

Means
        Fitted
Term      Mean  SE Mean
period
  1     310.52     7.61
  2     326.41     7.61
tx
  F     341.77     7.61
  S     295.16     7.61
 
Residual Plots for pef 

MTB > GLM;
SUBC>   Response 'pef';
SUBC>   Nodefault;
SUBC>   Categorical 'id' 'period' 'tx';
SUBC>   Terms id period tx;
SUBC>   Boxcox;
SUBC>   TExpand;
SUBC>   TMethod;
SUBC>   TAnova;
SUBC>   TSummary;
SUBC>   TCoefficients;
SUBC>   TEquation;
SUBC>   TFactor;
SUBC>   TDiagnostics 0;
SUBC>   Rtype 2.
General Linear Model: pef versus id, period, tx 

Box-Cox transformation
Rounded ?               2
Estimated ?             1.81928
95% CI for ?            (1.19678, 2.69378)
...


Comments:
---------
The normal plot looks fine, but the residual plot shows some tendency of
'inverse cone'-shape (decreasing variance with increasing values). This
is essentially due to one pair of large residuals (note that all residuals come
in pairs with positive and negative signs but the same value) at the lower
end of the predicted values. No residuals are suspiciously large. Box-Cox
analysis in Minitab or Stata gives an optimal power of 1.82 and confidence intervals
that do not include 1, and thus provides evidence in favour of a transformation. 
However, the tests of homoscedasticity (in Stata) are non-significant. Because 
power transformations with powers >1 are somewhat unusual and its justification 
in the present case is not intuitively obvious, we decide to work with the untransformed data.

The ANOVA table shows a clear subject effect, no period effect and a
clearly significant treatment effect (P<0.0005). Note that sequential and adjusted
sum of squares are not the same, because the design is not balanced in the sequences
(due to child no. 8 dropping out). The (least squares) means show that the new drug, 
formoterol, on the average increases the pef by 46 units.

The Minitab commands to analyse the period differences 1-2 are given below.

MTB > Unstack ('id' 'order' 'pef');
SUBC>   Subscripts 'period';
SUBC>   After;
SUBC>   VarNames.

MTB > Name C12 'diffper'
MTB > Let 'diffper' = 'pef_1'-'pef_2'
MTB > TwoT 'diffper' 'order_1';
SUBC>   Confidence 95.0;
SUBC>   Test 0.0;
SUBC>   Alternative 0;
SUBC>   Pooled.
Two-Sample T-Test and CI: diffper, order_1 

Two-sample T for diffper
                            SE
order_1  N   Mean  StDev  Mean
FS       7   30.7   33.0    12
SF       6  -62.5   44.7    18

Difference = µ (FS) - µ (SF)
Estimate for difference:  93.2
95% CI for difference:  (45.8, 140.7)
T-Test of difference = 0 (vs ?): T-Value = 4.32  P-Value = 0.001  DF = 11
Both use Pooled StDev = 38.7403


Comments:
---------
The two-sample t-test with equal variances does indeed reproduce the
strong significance from above; note that t^2=4.32^2=18.7=F.


(b)
The model with subject random effects can be run in Minitab as long as
the interaction period*tx is not included (it will simply be dropped if
we try to include it).

MTB > GLM;
SUBC>   Response 'pef';
SUBC>   Nodefault;
SUBC>   Categorical 'id' 'period' 'tx';
SUBC>   Random id;
SUBC>   Terms id period tx;
SUBC>   Means period tx;
SUBC>   TExpand;
SUBC>   TMethod;
SUBC>   TAnova;
SUBC>   TSummary;
SUBC>   TCoefficients;
SUBC>   TEquation;
SUBC>   TFactor;
SUBC>   TEMS;
SUBC>   TVariance;
SUBC>   TMeans;
SUBC>   TDiagnostics 0;
SUBC>   Rtype 2.
General Linear Model: pef versus id, period, tx 

Factor Information
Factor  Type    Levels  Values
id      Random      13  1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14
period  Fixed        2  1, 2
tx      Fixed        2  F, S

Analysis of Variance
Source    DF  Seq SS  Contribution  Adj SS   Adj MS  F-Value  P-Value
  id      12  115213        83.19%  115213   9601.1    12.79    0.000
  period   1     985         0.71%    1632   1632.1     2.17    0.168
  tx       1   14036        10.14%   14036  14035.9    18.70    0.001
Error     11    8254         5.96%    8254    750.4
Total     25  138488       100.00%

      S    R-sq  R-sq(adj)    PRESS  R-sq(pred)
27.3935  94.04%     86.45%  46516.8      66.41%

Coefficients
Term        Coef  SE Coef       95% CI       T-Value  P-Value   VIF
Constant  318.46     5.37  (306.64, 330.29)    59.28    0.000
id
  1        -28.5     18.6  ( -69.4,   12.5)    -1.53    0.154     *
  2         59.0     18.6  (  18.1,  100.0)     3.17    0.009     *
  3         36.5     18.6  (  -4.4,   77.5)     1.96    0.075     *
  4        -33.5     18.6  ( -74.4,    7.5)    -1.80    0.100     *
  5         76.5     18.6  (  35.6,  117.5)     4.11    0.002     *
  6         16.5     18.6  ( -24.4,   57.5)     0.89    0.393     *
  7         81.5     18.6  (  40.6,  122.5)     4.38    0.001     *
  9        -13.5     18.6  ( -54.4,   27.5)    -0.72    0.485     *
  10       -88.5     18.6  (-129.4,  -47.5)    -4.75    0.001     *
  11        46.5     18.6  (   5.6,   87.5)     2.50    0.029     *
  12       -18.5     18.6  ( -59.4,   22.5)    -0.99    0.343     *
  13      -163.5     18.6  (-204.4, -122.5)    -8.78    0.000     *
period
  1        -7.95     5.39  (-19.81,   3.91)    -1.47    0.168  1.01
tx
  F        23.30     5.39  ( 11.44,  35.16)     4.32    0.001  1.01

Regression Equation
pef = 318.46 - 28.5 id_1 + 59.0 id_2 + 36.5 id_3 - 33.5 id_4 + 76.5 id_5 + 16.5 id_6 + 81.5 id_7
      - 13.5 id_9 - 88.5 id_10 + 46.5 id_11 - 18.5 id_12 - 163.5 id_13 + 29.0 id_14 - 7.95 period_1
      + 7.95 period_2 + 23.30 tx_F - 23.30 tx_S
Equation treats random terms as though they are fixed.

Expected Mean Squares, using Adjusted SS
           Expected Mean Square
   Source  for Each Term
1  id      (4) + 2.0000 (1)
2  period  (4) + Q[2]
3  tx      (4) + Q[3]
4  Error   (4)
...

Means
Term    Fitted Mean
period
  1         310.515
  2         326.408
tx
  F         341.765
  S         295.158

Variance Components, using Adjusted SS
Source  Variance  % of Total    StDev  % of Total
id       4425.36      85.50%  66.5234      92.47%
Error    750.406      14.50%  27.3935      38.08%
Total    5175.76              71.9428


Comments:
---------
The ANOVA table is unchanged from the fixed effects model. The estimated
variance component for subjects is very large: 4425 compared to 750 for
the within-subject variation, corresponding to 85% of the unexplained 
variation. This shows that the cross-over design is much more efficient 
than an ordinary completely randomised design would have been.


Stata commands and output for part (b):


. mixed pef i.Tx i.period || id:, reml

Performing EM optimization: 
Performing gradient-based optimization: 
Iteration 0:   log restricted-likelihood = -127.56466  
Iteration 1:   log restricted-likelihood = -127.56466  

Computing standard errors:

Mixed-effects REML regression                   Number of obs      =        26
Group variable: id                              Number of groups   =        13

                                                Obs per group: min =         2
                                                               avg =       2.0
                                                               max =         2

                                                Wald chi2(2)       =     20.02
Log restricted-likelihood = -127.56466          Prob > chi2        =    0.0000
------------------------------------------------------------------------------
         pef |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          Tx |
          S  |  -46.60714   10.77656    -4.32   0.000    -67.72881   -25.48548
    2.period |   15.89286   10.77656     1.47   0.140    -5.228807    37.01452
       _cons |   333.8187   20.56391    16.23   0.000     293.5142    374.1232
------------------------------------------------------------------------------
------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                  var(_cons) |    4425.36   1966.341       1852.38    10572.24
-----------------------------+------------------------------------------------
               var(Residual) |   750.4055   319.9739      325.3438     1730.81
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =    14.67 Prob >= chibar2 = 0.0001


. * added analysis with tx by period interaction
. mixed pef i.Tx##i.period || id:, reml

Performing EM optimization: 
Performing gradient-based optimization: 
Iteration 0:   log restricted-likelihood = -122.26323  
Iteration 1:   log restricted-likelihood = -122.26323  

Computing standard errors:

Mixed-effects REML regression                   Number of obs      =        26
Group variable: id                              Number of groups   =        13

                                                Obs per group: min =         2
                                                               avg =       2.0
                                                               max =         2

                                                Wald chi2(3)       =     20.05
Log restricted-likelihood = -122.26323          Prob > chi2        =    0.0002
------------------------------------------------------------------------------
         pef |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          Tx |
          S  |  -53.80952   41.62196    -1.29   0.196    -135.3871    27.76802
    2.period |   8.690476   41.62196     0.21   0.835    -72.88707    90.26802
             |
   Tx#period |
        S#2  |   14.40476   80.40531     0.18   0.858    -143.1867    171.9963
             |
       _cons |   337.1429   28.27655    11.92   0.000     281.7218    392.5639
------------------------------------------------------------------------------
------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                  var(_cons) |   4846.539   2232.299      1965.039    11953.42
-----------------------------+------------------------------------------------
               var(Residual) |   750.4055   319.9739      325.3438     1730.81
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =    15.24 Prob >= chibar2 = 0.0000


Comments:
---------
The random effects model without the interaction gives similar but not
identical results to the ANOVA-based analysis. The variance components
are identical (as is true for all "nice" designs, when the likelihood-based
analysis uses REML estimation). The tests for the fixed effects give slightly 
more significant P-values, due to the use of a standard normal reference distribution 
instead of the (exact) t-distribution.

The treatment by period interaction shows absolutely no significance.
Note however the changed results for the two main effects; these are no
longer useful (the model should be refitted without the interaction
before assessing the main effects). A treatment by period interaction
would mean that the difference between treatments depends on the period.
Such an effect would most likely be a carry-over effect (although there
could also be genuinely different treatment effects in periods). For example,
treatment A differs in periods 1 and 2 by being either the first treatment
applied (AB) and by being the follow-up treatment after treatment B was 
applied (BA). In the simple AB/BA-design we cannot from the data alone 
distinguish between carry-over effects and a genuine treatment by period 
interaction. With a totally non-significant interaction, we conclude 
that no carry-over effects seem to be present in these data.