Solution file for Exercise 13.5 (GO)
------------------------------------

Data: response measurements for 16 disk drives produced with 
4 different substrates (A-D), on 4 different days, by 4 different 
machines and 4 different operators. Notation:

  y_i = response (in microvolts times 10^-2) for i'th disk drive, i=1,...,16,
or
  y_ijkl = response (in microvolts times 10^-2) for drive produced with
substrate i, by operator j, on machine k, on day l,
  i=A,B,C,D; j=1,2,3,4; k=1,2,3,4; l=1,2,3,4.
  
The design is a 4x4 Graeco-Latin square: the symbols for both substrates
(Latin letters A-D) and days (Greek letters) occur exactly once in every row
and column, and in addition every (substrate, day) combination occurs exactly
once. The machine, operator and day may be considered blocking factors, so
the design accounts for three blocking factors simultaneously. Note that the
sequential and partial (adjusted) sums of squares in the ANOVA table below are
identical; this is a result of the orthogonality of all factors (allowing them
to be assessed independently).
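
The defining property of a Graeco-Latin square can be checked mechanically.
The sketch below uses a hypothetical 4x4 layout (an assumption for
illustration, not the layout of the exercise data) and the helper names
is_latin and is_graeco_latin are made up here.

```python
# Hypothetical 4x4 Graeco-Latin square (NOT the layout of the exercise data).
# Rows and columns stand for two blocking factors; each cell carries one
# Latin symbol (substrate) and one Greek symbol (day).
latin = [
    ["A", "B", "C", "D"],
    ["B", "A", "D", "C"],
    ["C", "D", "A", "B"],
    ["D", "C", "B", "A"],
]
greek = [
    ["alpha", "beta", "gamma", "delta"],
    ["gamma", "delta", "alpha", "beta"],
    ["delta", "gamma", "beta", "alpha"],
    ["beta", "alpha", "delta", "gamma"],
]


def is_latin(square):
    """True if every symbol occurs exactly once in each row and column."""
    symbols = set(square[0])
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return len(symbols) == len(square) and rows_ok and cols_ok


def is_graeco_latin(latin_sq, greek_sq):
    """True if both squares are Latin and every (Latin, Greek) pair is unique."""
    pairs = {
        (a, b)
        for latin_row, greek_row in zip(latin_sq, greek_sq)
        for a, b in zip(latin_row, greek_row)
    }
    n = len(latin_sq)
    return is_latin(latin_sq) and is_latin(greek_sq) and len(pairs) == n * n


print(is_graeco_latin(latin, greek))
```

Pairing a square with itself fails the check, because only 4 of the 16
possible (Latin, Greek) pairs would then occur.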

The statistical model is
    y_i = mu + alpha_substrate(i) + beta_operator(i) + gamma_machine(i) + delta_day(i) + eps_i,
or
    y_ijkl = mu + alpha_i + beta_j + gamma_k + delta_l + eps_ijkl,
depending on the chosen notation.

MTB > WOpen "H:\VHM\VHM802\Data_csv\ch13ex5.csv";
SUBC>   FType;
SUBC>     CSV;
SUBC>   DecSep;
SUBC>     Period;
SUBC>   Field;
SUBC>     Comma;
SUBC>   TDelimiter;
SUBC>     DoubleQuote.
Retrieving worksheet from file: ‘H:\VHM\VHM802\Data_csv\ch13ex5.csv’
Worksheet was saved on 04/02/2012

MTB > GLM;
SUBC>   Response 'y';
SUBC>   Nodefault;
SUBC>   Categorical 'operator' 'machine' 'day_txt' 'tx_txt';
SUBC>   Terms operator machine C6 C7;
SUBC>   Means operator machine C6 C7;
SUBC>   TExpand;
SUBC>   TMethod;
SUBC>   TAnova;
SUBC>   TSummary;
SUBC>   TCoefficients;
SUBC>   TEquation;
SUBC>   TFactor;
SUBC>   TMeans;
SUBC>   TDiagnostics 0;
SUBC>   Rtype 2;
SUBC>  GFOURPACK.
General Linear Model: y versus operator, machine, day_txt, tx_txt 

Method
Factor coding  (-1, 0, +1)

Factor Information
Factor    Type   Levels  Values
operator  Fixed       4  1, 2, 3, 4
machine   Fixed       4  1, 2, 3, 4
day_txt   Fixed       4  alpha, beta, delta, gamma
tx_txt    Fixed       4  A, B, C, D

Analysis of Variance
Source      DF   Seq SS  Contribution  Adj SS  Adj MS  F-Value  P-Value
  operator   3   14.000        11.48%  14.000   4.667     0.65    0.633
  machine    3   21.500        17.62%  21.500   7.167     1.00    0.500
  day_txt    3    3.500         2.87%   3.500   1.167     0.16    0.915
  tx_txt     3   61.500        50.41%  61.500  20.500     2.86    0.206
Error        3   21.500        17.62%  21.500   7.167
Total       15  122.000       100.00%

Model Summary
      S    R-sq  R-sq(adj)    PRESS  R-sq(pred)
2.67706  82.38%     11.89%  611.556       0.00%

Coefficients
Term       Coef  SE Coef      95% CI      T-Value  P-Value   VIF
Constant  6.000    0.669  (3.870, 8.130)     8.97    0.003
operator
  1       -0.50     1.16  (-4.19,  3.19)    -0.43    0.695  1.50
  2        1.50     1.16  (-2.19,  5.19)     1.29    0.286  1.50
  3       -1.00     1.16  (-4.69,  2.69)    -0.86    0.452  1.50
machine
  1        1.25     1.16  (-2.44,  4.94)     1.08    0.360  1.50
  2       -1.50     1.16  (-5.19,  2.19)    -1.29    0.286  1.50
  3        1.00     1.16  (-2.69,  4.69)     0.86    0.452  1.50
day_txt
  alpha    0.00     1.16  (-3.69,  3.69)     0.00    1.000  1.50
  beta     0.25     1.16  (-3.44,  3.94)     0.22    0.843  1.50
  delta   -0.75     1.16  (-4.44,  2.94)    -0.65    0.564  1.50
tx_txt
  A       -0.25     1.16  (-3.94,  3.44)    -0.22    0.843  1.50
  B       -0.25     1.16  (-3.94,  3.44)    -0.22    0.843  1.50
  C        3.00     1.16  (-0.69,  6.69)     2.59    0.081  1.50

Regression Equation
y = 6.000 - 0.50 operator_1 + 1.50 operator_2 - 1.00 operator_3 - 0.00 operator_4 + 1.25 machine_1
    - 1.50 machine_2 + 1.00 machine_3 - 0.75 machine_4 + 0.00 day_txt_alpha + 0.25 day_txt_beta
    - 0.75 day_txt_delta + 0.50 day_txt_gamma - 0.25 tx_txt_A - 0.25 tx_txt_B + 3.00 tx_txt_C
    - 2.50 tx_txt_D

Means
          Fitted
Term        Mean  SE Mean
operator
  1         5.50     1.34
  2         7.50     1.34
  3         5.00     1.34
  4         6.00     1.34
machine
  1         7.25     1.34
  2         4.50     1.34
  3         7.00     1.34
  4         5.25     1.34
day_txt
  alpha     6.00     1.34
  beta      6.25     1.34
  delta     5.25     1.34
  gamma     6.50     1.34
tx_txt
  A         5.75     1.34
  B         5.75     1.34
  C         9.00     1.34
  D         3.50     1.34
 
Residual Plots for y 


Comments:
---------
The analysis shows no significant effect of treatments, nor any significant 
effects of the blocking factors. With only 3 degrees of freedom for error,
the residuals are of little use for model checking (they take only four
different values). Note that the power of the F-tests is also rather low with 
so few degrees of freedom for error, so we should perhaps not dismiss the
treatment effects as uninteresting. The means show the biggest difference
between C and D, the two types of glass substrates. It is not clear whether
that would be an expected or unexpected result. 
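
The degrees-of-freedom accounting and the F-ratios in the ANOVA table above
can be reproduced by hand from the sums of squares. The following is a
minimal sketch using only the numbers printed in the table; the variable
names are illustrative, and P-values are not computed.

```python
# Sums of squares copied from the Graeco-Latin ANOVA table.
n_obs = 16
seq_ss = {"operator": 14.0, "machine": 21.5, "day_txt": 3.5, "tx_txt": 61.5}
total_ss = 122.0
df_factor = 3  # each factor has 4 levels, hence 4 - 1 = 3 df

# Error takes whatever is left after the four factors.
df_total = n_obs - 1
df_error = df_total - df_factor * len(seq_ss)
error_ss = total_ss - sum(seq_ss.values())
mse = error_ss / df_error

# F-ratio = factor mean square / error mean square.
f_values = {name: (ss / df_factor) / mse for name, ss in seq_ss.items()}

print(f"error: df = {df_error}, SS = {error_ss:.3f}, MS = {mse:.3f}")
for name, f in f_values.items():
    print(f"{name:9s} F = {f:.2f}")
```

With four factors of 3 degrees of freedom each, only 15 - 12 = 3 degrees of
freedom remain for error, which is why the tests have so little power.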

One question is whether to refit the model without the non-significant factors
(excluding the treatment, of course, which is the factor of primary interest).
Usually this is not worthwhile because the partial and sequential sums of
squares coincide (so that the sums of squares of the remaining factors would be
unchanged). However, with only 3 degrees of freedom for error, one potentially
important advantage is the increase in the degrees of freedom for error
(pooling). Because the non-significant effects here are all quite small, pooling
will also decrease the estimated error variance, which in turn gives larger
F-statistics for the remaining factors. We explore the effects of pooling all
blocking effects into error, thereby effectively ignoring the statistical design
completely (and reducing the analysis to a one-way ANOVA). 
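
Because the factors are orthogonal, pooling amounts to adding the blocking
sums of squares and degrees of freedom to the error line. A minimal sketch of
that arithmetic, using the values from the Graeco-Latin ANOVA table (variable
names are illustrative):

```python
# Values copied from the Graeco-Latin ANOVA table.
block_ss = {"operator": 14.0, "machine": 21.5, "day_txt": 3.5}
block_df = 3                      # per blocking factor
error_ss, error_df = 21.5, 3
tx_ss, tx_df = 61.5, 3

# Pool all three blocking factors into the error term.
pooled_ss = error_ss + sum(block_ss.values())
pooled_df = error_df + block_df * len(block_ss)
pooled_mse = pooled_ss / pooled_df

# Treatment F-ratio before and after pooling.
f_before = (tx_ss / tx_df) / (error_ss / error_df)
f_after = (tx_ss / tx_df) / pooled_mse

print(f"pooled error: df = {pooled_df}, SS = {pooled_ss:.2f}, MS = {pooled_mse:.3f}")
print(f"treatment F: {f_before:.2f} -> {f_after:.2f}")
```

The pooled error line (12 df, SS 60.50, MS 5.042) is what a one-way ANOVA on
the treatment factor alone reports.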

MTB > OneWay;
SUBC>   Response 'y';
SUBC>   Categorical 'tx_txt';
SUBC>   IType 0;
SUBC>   Fisher 5;
SUBC>   TGrouping;
SUBC>   TMTest;
SUBC>   GIntPlot;
SUBC>   GFourpack;
SUBC>   TExpand;
SUBC>   TMethod;
SUBC>   TFactor;
SUBC>   TANOVA;
SUBC>   TSummary;
SUBC>   TMeans;
SUBC>   Nodefault.
One-way ANOVA: y versus tx_txt 

Method
Null hypothesis         All means are equal
Alternative hypothesis  At least one mean is different
Significance level      a = 0.05
Equal variances were assumed for the analysis.

Factor Information
Factor  Levels  Values
tx_txt       4  A, B, C, D

Analysis of Variance
Source  DF  Seq SS  Contribution  Adj SS  Adj MS  F-Value  P-Value
tx_txt   3   61.50        50.41%   61.50  20.500     4.07    0.033
Error   12   60.50        49.59%   60.50   5.042
Total   15  122.00       100.00%

      S    R-sq  R-sq(adj)    PRESS  R-sq(pred)
2.24537  50.41%     38.01%  107.556      11.84%

Means
tx_txt  N   Mean  StDev       95% CI
A       4   5.75   2.22  ( 3.30,   8.20)
B       4   5.75   3.30  ( 3.30,   8.20)
C       4  9.000  1.633  (6.554, 11.446)
D       4  3.500  1.291  (1.054,  5.946)
Pooled StDev = 2.24537
 
Fisher Pairwise Comparisons 
Grouping Information Using the Fisher LSD Method and 95% Confidence

tx_txt  N   Mean  Grouping
C       4  9.000  A
B       4   5.75  A B
A       4   5.75  A B
D       4  3.500    B
Means that do not share a letter are significantly different.

Fisher Individual Tests for Differences of Means
Difference  Difference       SE of                           Adjusted
of Levels     of Means  Difference      95% CI      T-Value   P-Value
B - A             0.00        1.59  (-3.46,  3.46)     0.00     1.000
C - A             3.25        1.59  (-0.21,  6.71)     2.05     0.063
D - A            -2.25        1.59  (-5.71,  1.21)    -1.42     0.182
C - B             3.25        1.59  (-0.21,  6.71)     2.05     0.063
D - B            -2.25        1.59  (-5.71,  1.21)    -1.42     0.182
D - C            -5.50        1.59  (-8.96, -2.04)    -3.46     0.005
Simultaneous confidence level = 81.57%
 
Interval Plot of y vs tx_txt 
Residual Plots for y 


Comments:
---------
The change in results is remarkable, with the treatment factor now
becoming significant. The only significant pairwise comparison between
treatments is between C and D (the two glass substrates), and this comparison
would also be significant after Bonferroni adjustment (i.e., multiplying
all P-values by 6, which gives 6 x 0.005 = 0.030 for D - C). The residual
plots look good.
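
The Bonferroni adjustment and the standard error behind the Fisher tests can
be checked with a few lines of arithmetic. This is a sketch based on the
rounded P-values printed in the table above, so the adjusted values are
approximate; the variable names are illustrative.

```python
import math

# Unadjusted P-values copied from the Fisher individual tests table.
raw_p = {
    "B - A": 1.000,
    "C - A": 0.063,
    "D - A": 0.182,
    "C - B": 0.063,
    "D - B": 0.182,
    "D - C": 0.005,
}

# Bonferroni: multiply by the number of comparisons and cap at 1.
m = len(raw_p)
bonferroni = {pair: min(1.0, p * m) for pair, p in raw_p.items()}
significant = [pair for pair, p in bonferroni.items() if p < 0.05]

# SE of a difference of two means with n = 4 drives per substrate.
pooled_mse = 60.5 / 12
se_diff = math.sqrt(2 * pooled_mse / 4)
t_dc = -5.50 / se_diff

for pair, p in bonferroni.items():
    print(f"{pair}: adjusted P = {p:.3f}")
print(f"SE of difference = {se_diff:.2f}, t for D - C = {t_dc:.2f}")
```

Only D - C stays below 0.05 after adjustment.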

It is essentially impossible to determine from the data whether the
analysis without the blocking factors is valid. The best conclusion is
perhaps that the study shows an indication of treatment differences and 
suggests further investigation of treatment effects, preferably in an
experiment with greater power than the 4x4 Graeco-Latin design.