Solution file for Problems 3.5 and 4.3 (GO)
-------------------------------------------

Data: measurements of compressive strength of concrete cubes. Fifteen cubes 
were produced and randomly assigned to five levels of polypropylene fiber 
content, with three cubes per group, and fiber content ranging from 0 to 1%, 
in equidistant steps of 0.25%.

The data constitute 5 independent samples with a continuous outcome, and 
the model immediately suggested is a one-way ANOVA.

Problem 3.5:
------------
We run the one-way ANOVA, including checks of the assumptions of 
normality and equal standard deviations in the groups.

MTB > WOpen "h:\VHM\VHM802\Data_csv\ch03pr5.csv";
SUBC>   FType;
SUBC>     CSV;
SUBC>   DecSep;
SUBC>     Period;
SUBC>   Field;
SUBC>     Comma;
SUBC>   TDelimiter;
SUBC>     DoubleQuote.
Retrieving worksheet from file: 'h:\VHM\VHM802\Data_csv\ch03pr5.csv'
Worksheet was saved on 11/02/2011

MTB > Oneway 'strength' 'fiber';
SUBC>   GFourpack.
One-way ANOVA: strength versus fiber 

Source  DF     SS     MS      F      P
fiber    4  6.263  1.566  12.98  0.001
Error   10  1.207  0.121
Total   14  7.469

S = 0.3474   R-Sq = 83.85%   R-Sq(adj) = 77.38%

                          Individual 95% CIs For Mean Based on
                          Pooled StDev
Level  N    Mean   StDev  ----+---------+---------+---------+-----
0.00   3  7.4667  0.3055                          (------*-----)
0.25   3  7.5667  0.3055                            (-----*-----)
0.50   3  6.8667  0.5508                  (-----*-----)
0.75   3  6.7000  0.3000               (------*-----)
1.00   3  5.7667  0.1528  (-----*------)
                          ----+---------+---------+---------+-----
                            5.60      6.30      7.00      7.70
Pooled StDev = 0.3474

Residual Plots for strength 

Comments:
---------
The residual plot and the table of standard deviations do not indicate
serious problems with model assumptions. The standard deviations differ
somewhat between groups but that could very well be a result of the
small within-group sample size only. Tests for homogeneity of variance
are clearly non-significant (not shown).

The ANOVA table shows a strongly significant difference between groups,
and the groups account for more than 80% of the variation in the data
(such a high R^2 is easier to achieve in a small dataset).
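
As a cross-check, the ANOVA quantities can be rebuilt from the group
means and standard deviations printed above. The following is a minimal
Python sketch, not part of the Minitab session; the function name
anova_from_summary is hypothetical, and the inputs are the rounded values
from the output, so results agree with the table only up to rounding.

```python
# Group summaries copied from the one-way ANOVA output (rounded values).
means = [7.4667, 7.5667, 6.8667, 6.7000, 5.7667]
sds = [0.3055, 0.3055, 0.5508, 0.3000, 0.1528]
n = 3  # cubes per group


def anova_from_summary(means, sds, n):
    """Rebuild one-way ANOVA quantities from group means and SDs (equal n)."""
    k = len(means)
    grand = sum(means) / k
    # Between-group SS: n times the squared deviations of the group means.
    ss_between = n * sum((m - grand) ** 2 for m in means)
    # Within-group SS: (n - 1) times the sum of the group variances.
    ss_within = (n - 1) * sum(s ** 2 for s in sds)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (k * (n - 1))
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "F": ms_between / ms_within,
        "pooled_sd": ms_within ** 0.5,
        "r_sq": ss_between / (ss_between + ss_within),
    }


result = anova_from_summary(means, sds, n)
for name, value in result.items():
    print(f"{name:>10s} = {value:.4f}")
```

The pooled standard deviation is the square root of the error mean
square, which is why it coincides with S in the Minitab output.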

The estimated means and the graphical representation of confidence
intervals show that strength tends to decrease with increasing fiber 
levels. The decline does not seem to be entirely linear because the 
first two means are fairly similar, as are the third and fourth. When
the groups are defined by quantitative values (the fiber content),
multiple comparisons are often a less natural approach, and one would
instead focus on describing the relationship with the quantitative
variable. A first step would be to produce a table and/or graph with group 
means and 95% confidence intervals.

MTB > GLM 'strength' = fiber;
SUBC>   Brief 2 ;
SUBC>   Means fiber.
General Linear Model: strength versus fiber 

Factor  Type   Levels  Values
fiber   fixed       5  0.00, 0.25, 0.50, 0.75, 1.00

Analysis of Variance for strength, using Adjusted SS for Tests

Source  DF  Seq SS  Adj SS  Adj MS      F      P
fiber    4  6.2627  6.2627  1.5657  12.98  0.001
Error   10  1.2067  1.2067  0.1207
Total   14  7.4693

S = 0.347371   R-Sq = 83.85%   R-Sq(adj) = 77.38%

Least Squares Means for strength

fiber   Mean  SE Mean
0.00   7.467   0.2006
0.25   7.567   0.2006
0.50   6.867   0.2006
0.75   6.700   0.2006
1.00   5.767   0.2006

MTB > Intplot ( 'strength' ) * 'fiber';
SUBC>   Intbar;
SUBC>   Mean.
Interval Plot of strength

Note that the interval plot needs to be modified to use the residual
standard deviation as the basis for all CIs. This can be achieved
by editing the graph, by right-clicking the interval bars and choosing
the submenu Edit Interval Bar.
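
The intervals based on the residual standard deviation can also be
computed by hand. The sketch below is Python rather than Minitab and
rests on two stated assumptions: the critical value 2.228 is the 97.5th
percentile of the t distribution with 10 DF taken from a standard t
table, and the group means are the rounded values from the output.

```python
import math

# Group means by fiber level, copied from the ANOVA output.
means = {0.00: 7.4667, 0.25: 7.5667, 0.50: 6.8667, 0.75: 6.7000, 1.00: 5.7667}
pooled_sd = 0.347371  # S (residual standard deviation) from the output
n = 3                 # cubes per group
T_CRIT = 2.228        # assumption: t(0.975, 10 DF) from a standard t table

# With equal group sizes every mean has the same standard error.
se_mean = pooled_sd / math.sqrt(n)
half_width = T_CRIT * se_mean

intervals = {lvl: (m - half_width, m + half_width) for lvl, m in means.items()}
for lvl, (lo, hi) in intervals.items():
    print(f"fiber {lvl:.2f}: mean {means[lvl]:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```

The standard error computed here should match the SE Mean column of the
Least Squares Means table, and all five intervals have the same width.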


Problem 4.3:
------------
Coefficients for polynomial contrasts (with equidistant x-values and
equal group sizes) are listed in Appendix D, Table D.6. As Minitab does
not allow easy manipulation of estimates, all calculations need to be
done by hand. One alternative is to fit the linear, quadratic, cubic and
4th order regression models directly. In each of these models, the test
for the highest order polynomial coefficient is a test of the
corresponding polynomial contrast. However, except for the fourth order
model (which reproduces the oneway fit), these tests are based on the SE
and error DF of the reduced model instead of the full oneway model. The
table of sequential sums of squares for the fourth order model also
gives the sum of squares for each contrast.


MTB > Regress 'strength' 1 'fiber';
SUBC>   Constant;
SUBC>   Brief 2.
Regression Analysis: strength versus fiber 

The regression equation is
strength = 7.73 - 1.71 fiber

Predictor     Coef  SE Coef      T      P
Constant    7.7267   0.1758  43.96  0.000
fiber      -1.7067   0.2870  -5.95  0.000

S = 0.393016   R-Sq = 73.1%   R-Sq(adj) = 71.0%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       1  5.4613  5.4613  35.36  0.000
Residual Error  13  2.0080  0.1545
Total           14  7.4693

MTB > Name C4 'fiber2'
MTB > Let 'fiber2' = fiber**2
MTB > Name C5 'fiber3'
MTB > Let 'fiber3' = fiber**3
MTB > Name C6 'fiber4'
MTB > Let 'fiber4' = fiber**4

MTB > Regress 'strength' 2 'fiber' 'fiber2';
SUBC>   Constant;
SUBC>   Brief 2.
Regression Analysis: strength versus fiber, fiber2 

The regression equation is
strength = 7.51 + 0.046 fiber - 1.75 fiber2

Predictor     Coef  SE Coef      T      P
Constant    7.5076   0.1924  39.03  0.000
fiber       0.0457   0.9115   0.05  0.961
fiber2     -1.7524   0.8741  -2.00  0.068

S = 0.354047   R-Sq = 79.9%   R-Sq(adj) = 76.5%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       2  5.9651  2.9826  23.79  0.000
Residual Error  12  1.5042  0.1253
Total           14  7.4693

Source  DF  Seq SS
fiber    1  5.4613
fiber2   1  0.5038

Unusual Observations

Obs  fiber  strength     Fit  SE Fit  Residual  St Resid
 13   0.50    6.3000  7.0924  0.1425   -0.7924     -2.44R

R denotes an observation with a large standardized residual.

MTB > Regress 'strength' 3 'fiber' 'fiber2' 'fiber3';
SUBC>   Constant;
SUBC>   Brief 2.
 
Regression Analysis: strength versus fiber, fiber2, fiber3 

The regression equation is
strength = 7.50 + 0.14 fiber - 2.02 fiber2 + 0.18 fiber3

Predictor    Coef  SE Coef      T      P
Constant   7.5043   0.2119  35.41  0.000
fiber       0.141    2.157   0.07  0.949
fiber2     -2.019    5.477  -0.37  0.719
fiber3      0.178    3.600   0.05  0.962

S = 0.369749   R-Sq = 79.9%   R-Sq(adj) = 74.4%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       3  5.9655  1.9885  14.54  0.000
Residual Error  11  1.5039  0.1367
Total           14  7.4693

Source  DF  Seq SS
fiber    1  5.4613
fiber2   1  0.5038
fiber3   1  0.0003

Unusual Observations

Obs  fiber  strength     Fit  SE Fit  Residual  St Resid
 13   0.50    6.3000  7.0924  0.1488   -0.7924     -2.34R

R denotes an observation with a large standardized residual.

MTB > Regress 'strength' 4 'fiber' 'fiber2' 'fiber3' 'fiber4';
SUBC>   Constant;
SUBC>   Brief 2.
 
Regression Analysis: strength versus fiber, fiber2, fiber3, fiber4 

The regression equation is
strength = 7.47 + 6.41 fiber - 36.4 fiber2 + 56.4 fiber3 - 28.1 fiber4

Predictor    Coef  SE Coef      T      P
Constant   7.4667   0.2006  37.23  0.000
fiber       6.411    4.480   1.43  0.183
fiber2     -36.38    22.49  -1.62  0.137
fiber3      56.36    35.96   1.57  0.148
fiber4     -28.09    17.90  -1.57  0.148

S = 0.347371   R-Sq = 83.8%   R-Sq(adj) = 77.4%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       4  6.2627  1.5657  12.98  0.001
Residual Error  10  1.2067  0.1207
Total           14  7.4693

Source  DF  Seq SS
fiber    1  5.4613
fiber2   1  0.5038
fiber3   1  0.0003
fiber4   1  0.2972
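
The sequential sums of squares can be turned into tests based on the
full oneway model by dividing each by the error mean square of the
fourth order fit (10 DF). The following Python sketch uses only numbers
printed in the output above; it is an illustration, not part of the
Minitab session. For a single-DF term, the square root of F is the
absolute value of the corresponding t statistic.

```python
# Sequential SS from the fourth order regression output.
seq_ss = {"fiber": 5.4613, "fiber2": 0.5038, "fiber3": 0.0003, "fiber4": 0.2972}
mse_full = 0.1207  # error mean square of the fourth order (= oneway) model

total = sum(seq_ss.values())  # equals the between-groups SS up to rounding

rows = {}
for term, ss in seq_ss.items():
    f_stat = ss / mse_full          # 1-DF F statistic against full-model error
    abs_t = f_stat ** 0.5           # |t| for the same 1-DF test
    share = 100 * ss / total        # percentage of between-groups variation
    rows[term] = (f_stat, abs_t, share)
    print(f"{term:7s} F = {f_stat:7.3f}  |t| = {abs_t:5.2f}  SS% = {share:5.1f}")
```

These |t| values can be compared with the t column of the contrast table
given later in this file.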

Comments:
---------
The table below summarizes the information that can be extracted from
these analyses.

term        coefficient     SE      t     P       SS 
------------------------------------------------------
linear          -1.7067   0.2870  -5.95  0.000   5.461
quadratic       -1.7524   0.8741  -2.00  0.068   0.504  
cubic            0.178    3.600    0.05  0.962   0.000
quartic        -28.09    17.90    -1.57  0.148   0.297
------------------------------------------------------

From the table it is clear that the two highest order terms do not
improve the model substantially, and the choice is therefore between the
linear and quadratic model. The slope is negative, corresponding to the
already noted strong declining trend with fiber content, and the sign of
the quadratic term is negative as well, corresponding to a curve that
bends downwards. The Fitted Line Plot menu can be used to get a graph of
the estimated quadratic curve. With the P-value for the quadratic term
(0.068) so close to 0.05, the choice between the two models should
involve subject matter considerations (i.e., whether it is preferable to
have a linear or quadratic prediction equation).
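
To see how the two candidate models differ, the fitted equations can be
evaluated at the five fiber levels and compared with the group means.
The Python sketch below uses the coefficients printed in the regression
output; the helper names linear, quadratic and lack_of_fit_ss are
hypothetical. The lack-of-fit sum of squares (group size times the
squared distances between group means and fitted values) equals the
model's residual SS minus the oneway error SS, up to rounding.

```python
levels = [0.00, 0.25, 0.50, 0.75, 1.00]
group_means = [7.4667, 7.5667, 6.8667, 6.7000, 5.7667]
n = 3  # cubes per group


def linear(x):
    """Fitted linear model from the regression output."""
    return 7.7267 - 1.7067 * x


def quadratic(x):
    """Fitted quadratic model from the regression output."""
    return 7.5076 + 0.0457 * x - 1.7524 * x ** 2


def lack_of_fit_ss(model):
    """n times the squared distances between group means and fitted values."""
    return n * sum((m - model(x)) ** 2 for x, m in zip(levels, group_means))


for x, m in zip(levels, group_means):
    print(f"fiber {x:.2f}: mean {m:.3f}  linear {linear(x):.3f}  "
          f"quadratic {quadratic(x):.3f}")
print(f"lack of fit SS: linear {lack_of_fit_ss(linear):.4f}, "
      f"quadratic {lack_of_fit_ss(quadratic):.4f}")
```

The quadratic fitted value at fiber 0.50 can be checked against the Fit
column of the Unusual Observations table.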

Computing and evaluating the polynomial contrasts in the oneway model
(using either a calculator or another software) yields the following
table:

Contrast   Estimate      SE     SS  SS(%)      t   P(t)
--------------------------------------------------------
linear       -4.267  0.6342  5.461   87.2  -6.73  0.000
quadratic    -1.533  0.7504  0.504    8.0  -2.04  0.068
cubic         0.033  0.6342  0.000    0.0   0.05  0.959
quartic      -2.633  1.678   0.297    4.7  -1.57  0.148
--------------------------------------------------------

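The t-values and P-values differ from those in the regression-based
table because different SEs and DFs are used, as explained above. For
the quartic term the two tables agree, because the fourth order model is
equivalent to the oneway model.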
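
The contrast calculations can be sketched in a few lines of Python. The
coefficients below are the standard orthogonal polynomial coefficients
for five equally spaced levels; it is an assumption that they coincide
with those in Table D.6. The function name evaluate is hypothetical, and
the group means are the rounded values from the Minitab output, so the
results agree with the table only up to rounding.

```python
import math

means = [7.4667, 7.5667, 6.8667, 6.7000, 5.7667]  # fiber 0, 0.25, ..., 1
n = 3                   # cubes per group
mse = 1.2067 / 10       # oneway error SS divided by its 10 DF

# Standard orthogonal polynomial coefficients for 5 equidistant levels
# (assumed to match Appendix D, Table D.6).
contrasts = {
    "linear":    [-2, -1,  0,  1, 2],
    "quadratic": [ 2, -1, -2, -1, 2],
    "cubic":     [-1,  2,  0, -2, 1],
    "quartic":   [ 1, -4,  6, -4, 1],
}


def evaluate(coefs):
    """Return estimate, SE, SS and t for one contrast of the group means."""
    estimate = sum(c * m for c, m in zip(coefs, means))
    norm = sum(c * c for c in coefs)
    se = math.sqrt(mse * norm / n)
    ss = n * estimate ** 2 / norm
    return estimate, se, ss, estimate / se


results = {name: evaluate(coefs) for name, coefs in contrasts.items()}
for name, (est, se, ss, t) in results.items():
    print(f"{name:10s} est {est:7.3f}  SE {se:6.4f}  SS {ss:6.3f}  t {t:6.2f}")
```

P-values are not computed here because they require the t distribution
with 10 DF, which is not available in the Python standard library.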

Note that when assessing polynomial contrasts one would usually not use
the Scheffe test, which adjusts for contrasts derived from the data. One
could argue that the idea of splitting the variation between groups
into orthogonal parts based on polynomial terms is universal and hence
not derived from the actual data. If there were concern about the inflated 
type I error from assessing 4 contrasts, a Bonferroni or Holm correction
could be used.
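
As an illustration of the two corrections, the following Python sketch
adjusts the four P-values from the contrast table. One value is an
assumption: the linear P-value is printed as 0.000, so 0.0005 is used
here as an upper bound. The function names bonferroni and holm are
hypothetical helpers written for this example.

```python
def bonferroni(pvals):
    """Multiply each P-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Holm step-down adjustment, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest P is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


names = ["linear", "quadratic", "cubic", "quartic"]
p_values = [0.0005, 0.068, 0.959, 0.148]  # 0.0005 is an assumed upper bound

p_bonf = bonferroni(p_values)
p_holm = holm(p_values)
for name, p, pb, ph in zip(names, p_values, p_bonf, p_holm):
    print(f"{name:10s} P {p:.4f}  Bonferroni {pb:.3f}  Holm {ph:.3f}")
```

With either correction only the linear contrast remains clearly
significant under these inputs.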