Solution file for Problems 3.2, 4.2 and 5.1 (GO)
------------------------------------------------

Data: measurements of longevity (in days) of male fruit flies subjected
to 5 different reproduction conditions. A total of 125 male fruit flies
were randomly allocated to 5 groups that differed in their exposure to
females. Group 1 had no exposure to females; Groups 2 and 4 were exposed
daily to 1 and 8 pregnant (therefore unreceptive) females,
respectively. Groups 3 and 5 were exposed daily to 1 and 8 virgin
(therefore receptive) females, respectively.

The data constitute 5 independent samples with a continuous outcome
(longevity is recorded in whole days, but this discretisation should not be
a serious problem because the data span a wide range of days), and the
model immediately suggested is a one-way ANOVA. The experiment constitutes
a completely randomized design with 5 groups.

Problem 3.2:
------------
We compute per-group descriptive summaries and run the one-way ANOVA,
including checks of the assumptions of normality and equal standard
deviations in the groups.

MTB > WOpen "h:\VHM\VHM802\Data_csv\ch03pr2.csv";
SUBC>   FType;
SUBC>     CSV;
SUBC>   DecSep;
SUBC>     Period;
SUBC>   Field;
SUBC>     Comma;
SUBC>   TDelimiter;
SUBC>     DoubleQuote.
Retrieving worksheet from file: 'h:\VHM\VHM802\Data_csv\ch03pr2.csv'
Worksheet was saved on 11/02/2011

MTB > GSummary 'longev';
SUBC>   By 'group'.
Results for group = 1 
Summary for longev (group = 1) 
Results for group = 2 
Summary for longev (group = 2) 
Results for group = 3 
Summary for longev (group = 3) 
Results for group = 4 
Summary for longev (group = 4) 
Results for group = 5 
Summary for longev (group = 5) 

MTB > Oneway 'longev' 'group';
SUBC>   GBoxplot;
SUBC>   GFourpack.
One-way ANOVA: longev versus group 

Source   DF     SS    MS      F      P
group     4  11939  2985  13.61  0.000
Error   120  26314   219
Total   124  38253

S = 14.81   R-Sq = 31.21%   R-Sq(adj) = 28.92%

                         Individual 95% CIs For Mean Based on
                         Pooled StDev
Level   N   Mean  StDev  -------+---------+---------+---------+--
1      25  63.36  14.54                          (-----*-----)
2      25  63.56  16.45                           (-----*----)
3      25  64.80  15.65                            (-----*-----)
4      25  56.76  14.93                    (-----*-----)
5      25  38.72  12.10  (-----*-----)
                         -------+---------+---------+---------+--
                               40        50        60        70
Pooled StDev = 14.81

Boxplot of longev 
Residual Plots for longev 

Comments:
---------
We first note that no problems with the model assumptions could be found.
The within-group distributions look fairly normal (and all normality tests
are non-significant). The residual plots look very nice for a dataset of this
size, and the standard deviations are quite close.

The ANOVA table shows a strongly significant difference between groups,
despite a fairly low R^2 value. 
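
As a cross-check, the ANOVA table can be rebuilt from the per-group means
and standard deviations alone. The Python sketch below is not part of the
Minitab session; it assumes equal group sizes (n=25) and uses the rounded
summaries printed above, so small rounding differences from the Minitab
table are to be expected. All variable names are illustrative.

```python
# Rebuild the one-way ANOVA table from the per-group summaries in the output.
n = 25                                        # flies per group
means = [63.36, 63.56, 64.80, 56.76, 38.72]   # group means (Minitab output)
sds = [14.54, 16.45, 15.65, 14.93, 12.10]     # group standard deviations

k = len(means)
grand_mean = sum(means) / k                   # simple average is valid for equal n

# Between-group and within-group sums of squares
ss_group = n * sum((m - grand_mean) ** 2 for m in means)
ss_error = (n - 1) * sum(s ** 2 for s in sds)
df_group, df_error = k - 1, k * (n - 1)

ms_group = ss_group / df_group
ms_error = ss_error / df_error
f_stat = ms_group / ms_error
r_squared = ss_group / (ss_group + ss_error)

print(f"SS(group) = {ss_group:.0f} on {df_group} df")
print(f"SS(error) = {ss_error:.0f} on {df_error} df")
print(f"F = {f_stat:.2f}, R-Sq = {100 * r_squared:.1f}%")
```

The result agrees with the Minitab table up to rounding of the summaries.
It also shows why a low R^2 can go together with a strongly significant
F-test: with 120 error degrees of freedom, a mean square ratio of this size
is far out in the tail of the F-distribution.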

The estimated means and the graphical representation of confidence
intervals suggest that group 5 differs significantly from all other
groups, which seem pretty close to each other. The interval for group 5
does not overlap any other interval, which shows that t-tests unadjusted
for multiple comparisons would all be significant when comparing group 5
to the other groups. The almost totally overlapping intervals of groups
1-3 (where each estimate lies inside the other intervals) show that there
is no significant difference between these groups.
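
The interval argument can be checked numerically. The Python sketch below
recomputes the individual 95% confidence intervals as mean +/- t*S/sqrt(n)
from the pooled standard deviation. The critical value 1.98 is an
assumption taken from a standard t-table (0.975 quantile of t(120),
rounded), and the helper function is illustrative only.

```python
import math

n = 25
pooled_sd = 14.81        # "Pooled StDev" from the Minitab output
t_crit = 1.98            # assumption: tabulated 0.975 quantile of t(120), rounded
means = {1: 63.36, 2: 63.56, 3: 64.80, 4: 56.76, 5: 38.72}

half_width = t_crit * pooled_sd / math.sqrt(n)
intervals = {g: (m - half_width, m + half_width) for g, m in means.items()}

def overlap(a, b):
    """Return True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

for g, (low, high) in intervals.items():
    print(f"group {g}: ({low:.1f}, {high:.1f})")

# Groups whose interval overlaps the interval of group 5
overlaps_with_5 = [g for g in (1, 2, 3, 4) if overlap(intervals[g], intervals[5])]
print("intervals overlapping group 5:", overlaps_with_5)

# For groups 1-3, check that each mean lies inside the other two intervals
means_inside = all(
    intervals[h][0] <= means[g] <= intervals[h][1]
    for g in (1, 2, 3) for h in (1, 2, 3) if g != h
)
print("means of groups 1-3 inside each other's intervals:", means_inside)
```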


Problem 4.2:
------------
A set of orthogonal contrasts can be set up in many ways; the following
seemed the most natural to me based on the description of the groups.
For comparison, the strongest contrast in the data, between group 5 and the
others, is also included although it is not part of the orthogonal set.

Contrast      Interpretation          Coefficients
--------------------------------------------------
company       contact with females     4 -1 -1 -1 -1
receptive     pregnant vs virgins      0  1 -1  1 -1
# pregnant    1 vs 8 pregnant          0  1  0 -1  0
# virgins     1 vs 8 virgins           0  0  1  0 -1
group 5       group 5 vs others       -1 -1 -1 -1  4

The first four contrasts are pairwise orthogonal, because for any pair 
the sum of products of coefficients is zero. For example, for contrasts 
1 and 2: 
  4*0 + (-1)*1 + (-1)*(-1) + (-1)*1 + (-1)*(-1) = 0
or for contrasts 2 and 3:
  0*0 + 1*1 + (-1)*0 + 1*(-1) + (-1)*0 = 0.
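
The same check can be run for every pair of contrasts at once. The Python
sketch below is only an illustration (the dictionary and function names
are my own); it relies on the group sizes being equal, so that
orthogonality reduces to a zero sum of products of coefficients.

```python
from itertools import combinations

# Contrast coefficients for groups 1..5, copied from the table above
contrasts = {
    "company":    [4, -1, -1, -1, -1],
    "receptive":  [0, 1, -1, 1, -1],
    "# pregnant": [0, 1, 0, -1, 0],
    "# virgins":  [0, 0, 1, 0, -1],
    "group 5":    [-1, -1, -1, -1, 4],
}

def dot(u, v):
    """Sum of products of two coefficient vectors."""
    return sum(a * b for a, b in zip(u, v))

# Sum of products for every pair of contrasts (pairs in table order)
dot_products = {
    (a, b): dot(contrasts[a], contrasts[b])
    for a, b in combinations(contrasts, 2)
}

for (a, b), value in dot_products.items():
    status = "orthogonal" if value == 0 else "not orthogonal"
    print(f"{a:10s} x {b:10s}: {value:3d}  ({status})")
```

All six pairs among the first four contrasts give zero, whereas the
group 5 contrast has a non-zero sum of products with three of them, which
is why it cannot be added to the orthogonal set.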

Contrast      Estimate    SE        SS   SS(%)      t    P(t)  F(Schef)  P(Schef)
---------------------------------------------------------------------------------
company           29.6  13.2    1095.2     9.2   2.23   0.027     1.249     0.294
receptive         16.8  5.92    1764.0    14.8   2.84   0.005     2.011     0.097
# pregnant         6.8  4.19     578.0     4.8   1.62   0.107     0.659     0.621
# virgins         26.1  4.19    8502.1    71.2   6.23   0.000     9.693     0.000
group 5          -93.6  13.2   10951.2    91.7  -7.07   0.000     12.49     0.000
---------------------------------------------------------------------------------

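formulae:  Estimate = w_1*mean_1+...+w_5*mean_5 (w_i = contrast coefficients)
           SE = sqrt(MSE*(w_1^2+...+w_5^2)/25)
           SS = (estimate^2)/[(w_1^2+...+w_5^2)/25]
           SS(%) = 100*SS/SSTrt (SSTrt=11939.28)
           t = Est/SE, so |t| = sqrt(SS/MSE) (MSE=219.28)
           P(t): two-sided P-value from t(120)
           F(Scheffe) = SS/4/MSE (or t^2/4)
           P(Scheffe): upper-tail P-value from F(4,120)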
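
The formulae can be applied mechanically to the group means. The Python
sketch below reproduces the estimate, SE, SS, SS(%), t and Scheffe F
columns of the table; the P-values are left out because they require tail
probabilities of the t- and F-distributions, which the Python standard
library does not provide. Names such as contrast_summary are illustrative
only.

```python
import math

n = 25                      # flies per group
mse = 219.28                # error mean square from the ANOVA table
ss_trt = 11939.28           # treatment (group) sum of squares
means = [63.36, 63.56, 64.80, 56.76, 38.72]

contrasts = {
    "company":    [4, -1, -1, -1, -1],
    "receptive":  [0, 1, -1, 1, -1],
    "# pregnant": [0, 1, 0, -1, 0],
    "# virgins":  [0, 0, 1, 0, -1],
    "group 5":    [-1, -1, -1, -1, 4],
}

def contrast_summary(w):
    """Estimate, SE, SS, SS(%), t and Scheffe F for one contrast."""
    estimate = sum(wi * mi for wi, mi in zip(w, means))
    scale = sum(wi ** 2 for wi in w) / n      # (w_1^2+...+w_5^2)/25
    se = math.sqrt(mse * scale)
    ss = estimate ** 2 / scale
    t = estimate / se
    return {
        "estimate": estimate,
        "se": se,
        "ss": ss,
        "ss_pct": 100 * ss / ss_trt,
        "t": t,
        "f_scheffe": t ** 2 / 4,              # 4 = treatment degrees of freedom
    }

results = {name: contrast_summary(w) for name, w in contrasts.items()}

for name, r in results.items():
    print(f"{name:10s} est={r['estimate']:7.2f} se={r['se']:6.2f} "
          f"ss={r['ss']:8.1f} ({r['ss_pct']:4.1f}%) "
          f"t={r['t']:6.2f} F(Schef)={r['f_scheffe']:6.3f}")
```

Because the first four contrasts form a complete orthogonal set, their SS
values add up to the treatment sum of squares, which is a useful check on
the arithmetic.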

If the contrasts were pre-planned, the P-values from the t-tests could be
used without adjustment (unless there was concern about carrying out 4
tests). This would make 3 out of the first 4 contrasts significant.
However, it is clear from the results that these contrasts do not
represent the real effect well. The last contrast, developed by
inspecting the data (means), reflects the pattern in the data and is
strongly significant even with the Scheffe test. This is probably the best
contrast representation of the difference between groups.


Problem 5.1:
------------
We rerun the model as a General Linear Model to get easy access to
Bonferroni-corrected multiple comparisons.

MTB > GLM   'longev' = group;
SUBC>   Brief 2 ;
SUBC>   Pairwise group;
SUBC>     Bonferroni;
SUBC>     NoCI.
General Linear Model: longev versus group 

Factor  Type   Levels  Values
group   fixed       5  1, 2, 3, 4, 5

Analysis of Variance for longev, using Adjusted SS for Tests

Source   DF   Seq SS   Adj SS  Adj MS      F      P
group     4  11939.3  11939.3  2984.8  13.61  0.000
Error   120  26313.5  26313.5   219.3
Total   124  38252.8

S = 14.8081   R-Sq = 31.21%   R-Sq(adj) = 28.92%

Unusual Observations for longev

Obs   longev      Fit  SE Fit  Residual  St Resid
 43  96.0000  63.5600  2.9616   32.4400      2.24 R
 49  96.0000  63.5600  2.9616   32.4400      2.24 R
 66  97.0000  64.8000  2.9616   32.2000      2.22 R
 76  21.0000  56.7600  2.9616  -35.7600     -2.46 R

R denotes an observation with a large standardized residual.

Grouping Information Using Bonferroni Method and 95.0% Confidence

group   N  Mean  Grouping
3      25  64.8  A
2      25  63.6  A
1      25  63.4  A
4      25  56.8  A
5      25  38.7    B

Means that do not share a letter are significantly different.

Bonferroni Simultaneous Tests
Response Variable longev
All Pairwise Comparisons among Levels of group
group = 1  subtracted from:

       Difference       SE of           Adjusted
group    of Means  Difference  T-Value   P-Value
2            0.20       4.188    0.048    1.0000
3            1.44       4.188    0.344    1.0000
4           -6.60       4.188   -1.576    1.0000
5          -24.64       4.188   -5.883    0.0000

group = 2  subtracted from:

       Difference       SE of           Adjusted
group    of Means  Difference  T-Value   P-Value
3            1.24       4.188    0.296    1.0000
4           -6.80       4.188   -1.624    1.0000
5          -24.84       4.188   -5.931    0.0000

group = 3  subtracted from:

       Difference       SE of           Adjusted
group    of Means  Difference  T-Value   P-Value
4           -8.04       4.188   -1.920    0.5728
5          -26.08       4.188   -6.227    0.0000

group = 4  subtracted from:

       Difference       SE of           Adjusted
group    of Means  Difference  T-Value   P-Value
5          -18.04       4.188   -4.307    0.0003

Comments for Bonferroni method:
-------------------------------
The results from the multiple comparisons are very clear: group 5 differs
significantly from all other groups, whereas the differences among groups
1-4 are nowhere near significant. Thus the letter coding (corresponding to
underlining) would be

5b 4a 1a 2a 3a

Holm method:
------------
We start by rearranging the list above in increasing order of P-values
(or equivalently from more to less extreme t-values):

       Difference       SE of           Adjusted
group    of Means  Difference  T-Value   P-Value
5 - 3      -26.08       4.188   -6.227    0.0000
5 - 2      -24.84       4.188   -5.931    0.0000
5 - 1      -24.64       4.188   -5.883    0.0000
5 - 4      -18.04       4.188   -4.307    0.0003
4 - 3       -8.04       4.188   -1.920    0.5728
4 - 2       -6.80       4.188   -1.624    1.0000
4 - 1       -6.60       4.188   -1.576    1.0000
3 - 1        1.44       4.188    0.344    1.0000
3 - 2        1.24       4.188    0.296    1.0000
2 - 1        0.20       4.188    0.048    1.0000

With a total of 10 comparisons, the Bonferroni-adjusted P-values above
are the original (unadjusted) P-values multiplied by 10 and capped at 1.
The Holm method retrieves the original P-values and multiplies them by
10,9,8,...,1 from the top to the bottom row. The first four rows remain
significant with the Holm adjustment because their P-values are smaller
than those listed above (except for the first row, where the P-value is
the same), and they were already significant with the Bonferroni method.
For row five the Holm-adjusted P-value is (0.5728/10)*6=0.34. This
comparison is therefore non-significant, and hence all subsequent rows
are non-significant as well by the Holm method. The significant
comparisons are therefore the same as for the Bonferroni method.
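
The Holm calculation can be sketched in Python as follows. This is not
Minitab output: the unadjusted P-values are recomputed from the t-values
listed above using t(120), with the tail probability obtained by
numerical integration (Simpson's rule) because the standard library has no
t-distribution function. The function name t_two_sided_p and the number of
integration steps are my own choices.

```python
import math

def t_two_sided_p(t, df, steps=2000):
    """Two-sided P-value of a t statistic via Simpson's rule on the t density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps                         # steps must be even
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3                   # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central)

# t-values of the 10 pairwise comparisons, copied from the table above
t_values = {
    "5 - 3": -6.227, "5 - 2": -5.931, "5 - 1": -5.883, "5 - 4": -4.307,
    "4 - 3": -1.920, "4 - 2": -1.624, "4 - 1": -1.576,
    "3 - 1": 0.344, "3 - 2": 0.296, "2 - 1": 0.048,
}
m = len(t_values)
raw = {pair: t_two_sided_p(t, 120) for pair, t in t_values.items()}
ordered = sorted(raw, key=raw.get)            # increasing unadjusted P-value

# Bonferroni: multiply every P-value by m and cap at 1
bonferroni = {pair: min(1.0, m * raw[pair]) for pair in ordered}

# Holm: multiply by m, m-1, ..., 1 and keep the adjusted values non-decreasing
holm = {}
running_max = 0.0
for rank, pair in enumerate(ordered):
    running_max = max(running_max, (m - rank) * raw[pair])
    holm[pair] = min(1.0, running_max)

significant = [pair for pair in ordered if holm[pair] < 0.05]
for pair in ordered:
    print(f"{pair}: Bonferroni = {bonferroni[pair]:.4f}, Holm = {holm[pair]:.4f}")
print("significant by Holm at the 5% level:", significant)
```

The running maximum implements the rule used in the text: once one
comparison in the ordered list is non-significant, no later comparison can
become significant.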