Solution file for GO Problems 3.2, 4.2 and 5.1
----------------------------------------------

Data: measurements of longevity (in days) of male fruit flies subjected
to 5 different reproduction conditions. A total of 125 male fruit flies
were randomly allocated to 5 groups (25 flies each) that differed in their
exposure to females. Group 1 had no exposure to females; Groups 2 and 4
were exposed daily to 1 and 8 pregnant (therefore unreceptive) females,
respectively; Groups 3 and 5 were exposed daily to 1 and 8 virgin
(therefore receptive) females, respectively.

The data constitute 5 independent samples with a continuous outcome
(longevity is recorded in whole days, but because the data span a wide
range of days this discretisation should not be a serious problem), and the
model immediately suggested is a one-way ANOVA. The experiment constitutes
a completely randomized design with 5 groups.

Problem 3.2:
------------
We compute per-group descriptive summaries and run the one-way ANOVA,
including checks of the assumptions of normality and equal standard
deviations in the groups.

MTB > WOpen "H:\VHM\VHM802\Data_csv\ch03pr2.csv";
SUBC>   FType;
SUBC>     CSV;
SUBC>   DecSep;
SUBC>     Period;
SUBC>   Field;
SUBC>     Comma;
SUBC>   TDelimiter;
SUBC>     DoubleQuote.
Retrieving worksheet from file: ‘H:\VHM\VHM802\Data_csv\ch03pr2.csv’
Worksheet was saved on 14/02/2011

MTB > GSummary 'longev';
SUBC>   By 'compan'.
Results for compan = 1 pregnant 
Summary Report for longev (compan = 1 pregnant) 
Results for compan = 1 virgin 
Summary Report for longev (compan = 1 virgin) 
Results for compan = 8 pregnant 
Summary Report for longev (compan = 8 pregnant) 
Results for compan = 8 virgin 
Summary Report for longev (compan = 8 virgin) 
Results for compan = none 
Summary Report for longev (compan = none) 

MTB > OneWay;
SUBC>   Response 'longev';
SUBC>   Categorical 'compan';
SUBC>   IType 0;
SUBC>   GMCI;
SUBC>   GIntPlot;
SUBC>   GFourpack;
SUBC>   TMethod;
SUBC>   TFactor;
SUBC>   TANOVA;
SUBC>   TSummary;
SUBC>   TMeans;
SUBC>   Nodefault.
One-way ANOVA: longev versus compan 

Method
Null hypothesis         All means are equal
Alternative hypothesis  At least one mean is different
Significance level      α = 0.05
Equal variances were assumed for the analysis.

Factor Information
Factor  Levels  Values
compan       5  1 pregnant, 1 virgin, 8 pregnant, 8 virgin, none

Analysis of Variance
Source   DF  Adj SS  Adj MS  F-Value  P-Value
compan    4   11939  2984.8    13.61    0.000
Error   120   26314   219.3
Total   124   38253

Model Summary
      S    R-sq  R-sq(adj)  R-sq(pred)
14.8081  31.21%     28.92%      25.36%

Means
compan       N   Mean  StDev      95% CI
1 pregnant  25  63.56  16.45  (57.70, 69.42)
1 virgin    25  64.80  15.65  (58.94, 70.66)
8 pregnant  25  56.76  14.93  (50.90, 62.62)
8 virgin    25  38.72  12.10  (32.86, 44.58)
none        25  63.36  14.54  (57.50, 69.22)
Pooled StDev = 14.8081
 
Interval Plot of longev vs compan 
Residual Plots for longev 

Comments:
---------
We first note that no problems with the model assumptions could be found. 
The within-group distributions look fairly normal (and all normality tests
are non-significant). The residual plots look very nice for a dataset of this
size, and the standard deviations are quite close. 

The ANOVA table shows a strongly significant difference between groups,
despite a fairly low R^2 value. 
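
As a cross-check, the ANOVA table can be rebuilt by hand from the per-group
summaries in the Means table. The following Python sketch is not part of
the Minitab session; the variable names are illustrative, and because the
printed standard deviations are rounded, the error sum of squares agrees
with the Minitab value only approximately.

```python
# Sketch: rebuild the one-way ANOVA quantities from the printed
# per-group summaries (N, Mean, StDev). Names are illustrative only.
groups = {
    "1 pregnant": (25, 63.56, 16.45),
    "1 virgin":   (25, 64.80, 15.65),
    "8 pregnant": (25, 56.76, 14.93),
    "8 virgin":   (25, 38.72, 12.10),
    "none":       (25, 63.36, 14.54),
}

n_total = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * m for n, m, _ in groups.values()) / n_total

# Between-group and within-group sums of squares
ss_trt = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())
ss_err = sum((n - 1) * s ** 2 for n, _, s in groups.values())

df_trt = len(groups) - 1
df_err = n_total - len(groups)
ms_trt = ss_trt / df_trt
ms_err = ss_err / df_err
f_value = ms_trt / ms_err
r_sq = ss_trt / (ss_trt + ss_err)

print(f"SS(compan) = {ss_trt:.1f} on {df_trt} df")
print(f"SS(Error)  = {ss_err:.1f} on {df_err} df")
print(f"F = {f_value:.2f}, R-sq = {100 * r_sq:.1f}%")
```

The low R^2 and the large F-value are not in conflict: with 120 error
degrees of freedom, even a moderate share of explained variation gives a
large ratio of mean squares.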

The estimated means and the graphical representation of the confidence
intervals suggest that group 5 (8 virgins) differs significantly from all
other groups, which are close to one another. The confidence interval for
group 5 overlaps none of the others, which shows that t-tests unadjusted
for multiple comparisons would all be significant when comparing group 5
to the other groups. The almost totally overlapping intervals of groups
1-3 (each estimate lies inside the other groups' intervals) show that
there is no significant difference between these groups.
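
The overlap argument can be checked directly from the printed intervals.
The Python sketch below is an illustration only: the group numbering
follows the data description, and the helper name overlaps is my own.

```python
# 95% CIs and means copied from the Means table, keyed by group number
# (1 = none, 2 = 1 pregnant, 3 = 1 virgin, 4 = 8 pregnant, 5 = 8 virgin).
means = {1: 63.36, 2: 63.56, 3: 64.80, 4: 56.76, 5: 38.72}
ci = {
    1: (57.50, 69.22),
    2: (57.70, 69.42),
    3: (58.94, 70.66),
    4: (50.90, 62.62),
    5: (32.86, 44.58),
}

def overlaps(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Group 5 against every other group
group5_overlaps = {g: overlaps(ci[5], ci[g]) for g in (1, 2, 3, 4)}

# Groups 1-3: does each mean fall inside the other two intervals?
means_inside = all(
    ci[h][0] <= means[g] <= ci[h][1]
    for g in (1, 2, 3)
    for h in (1, 2, 3)
    if g != h
)

print(group5_overlaps)
print(means_inside)
```

Note that overlapping intervals alone would not establish
non-significance; the stronger condition used above, each mean lying
inside the other intervals, is what guarantees it when all groups share
the same interval half-width.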


Problem 4.2:
------------
A set of orthogonal contrasts can be set up in many ways; the following
seemed the most natural to me based on the description of the groups.
For comparison, the strongest (simple) contrast in the data, between
group 5 and the others, is also included although it is not part of the
orthogonal set.

Contrast      Interpretation         Coefficients (groups 1-5)
---------------------------------------------------------------
company       contact with females    4 -1 -1 -1 -1
receptive     pregnant vs virgins     0  1 -1  1 -1
# pregnant    1 vs 8 pregnant         0  1  0 -1  0
# virgins     1 vs 8 virgins          0  0  1  0 -1
group 5       group 5 vs others      -1 -1 -1 -1  4

The first four contrasts are pairwise orthogonal, because for any pair 
of them the sum of products of coefficients is zero. For example, for 
contrasts 1 and 2: 
  4*0 + (-1)*1 + (-1)*(-1) + (-1)*1 + (-1)*(-1) = 0
or for contrasts 2 and 3:
  0*0 + 1*1 + (-1)*0 + 1*(-1) + (-1)*0 = 0.
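
The same check for all pairs can be sketched in a few lines of Python
(illustration only; the dictionary keys are the contrast names from the
table and the helper dot is my own):

```python
from itertools import combinations

# Coefficients in group order 1..5
# (none, 1 pregnant, 1 virgin, 8 pregnant, 8 virgin)
contrasts = {
    "company":    [4, -1, -1, -1, -1],
    "receptive":  [0, 1, -1, 1, -1],
    "# pregnant": [0, 1, 0, -1, 0],
    "# virgins":  [0, 0, 1, 0, -1],
    "group 5":    [-1, -1, -1, -1, 4],
}

def dot(u, v):
    """Sum of products of coefficients."""
    return sum(a * b for a, b in zip(u, v))

# Sum of products for every pair of contrasts
products = {
    (a, b): dot(contrasts[a], contrasts[b])
    for a, b in combinations(contrasts, 2)
}

for pair, value in products.items():
    print(pair, value)
```

Every pair among the first four contrasts gives zero, whereas the
"group 5" contrast gives a non-zero sum of products with "company",
"receptive" and "# virgins", which confirms that it is not part of the
orthogonal set.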

Contrast       Estimate   SE    SS    SS(%)   t     P(t)   F(Schef) P(Schef)
-------------------------------------------------------------------------------
company          29.6   13.2  1095.2   9.2   2.23  0.027   1.249    0.294  
receptive        16.8   5.92  1764.0  14.8   2.84  0.005   2.011    0.097
# pregnant        6.8   4.19   578.0   4.8   1.62  0.107   0.659    0.621 
# virgins        26.1   4.19  8502.1  71.2   6.23  0.000   9.693    0.000
group 5         -93.6   13.2 10951.2  91.7  -7.07  0.000   12.49    0.000      
-------------------------------------------------------------------------------

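Formulae:  SS = estimate^2 / [(w_1^2+...+w_5^2)/25]  (or t^2*MSE)
           SS(%) = 100 * SS / SSTrt  (SSTrt = 11939.28)
           SE = sqrt(MSE * (w_1^2+...+w_5^2)/25)  (MSE = 219.28)
           t = estimate/SE  (so |t| = sqrt(SS/MSE))
           P(t): two-sided P-value from t(120)
           F(Scheffe) = SS/4/MSE  (or t^2/4)
           P(Scheffe): upper-tail P-value from F(4,120)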
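
The table entries other than the P-values can be reproduced from the group
means with the following Python sketch. It is an illustration, not the
method used to produce the table: the function name contrast_stats is my
own, and the P-values are left out because they need t and F distribution
functions that are not in the Python standard library.

```python
import math

# Group means in group order 1..5 (none, 1 pregnant, 1 virgin,
# 8 pregnant, 8 virgin), with n, MSE and SSTrt from the ANOVA output.
means = [63.36, 63.56, 64.80, 56.76, 38.72]
n = 25
mse = 219.28
ss_trt = 11939.28

def contrast_stats(w):
    """Estimate, SE, SS, SS(%), t and Scheffe F for coefficients w."""
    est = sum(wi * mi for wi, mi in zip(w, means))
    scale = sum(wi ** 2 for wi in w) / n      # (w_1^2+...+w_5^2)/25
    se = math.sqrt(mse * scale)
    ss = est ** 2 / scale
    t = est / se
    return {
        "estimate": est,
        "se": se,
        "ss": ss,
        "ss_pct": 100 * ss / ss_trt,
        "t": t,
        "f_scheffe": t ** 2 / 4,              # same as SS/4/MSE
    }

company = contrast_stats([4, -1, -1, -1, -1])
group5 = contrast_stats([-1, -1, -1, -1, 4])

for name, res in (("company", company), ("group 5", group5)):
    print(name, {key: round(value, 3) for key, value in res.items()})
```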

If the contrasts were pre-planned, the P-values from the t-tests could be
used without adjustment (unless there was concern about carrying out 4
tests). This would make 3 out of the first 4 contrasts significant.
However, it is clear from the results that these contrasts do not
represent the real effect well. The last contrast, developed by
inspecting the data (means), reflects the pattern in the data and is
strongly significant even with the Scheffe test. This is probably the best
contrast representation of the difference between groups.


Problem 5.1:
------------
We rerun the model as a General Linear Model to get easy access to
Bonferroni-corrected multiple comparisons.

MTB > GLM;
SUBC>   Response 'longev';
SUBC>   Nodefault;
SUBC>   Categorical 'compan';
SUBC>   Terms compan;
SUBC>   TMethod;
SUBC>   TAnova;
SUBC>   TSummary;
SUBC>   TCoefficients;
SUBC>   TEquation;
SUBC>   TFactor;
SUBC>   TDiagnostics 0.
General Linear Model: longev versus compan 
...
Analysis of Variance
Source     DF  Adj SS  Adj MS  F-Value  P-Value
  compan    4   11939  2984.8    13.61    0.000
Error     120   26314   219.3
Total     124   38253
...

MTB > Compare 'longev';
SUBC>   Pairwise compan;
SUBC>     Bonferroni;
SUBC>   GIntPlot;
SUBC>   NoDefault;
SUBC>   TGrouping;
SUBC>   TMTest.
Comparisons for longev 
 
Bonferroni Pairwise Comparisons: Response = longev, Term = compan 

Grouping Information Using the Bonferroni Method and 95% Confidence
compan       N   Mean  Grouping
1 virgin    25  64.80  A
1 pregnant  25  63.56  A
none        25  63.36  A
8 pregnant  25  56.76  A
8 virgin    25  38.72         B
Means that do not share a letter are significantly different.

Bonferroni Simultaneous Tests for Differences of Means
                             Difference       SE of    Simultaneous             Adjusted
Difference of compan Levels    of Means  Difference       95% CI       T-Value   P-Value
1 virgin - 1 pregnant              1.24        4.19  (-10.74,  13.22)     0.30     1.000
8 pregnant - 1 pregnant           -6.80        4.19  (-18.78,   5.18)    -1.62     1.000
8 virgin - 1 pregnant            -24.84        4.19  (-36.82, -12.86)    -5.93     0.000
none - 1 pregnant                 -0.20        4.19  (-12.18,  11.78)    -0.05     1.000
8 pregnant - 1 virgin             -8.04        4.19  (-20.02,   3.94)    -1.92     0.573
8 virgin - 1 virgin              -26.08        4.19  (-38.06, -14.10)    -6.23     0.000
none - 1 virgin                   -1.44        4.19  (-13.42,  10.54)    -0.34     1.000
8 virgin - 8 pregnant            -18.04        4.19  (-30.02,  -6.06)    -4.31     0.000
none - 8 pregnant                  6.60        4.19  ( -5.38,  18.58)     1.58     1.000
none - 8 virgin                   24.64        4.19  ( 12.66,  36.62)     5.88     0.000
Individual confidence level = 99.50%
Bonferroni Simultaneous 95% CIs 

Comments for Bonferroni method:
-------------------------------
The results from the multiple comparisons are very clear: group 5 differs
significantly from all other groups, whereas the differences among the
other groups are nowhere near significant. Thus the letter coding would be:

5b 4a 1a 2a 3a
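
The key Bonferroni quantities in the output can be verified with a short
Python sketch (illustration only; the variable names are my own, and the
simultaneous intervals are copied from the Minitab table rather than
recomputed):

```python
import math

k, n, mse, alpha = 5, 25, 219.28, 0.05    # values from the output above

m = math.comb(k, 2)                  # number of pairwise comparisons
individual_level = 1 - alpha / m     # per-comparison confidence level
se_diff = math.sqrt(mse * 2 / n)     # SE of a difference of two means

# Simultaneous 95% CIs copied from the Minitab table
sim_ci = {
    ("1 virgin", "1 pregnant"):   (-10.74, 13.22),
    ("8 pregnant", "1 pregnant"): (-18.78, 5.18),
    ("8 virgin", "1 pregnant"):   (-36.82, -12.86),
    ("none", "1 pregnant"):       (-12.18, 11.78),
    ("8 pregnant", "1 virgin"):   (-20.02, 3.94),
    ("8 virgin", "1 virgin"):     (-38.06, -14.10),
    ("none", "1 virgin"):         (-13.42, 10.54),
    ("8 virgin", "8 pregnant"):   (-30.02, -6.06),
    ("none", "8 pregnant"):       (-5.38, 18.58),
    ("none", "8 virgin"):         (12.66, 36.62),
}

# A pair is significant when its simultaneous interval excludes zero
significant = [
    pair for pair, (low, high) in sim_ci.items() if low > 0 or high < 0
]

print(m, individual_level, round(se_diff, 3))
print(significant)
```

All pairs flagged as significant involve the "8 virgin" group, in line
with the letter coding above.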

Holm method:
------------
We start by rearranging the list above in increasing order of P-values
(or equivalently from more to less extreme t-values):

       Difference       SE of           Adjusted
group    of Means  Difference  T-Value   P-Value
5 - 3      -26.08       4.188   -6.227    0.0000
5 - 2      -24.84       4.188   -5.931    0.0000
5 - 1      -24.64       4.188   -5.883    0.0000
5 - 4      -18.04       4.188   -4.307    0.0003
4 - 3       -8.04       4.188   -1.920    0.5728
4 - 2       -6.80       4.188   -1.624    1.0000
4 - 1       -6.60       4.188   -1.576    1.0000
3 - 1        1.44       4.188    0.344    1.0000
3 - 2        1.24       4.188    0.296    1.0000
2 - 1        0.20       4.188    0.048    1.0000

With a total of 10 comparisons, the Bonferroni-adjusted P-values above
are the original P-values multiplied by 10 (and capped at 1). The Holm
method retrieves the original P-values and multiplies them by
10, 9, 8, ..., 1 from the top to the bottom row. The first four rows
remain significant with the Holm adjustment because their P-values are
smaller than those listed above (except for the first row, where the
P-value is the same), and they were already significant with the
Bonferroni method. For row five the Holm-adjusted P-value is
(0.5728/10)*6 = 0.34. The comparison in row 5 is therefore
non-significant, and hence all subsequent rows are non-significant as
well by the Holm method. The significant comparisons are therefore the
same as for the Bonferroni method.
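
The step-down calculation can be sketched in Python as follows. This is
an illustration under stated assumptions: the function name
holm_step_down is my own, only the first five rows are used (the capped
values of 1.0000 in rows 6-10 do not allow the original P-values to be
recovered, and the procedure stops at row 5 anyway), and the entries
printed as 0.0000 are replaced by the upper bound 0.00005.

```python
def holm_step_down(raw_p_sorted, m, alpha=0.05):
    """Holm step-down for raw P-values sorted in increasing order.

    The P-value at 0-based rank r is multiplied by (m - r); once one
    row fails, that row and all later rows are non-significant.
    Returns a list of (adjusted P-value, significant) tuples.
    """
    results = []
    rejecting = True
    for rank, p in enumerate(raw_p_sorted):
        adjusted = min(1.0, p * (m - rank))
        if adjusted > alpha:
            rejecting = False
        results.append((adjusted, rejecting))
    return results

m = 10
# Bonferroni-adjusted P-values for rows 1-5 of the sorted table;
# 0.00005 is an assumed upper bound for entries printed as 0.0000.
bonferroni_adjusted = [0.00005, 0.00005, 0.00005, 0.0003, 0.5728]
raw_p = [p / m for p in bonferroni_adjusted]   # undo the factor 10

holm = holm_step_down(raw_p, m)
for row, (adjusted, significant) in enumerate(holm, start=1):
    print(row, round(adjusted, 5), significant)
```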