Exercise 27.8 of PSLS 3e
------------------------

Data: 2 samples of corn yields (bushels per acre) for 4 plots without weeds and 
4 plots with 9 lamb's quarter plants per meter of row. 

Model: the 2 samples are independent and each a simple random sample 
(i.i.d. sample) from a distribution with unknown mean, median and standard
deviation.

(a)
When using the Wilcoxon-Mann-Whitney test we may make no further assumptions
about the distributions, and test
  H0: P1=P2, versus Ha: P1 is systematically larger than P2,
where P1 and P2 are the distributions of corn yields in the two
populations (weed-free and weed-filled plots).

Alternatively, we may make the "delta-assumption" that the two
distributions are of the same shape (only differ by their location), and
test the hypotheses
  H0: median1=median2, versus Ha: median1>median2
(the alternative expressing higher yields in weed-free plots).

The motivation for the one-sided alternatives lies in the wording of the
question ("evidence that 9 weeds per meter reduces corn yields"), but in
practice one could also justify a two-sided alternative. Given the small
sample sizes it is perhaps tempting to increase the power of the
statistical analysis by using a one-sided alternative.

Minitab commands and output for the test:

MTB > WOpen "H:\VHM\VHM801\Datasets\Minitab\Chapter 27\ex27_008.mtw".
Retrieving worksheet from file: 'H:\VHM\VHM801\Datasets\Minitab\Chapter
27\ex27_008.mtw'
Worksheet was saved on 17/10/2014

MTB > Unstack ('yield');
SUBC>   Subscripts 'weeds';
SUBC>   After;
SUBC>   VarNames.

MTB > Mann-Whitney 95.0 'yield_0' 'yield_9';
SUBC>   Alternative 1.
Mann-Whitney Test and CI: yield_0, yield_9 

         N  Median
yield_0  4  169.45
yield_9  4  162.55

Point estimate for eta1-eta2 is 9.65
97.0 Percent CI for eta1-eta2 is (2.20,34.49)
W = 26.0
Test of eta1 = eta2 vs eta1 > eta2 is significant at 0.0152

Comments:
---------
The Wilcoxon rank sum test (or Wilcoxon-Mann-Whitney test) gives an
approximate P-value of 0.0152 for H0 against a one-sided alternative.
This means that there is some evidence that 9 weeds per meter
systematically reduce the corn yield.
(Note: the exact P-value is 0.014, obtained with other software.)
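
The approximate P-value can be checked by hand. The sketch below assumes
that Minitab uses a normal approximation with a continuity correction for
the rank sum W of the first sample (an assumption, although the numbers
agree with it). The function name rank_sum_normal_p is ours and not part
of any package.

```python
import math

def rank_sum_normal_p(w, n1, n2):
    """Upper-tail P-value for the rank sum w of sample 1 (no ties),
    using a normal approximation with continuity correction."""
    n = n1 + n2
    mean_w = n1 * (n + 1) / 2        # expected rank sum under H0
    var_w = n1 * n2 * (n + 1) / 12   # variance of the rank sum under H0
    z = (w - 0.5 - mean_w) / math.sqrt(var_w)
    return 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z)

# W = 26.0 with 4 observations in each sample, as in the output above
p_approx = rank_sum_normal_p(26, 4, 4)
print(round(p_approx, 4))
```

With only 4 observations per group the normal approximation is rough,
which is why the exact P-value is quoted as well.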


(b)
MTB > TwoSample 'yield_0' 'yield_9';
SUBC>   Confidence 95.0;
SUBC>   Test 0.0;
SUBC>   Alternative 1.
Two-Sample T-Test and CI: yield_0, yield_9 

Two-sample T for yield_0 vs yield_9

         N    Mean  StDev  SE Mean
yield_0  4  170.20   5.42      2.7
yield_9  4   157.6   10.1      5.1

Difference = mu (yield_0) - mu (yield_9)
Estimate for difference:  12.62
95% lower bound for difference:  0.39
T-Test of difference = 0 (vs >): T-Value = 2.20  P-Value = 0.046  DF = 4

Comments:
---------
The t-test gives a one-sided P-value of 0.046, just below the 5%
significance level. This is borderline, and we might describe it as weak
evidence. Note that we do not assume the variances to be equal, because
the variation is larger in the second sample (due to the outlier).
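
The T-value and DF can be reproduced from the summary statistics alone.
The sketch below uses the unequal-variance (Welch) formulas and assumes
that Minitab truncates the Welch degrees of freedom to a whole number.
The inputs are the rounded values printed above, so the last digits may
differ slightly. The function name welch_t is ours.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic and Welch degrees of freedom
    (variances not assumed equal)."""
    v1 = sd1 ** 2 / n1   # squared standard error of mean 1
    v2 = sd2 ** 2 / n2   # squared standard error of mean 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Rounded summary statistics for yield_0 and yield_9 from the output
t_value, df = welch_t(170.20, 5.42, 4, 157.6, 10.1, 4)
print(round(t_value, 2), math.floor(df))
```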


(c)
MTB > Copy 'yield_9' c7;
SUBC>   Varnames.
MTB > name c7 'yield_9_excl'
MTB > let c7(2)='*'

MTB > Mann-Whitney 95.0 'yield_0' 'yield_9_excl';
SUBC>   Alternative 1.
Mann-Whitney Test and CI: yield_0, yield_9_excl 

              N  Median
yield_0       4  169.45
yield_9_excl  3  162.70

Point estimate for eta1-eta2 is 6.85
94.8 Percent CI for eta1-eta2 is (2.20,14.50)
W = 22.0
Test of eta1 = eta2 vs eta1 > eta2 is significant at 0.0259

MTB > TwoSample 'yield_0' 'yield_9_excl';
SUBC>   Confidence 95.0;
SUBC>   Test 0.0;
SUBC>   Alternative 1.
Two-Sample T-Test and CI: yield_0, yield_9_excl 

Two-sample T for yield_0 vs yield_9_excl

              N     Mean  StDev  SE Mean
yield_0       4   170.20   5.42      2.7
yield_9_excl  3  162.633  0.208     0.12

Difference = mu (yield_0) - mu (yield_9_excl)
Estimate for difference:  7.57
95% lower bound for difference:  1.18
T-Test of difference = 0 (vs >): T-Value = 2.79  P-Value = 0.034  DF = 3

Comments:
---------
The outlier reduced the mean yield in the group with 9 weeds per meter by
5 units (bushels per acre); the value is obtained as 162.6-157.6=5.0. It
increased the standard deviation by a factor of approximately 50
(10.1/0.208).
However, the results of the analyses with and without the outlier are
surprisingly similar, although removing the outlier increases the P-value
of the non-parametric test and decreases the P-value of the t-test (why?).

Additional comment:
-------------------
From this exercise one may get the idea that the Wilcoxon rank sum test is more
powerful than the t-test in small samples. That is not true in general. In fact,
the P-value obtained by the Wilcoxon rank sum test is the smallest possible with
these sample sizes (why? See below for the answer).

Answer to the additional question:
----------------------------------
The rank sums for the two groups are the most extreme possible for a
dataset with 4 observations in each group. That is because every
observation in the weed 0 sample is larger than any of the observations
in the weed 9 sample. In this case, the ranks for the weed 9 sample
become 1, 2, 3 and 4 (sum=10), and the ranks for the weed 0 sample
become 5, 6, 7 and 8 (sum=26). No matter the actual values, the ranks in
the two groups can never be more different, and therefore the P-value is
the smallest possible with two samples of size 4.
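
The argument can be verified by listing every way of assigning 4 of the
ranks 1 to 8 to the first sample. This is a minimal sketch that assumes
no ties; the function name exact_upper_p is ours.

```python
from fractions import Fraction
from itertools import combinations

def exact_upper_p(w_obs, n1, n2):
    """Exact one-sided P-value P(W >= w_obs) under H0 (no ties),
    together with the largest attainable rank sum for sample 1."""
    ranks = range(1, n1 + n2 + 1)
    # Under H0 every choice of n1 ranks for sample 1 is equally likely
    sums = [sum(c) for c in combinations(ranks, n1)]
    hits = sum(1 for s in sums if s >= w_obs)
    return Fraction(hits, len(sums)), max(sums)

p_exact, w_max = exact_upper_p(26, 4, 4)
print(p_exact, float(p_exact), w_max)
```

Only one of the 70 possible assignments (ranks 5, 6, 7 and 8) gives a
rank sum as large as 26, so the exact one-sided P-value is 1/70, in
agreement with the value 0.014 quoted in part (a).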