Supplementary Exercise 8.62 of IPS7e
-----------------------------------

A randomized trial of aspirin for the treatment of stroke involved a treatment
group (n=78) and a control group (n=77). After 6 months, each patient's
progress was evaluated as favourable or not.
The statistical model is two independent binomial distributions, one for
each of the two groups.

(a)
Estimates and standard errors are given in the table below. The numbers
of favourable and unfavourable outcomes exceed 10 in both samples, so the
classical (normal distribution approximation) method should work well
for a confidence interval for the difference in proportions. For completeness,
calculations are included for the plus four interval as well, although there
is no pressing need to use it here.

                treatment       control            difference
-------------------------------------------------------------
proportion      63/78=0.808     43/77=0.558        0.249=D
stand.error     0.0446          0.0566             0.0721
plus four-corrected values
proportion      64/80=0.800     44/79=0.557        0.243
stand.error     0.0447          0.0559             0.0716

where the standard error in each sample is calculated as 
  sqrt(p*(1-p)/n) (plus four: corrected p and n)
and the standard error of the difference D is calculated as
  sqrt(s.e.(treatment)^2 + s.e.(control)^2)
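
The two formulas above can be checked with a short Python sketch. This is
only an illustration, not part of the original solution; the helper names
prop_se and difference are made up for this example, and the plus four
correction is taken to mean one extra success and one extra failure per sample.

```python
from math import sqrt

def prop_se(successes, n):
    """Return (sample proportion, standard error) for one binomial sample."""
    p = successes / n
    return p, sqrt(p * (1 - p) / n)

def difference(x1, n1, x2, n2):
    """Return (difference in proportions, unpooled standard error)."""
    p1, se1 = prop_se(x1, n1)
    p2, se2 = prop_se(x2, n2)
    return p1 - p2, sqrt(se1 ** 2 + se2 ** 2)

# Classical estimates from the observed counts
p_tx, se_tx = prop_se(63, 78)
p_con, se_con = prop_se(43, 77)
d, se_d = difference(63, 78, 43, 77)

# Plus four: add one success and one failure to each sample
p_tx4, se_tx4 = prop_se(64, 80)
p_con4, se_con4 = prop_se(44, 79)
d4, se_d4 = difference(64, 80, 44, 79)

print(f"classical: D = {d:.3f}, s.e.(D) = {se_d:.4f}")
print(f"plus four: D = {d4:.3f}, s.e.(D) = {se_d4:.4f}")
```

With these counts the per-sample plus four standard errors come out near
0.045 and 0.056, very close to the classical values.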

(b)
An approximate 90% CI for the difference in proportions between the treatment
and control groups is
  0.249 +- 1.645*0.0721 = 0.249 +- 0.119 = (0.131,0.368)
The corresponding plus four approximate 90% CI is
  0.243 +- 1.645*0.0716 = 0.243 +- 0.118 = (0.125,0.361)

The confidence interval lies well above zero, so there is strong evidence
against the hypothesis of the same proportion of favourable stroke progress
in the two groups; the aspirin group is doing much better.
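
As a cross-check on the hand calculation, the intervals can be sketched in
Python using only the standard library. The function name diff_ci is
hypothetical, and the 90% level is the one used in the Minitab listing; the
critical value is taken from statistics.NormalDist rather than a table.

```python
from math import sqrt
from statistics import NormalDist

def diff_ci(x1, n1, x2, n2, level=0.90):
    """Large-sample CI for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Two-sided critical value, about 1.645 for a 90% interval
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    d = p1 - p2
    return d - z_star * se, d + z_star * se

classical = diff_ci(63, 78, 43, 77)   # observed counts
plus_four = diff_ci(64, 80, 44, 79)   # one success and one failure added per sample

print("classical 90%% CI: (%.3f, %.3f)" % classical)
print("plus four 90%% CI: (%.3f, %.3f)" % plus_four)
```

Both intervals should agree with the Minitab output to within rounding.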

(c)
The suggested hypotheses for the test are
  H0: p_tx=p_con, and Ha: p_tx>p_con.

The PSLS/IPS condition for use of the large-sample z-test is that both
samples have a count of at least 5 of both successes and failures. This
condition is easily satisfied. A better condition comes from the equivalence
with the X2-test to be introduced in Session 9 (also discussed below).

A first step is to calculate the pooled estimate of p:
  p_hat = (63+43)/(78+77) = 106/155 = 0.6839

With this pooled estimate of p, the Session 9 condition for
the z-test is that expected counts of both positives and negatives in both
samples based on this estimated p are all greater than 5. For example,
the expected count of positives for the treatment group is 0.6839*78 = 53.3,
and the smallest expected count (negatives in the control group) is
(1-0.6839)*77 = 24.3. The conditions are therefore all met.
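
The expected-count check can be written out for all four cells. The sketch
below is illustrative only; the dictionary layout and variable names are
assumptions made for this example.

```python
# Observed data: group sizes and favourable outcomes
n = {"treatment": 78, "control": 77}
favourable = {"treatment": 63, "control": 43}

# Pooled estimate of p under H0: p_tx = p_con
p_pooled = sum(favourable.values()) / sum(n.values())

# Expected (positives, negatives) in each group under the pooled p
expected = {g: (p_pooled * n[g], (1 - p_pooled) * n[g]) for g in n}

# Condition: every expected count is greater than 5
all_ok = all(min(pair) > 5 for pair in expected.values())

for g, (pos, neg) in expected.items():
    print(f"{g}: expected positives {pos:.1f}, expected negatives {neg:.1f}")
print("condition met:", all_ok)
```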

Next we calculate the pooled standard error and the z-statistic:
  SE_Dp = sqrt(p_hat*(1-p_hat)*(1/78+1/77)) = 0.0747
  z = D/SE_Dp = 0.249/0.0747 = 3.337 ~= 3.34

A z-value of 3.34 is very clearly significant in N(0,1). The normal distribution
table shows the upper tail probability to be 0.0004, so P=0.0004. There is strong
evidence to reject the hypothesis that the proportion of favourable progress is
the same in the treatment and control groups. As noted above, the aspirin group
has done much better.
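
The test calculation can be sketched in Python as follows. The function name
pooled_z_test is hypothetical; the normal tail probability comes from
statistics.NormalDist in place of the printed table. The two-sided P-value is
included only for comparison with the Minitab runs that use a two-sided
alternative.

```python
from math import sqrt
from statistics import NormalDist

def pooled_z_test(x1, n1, x2, n2):
    """Two-sample z-test for proportions with a pooled standard error.

    Returns (z, one-sided P for Ha: p1 > p2, two-sided P).
    """
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    p_upper = 1 - NormalDist().cdf(z)           # upper tail of N(0,1)
    p_two = 2 * min(p_upper, 1 - p_upper)       # two-sided P-value
    return z, p_upper, p_two

z, p_one, p_two = pooled_z_test(63, 78, 43, 77)
print(f"z = {z:.2f}, one-sided P = {p_one:.4f}, two-sided P = {p_two:.4f}")
```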

---

Minitab commands and listing:

MTB > PTwo 78 63 77 43;
SUBC>   Confidence 90;
SUBC>   Test 0.0;
SUBC>   Alternative 0;
SUBC>   Pooled.
Test and CI for Two Proportions 

Sample   X   N  Sample p
1       63  78  0.807692
2       43  77  0.558442

Difference = p (1) - p (2)
Estimate for difference:  0.249251
90% CI for difference:  (0.130710, 0.367792)
Test for difference = 0 (vs ≠ 0):  Z = 3.34  P-Value = 0.001

MTB > PTwo 80 64 79 44;
SUBC>   Confidence 90;
SUBC>   Test 0.0;
SUBC>   Alternative 0;
SUBC>   Pooled.
Test and CI for Two Proportions 

Sample   X   N  Sample p
1       64  80  0.800000
2       44  79  0.556962

Difference = p (1) - p (2)
Estimate for difference:  0.243038
90% CI for difference:  (0.125302, 0.360774)
Test for difference = 0 (vs ≠ 0):  Z = 3.28  P-Value = 0.001

MTB > PTwo 78 63 77 43;
SUBC>   Confidence 95;
SUBC>   Test 0.0;
SUBC>   Alternative 1;
SUBC>   Pooled.
Test and CI for Two Proportions 

Sample   X   N  Sample p
1       63  78  0.807692
2       43  77  0.558442

Difference = p (1) - p (2)
Estimate for difference:  0.249251
95% lower bound for difference:  0.130710
Test for difference = 0 (vs > 0):  Z = 3.34  P-Value = 0.000

Notes: 
------
* The "pooled" setting affects only the calculation of the z-test; it has
no impact on the confidence interval (which does not use a pooled p).
* The Minitab listing also gives Fisher's exact test for testing
equal p in the two samples, but this has been omitted here.
* If extra counts have been added for the plus four CIs, the
corresponding test statistics shown in the listing are not valid and
should be ignored.