Supplementary exercise 6.107 of IPS7e
-------------------------------------

Let's say for simplicity of discussion that the test was for a population mean being equal to zero, i.e. H0: mu=0 (where mu is the population mean). With P=0.95, there is absolutely no evidence against H0 (at the common significance level of 0.05, or any other realistic significance level). We can say the data would not be at all surprising if H0 were true. Therefore we have no reason (and certainly no justification) to question H0 based on our data: it could very well be true.

Now let us consider some of the misconceptions from the Greenland et al. (2006) paper.

Regarding 4.), we should not say that the null hypothesis is true or should be accepted. That is simply because the test cannot give us evidence in favour of the null hypothesis. Yes, H0 could be true, but there are (infinitely) many other hypotheses that could be true as well. A 95% confidence interval gives us the range of hypothesized values of mu that would not be rejected at the 0.05 level. Some of the values in that range will have a smaller P-value than the mu=0 from our H0, but for them too we could say the observed data were not surprising. The bottom line is that we have no proof of, or evidence for, the null hypothesis.

Regarding 5.), many of the same comments apply. With our typical two-sided or one-sided interval alternative hypothesis, there are many values among those included in the alternative for which a test of that particular mean would give a higher P-value than the 0.95 we got for mu=0. So it is just wrong to think that a large P-value favours H0 over the alternative. The situation would be different if the alternative were also a single value, because then it becomes simply a choice between two values, but this is not the situation we have been considering in the course.

Regarding 6.), the second part of the statement was already discussed under 4.), and it is also wrong that no effect was observed.
We can really only say that no effect was observed if the estimated mean is exactly zero. That, however, would correspond to a P-value of 1 (for a two-sided alternative; the z- or t-statistic would then be equal to zero), but we had P=0.95. So the observed mean must have differed from zero, possibly only slightly, and therefore the observed effect was not zero.

Finally, regarding 8.), if we again think about z- or t-statistics, a small absolute value of the statistic will occur either because the numerator (i.e., the estimate minus the hypothesized value) is small (which would be a small effect size) or because the denominator (i.e., the standard error of the estimate) is large. From the P-value alone there is no way to exclude the second explanation. A confidence interval for the population mean gives us a plausible range of values for the population mean, and that is our best way to represent the information our data hold about effect size.
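
The points made under 5.), 6.) and 8.) can be checked numerically. The following is only a minimal sketch under stated assumptions: it uses a two-sided z-test with a known standard error (the exercise does not say which test was used), and the two "studies" with their estimates and standard errors are hypothetical numbers chosen so that both give P=0.95 for H0: mu=0. The function names two_sided_p and conf_int are made up for this illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def two_sided_p(estimate, se, mu0=0.0):
    """Two-sided z-test P-value for H0: mu = mu0 (assumes a known standard error)."""
    z = (estimate - mu0) / se
    return 2 * (1 - Z.cdf(abs(z)))


def conf_int(estimate, se, level=0.95):
    """z-based confidence interval for the population mean."""
    z_star = Z.inv_cdf(0.5 + level / 2)
    return (estimate - z_star * se, estimate + z_star * se)


# The |z| that corresponds to a two-sided P-value of 0.95 (about 0.063).
z_obs = Z.inv_cdf(1 - 0.95 / 2)

# Two hypothetical studies with the same z, hence the same P-value,
# but very different standard errors and observed means.
studies = {
    "precise": {"estimate": z_obs * 0.1, "se": 0.1},
    "imprecise": {"estimate": z_obs * 50.0, "se": 50.0},
}

for name, s in studies.items():
    est, se = s["estimate"], s["se"]
    low, high = conf_int(est, se)
    print(f"{name}: estimate={est:.4f}, P(mu=0)={two_sided_p(est, se):.3f}, "
          f"95% CI=({low:.3f}, {high:.3f})")
    # 6.): P = 1 only when the hypothesized value equals the estimate.
    print(f"  P(mu=estimate)   = {two_sided_p(est, se, mu0=est):.3f}")
    # 5.): values between 0 and the estimate get a higher P-value than mu=0.
    print(f"  P(mu=estimate/2) = {two_sided_p(est, se, mu0=est / 2):.3f}")
```

Both hypothetical studies report the same P-value for mu=0, yet their observed means differ by a factor of 500 and the second confidence interval is far wider. The P-value alone cannot distinguish the two situations, while the confidence intervals can.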