Supplementary Exercise 7.143 of IPS7e
-------------------------------------

Data: OC (osteocalcin) measurements (in mg/ml blood) for 31 healthy women between 11 and 32 years.

Model: the 31 observations are a simple random sample (i.i.d. sample) from a distribution with mean mu and standard deviation sigma, both of which are unknown parameters.

Minitab commands and output:

MTB > WOpen "H:\VHM\VHM801\Datasets\Minitab\Chapter 7\ex07_143.mtw".
Retrieving worksheet from file: 'H:\VHM\VHM801\Datasets\Minitab\Chapter 7\ex07_143.mtw'
Worksheet was saved on 07/10/2014

MTB > Describe 'oc';
SUBC>   Mean;
SUBC>   SEMean;
SUBC>   StDeviation;
SUBC>   QOne;
SUBC>   Median;
SUBC>   QThree;
SUBC>   Minimum;
SUBC>   Maximum;
SUBC>   Skewness;
SUBC>   Kurtosis;
SUBC>   N;
SUBC>   NMissing;
SUBC>   GBoxplot.

Descriptive Statistics: oc

Variable   N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
oc        31   0  33.42     3.52  19.61     8.10  17.90   30.20  47.70    77.90

Variable  Skewness  Kurtosis
oc            0.84     -0.19

Boxplot of oc

MTB > Stem-and-Leaf 'oc';
SUBC>   Trim.

Stem-and-Leaf Display: oc

Stem-and-leaf of oc  N = 31
Leaf Unit = 1.0

  2   0  89
  3   1  0
 10   1  5677799
 15   2  00004
 15   2
 (3)  3  011
 13   3  568
 10   4  04
  8   4  7
  7   5  244
  4   5  6
  3   6
  3   6  8
  2   7
  2   7  67

MTB > GSummary 'oc'.

Summary Report for oc

MTB > PPlot 'oc';
SUBC>   Normal;
SUBC>   Symbol;
SUBC>   FitD;
SUBC>   Grid 2;
SUBC>   Grid 1;
SUBC>   MGrid 1.

Probability Plot of oc

The P-value of the Anderson-Darling test of normality is 0.009

MTB > Onet 'oc';
SUBC>   Confidence 95.0;
SUBC>   Alternative 0.

One-Sample T: oc

Variable   N   Mean  StDev  SE Mean          95% CI
oc        31  33.42  19.61     3.52  (26.22, 40.61)

Answers to questions:
---------------------

(a) The stemplot, histogram and boxplot all show that the data are right-skewed and not well approximated by a normal distribution. The Anderson-Darling test for normality is clearly significant (P = 0.009). The skewness is 0.84. The distribution appears to be unimodal. There are no obvious outliers, and the 1.5*IQR rule does not flag any observations as suspected outliers.
Beware that for skewed data this rule often flags some observations in the heavier tail (the right tail for a right-skewed distribution), even if these are not outlying at all. This is because the rule is designed with a symmetric distribution in mind.

(b) The confidence interval is given above. It is APPROXIMATE only, because the observations themselves are clearly not normally distributed (but the sample mean is nevertheless approximately normally distributed, by the CLT). It is not clear whether this is a situation where the guidelines (7L-3) for use of the t-procedure are met: the skewness is quite pronounced, and the number of observations (n = 31) is less than 40.
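
As a rough cross-check of the numbers quoted in (a) and (b), the following Python sketch recomputes the 1.5*IQR fences and the 95% t interval from the rounded summary statistics in the Minitab output. Python is not part of the original exercise, and the function names iqr_fences and t_interval are hypothetical helpers introduced only for illustration. The critical value 2.042 for 30 degrees of freedom is taken from a standard t table rather than computed. Because the inputs are rounded to two decimals, the limits may differ from Minitab's in the last digit.

```python
import math

# Summary statistics copied (rounded) from the Minitab output above.
n = 31
mean = 33.42
stdev = 19.61
q1, q3 = 17.90, 47.70
minimum, maximum = 8.10, 77.90

# Assumption: upper 2.5% critical value of t with n - 1 = 30 degrees of
# freedom, read from a standard t table (not computed here).
T_CRIT_DF30 = 2.042


def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences of the k*IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def t_interval(mean, stdev, n, t_crit):
    """Return (lower, upper, standard error) for a one-sample t interval."""
    se = stdev / math.sqrt(n)
    margin = t_crit * se
    return mean - margin, mean + margin, se


lower_fence, upper_fence = iqr_fences(q1, q3)
# With only the summary statistics available, it is enough to check the extremes.
any_flagged = minimum < lower_fence or maximum > upper_fence

ci_low, ci_high, se = t_interval(mean, stdev, n, T_CRIT_DF30)

print(f"1.5*IQR fences: ({lower_fence:.2f}, {upper_fence:.2f})")
print(f"Any observation outside the fences: {any_flagged}")
print(f"SE of the mean: {se:.2f}")
print(f"Approximate 95% CI for mu: ({ci_low:.2f}, {ci_high:.2f})")
```

The maximum (77.90) lies below the upper fence, which agrees with the statement in (a) that no observations are flagged. The interval agrees with the Minitab interval in (b) up to rounding.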