Supplementary Exercise 7.145 of IPS7e
-------------------------------------

(continuation of 7.143, although we use the supplied data obtained after natural log transformation; in practice one would transform the original data oneself to avoid round-off errors.)

The statistical model is the same as in Exercise 7.143, except that the assumed normal distribution (and its parameters) is on the log scale instead of the original scale.

Minitab commands:

MTB > WOpen "H:\VHM\VHM801\Datasets\Minitab\Chapter 7\ex07_145.mtw".
Retrieving worksheet from file: 'H:\VHM\VHM801\Datasets\Minitab\Chapter 7\ex07_145.mtw'
Worksheet was saved on 09/10/2014

MTB > Describe 'loc';
SUBC>   Mean;
SUBC>   SEMean;
SUBC>   StDeviation;
SUBC>   QOne;
SUBC>   Median;
SUBC>   QThree;
SUBC>   Minimum;
SUBC>   Maximum;
SUBC>   Skewness;
SUBC>   Kurtosis;
SUBC>   N;
SUBC>   NMissing;
SUBC>   GBoxplot.

Descriptive Statistics: loc

Variable   N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
loc       31   0  3.338    0.109  0.609    2.090  2.880   3.410  3.860    4.360

Variable  Skewness  Kurtosis
loc          -0.12     -0.72

Boxplot of loc

MTB > GSummary 'loc'.

Summary Report for loc

MTB > Stem-and-Leaf 'loc';
SUBC>   Trim.

Stem-and-Leaf Display: loc

Stem-and-leaf of loc  N = 31
Leaf Unit = 0.10

  1   2  0
  3   2  23
  3   2
  4   2  7
 10   2  888899
 15   3  00001
 15   3
 (5)  3  44455
 11   3  667
  8   3  89
  6   4  000
  3   4  233

MTB > PPlot 'loc';
SUBC>   Normal;
SUBC>   Symbol;
SUBC>   FitD;
SUBC>   Grid 2;
SUBC>   Grid 1;
SUBC>   MGrid 1.

Probability Plot of loc

The P-value of the Anderson-Darling test of normality is 0.327

MTB > Onet 'loc'.

One-Sample T: loc

Variable   N   Mean  StDev  SE Mean          95% CI
loc       31  3.338  0.609    0.109  (3.114, 3.561)

Answers to questions:
---------------------

(a) The distribution of the log-transformed OC values is close to symmetric (the skewness is -0.12, so the slight left-skewness is of no importance). The stemplot and histogram show a few "gaps", but the normality test does not give any evidence against a normal distribution. There do not appear to be any outlying observations.

(b) The 95% CI is given above.
As the data show no evidence against a normal distribution, we may regard this interval as exact. There is certainly no problem with using the t-procedure for these data.

(c) The backtransformed mean is exp(3.33774)=28.2. It is lower than both the estimated mean (33.4) and the estimated median (30.2) for the original data, but it is a valid estimate of the median: the log transformation preserves the ordering of the data, and on the log scale the mean and median of a normal distribution coincide. If the assumed normal distribution on the log scale is valid, this estimate is better than the median computed directly from the untransformed data. The backtransformed endpoints of the 95% CI are exp(3.11430)=22.5 and exp(3.56118)=35.2. This interval is considerably narrower than the 95% CI for the median displayed in the Summary Report window (its computation will be explained in session 8). Again, assuming the normal distribution on the log scale to be valid, this is the best interval we can get for the median.
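
The arithmetic in (b) and (c) can be checked outside Minitab. The following Python lines are only a sketch and not part of the Minitab session: they rebuild the log-scale interval from the summary statistics printed above and then backtransform it. The critical value 2.042 (two-sided 95%, 30 degrees of freedom) is an assumption taken from a t table, and the variable and function names are illustrative. Because the rounded StDev (0.609) is used, the endpoints may differ from Minitab's in the fourth decimal.

```python
import math

# Summary statistics on the log scale, copied from the Minitab output.
n = 31
mean_log = 3.33774   # unrounded mean quoted in answer (c)
sd_log = 0.609       # StDev as printed (rounded to 3 decimals)

# Assumption: two-sided 95% critical value of t with n - 1 = 30 df,
# read from a t table rather than computed here.
T_CRIT_DF30 = 2.042

# One-sample t interval for the mean on the log scale.
se_log = sd_log / math.sqrt(n)
half_width = T_CRIT_DF30 * se_log
ci_log = (mean_log - half_width, mean_log + half_width)


def backtransform(x):
    """Map a value from the natural-log scale back to the original scale."""
    return math.exp(x)


# exp() of the log-scale mean estimates the median on the original scale,
# and exp() of the CI endpoints gives a CI for that median.
median_est = backtransform(mean_log)
ci_median = tuple(backtransform(endpoint) for endpoint in ci_log)

print(f"SE on log scale:         {se_log:.3f}")
print(f"95% CI on log scale:     ({ci_log[0]:.3f}, {ci_log[1]:.3f})")
print(f"Backtransformed mean:    {median_est:.1f}")
print(f"95% CI for the median:   ({ci_median[0]:.1f}, {ci_median[1]:.1f})")
```

Run as written, this should agree with the values reported above after rounding: a log-scale interval of (3.114, 3.561), a backtransformed mean of 28.2, and backtransformed endpoints of 22.5 and 35.2.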