Solution file for additional exercise 2.5:
------------------------------------------

The first part of the solution examines the 1-way ANOVA assumptions using
descriptive statistics for each origin group:

MTB > WOpen "R:\data_csv\hs02_5.csv";
SUBC>   FType;
SUBC>     CSV;
SUBC>   DecSep;
SUBC>     Period;
SUBC>   Field;
SUBC>     Comma;
SUBC>   TDelimiter;
SUBC>     DoubleQuote.
Retrieving worksheet from file: ‘R:\data_csv\hs02_5.csv’
Worksheet was saved on 12/01/2011

MTB > Describe 'calv1serv';
SUBC>   By 'origin';
SUBC>   Mean;
SUBC>   SEMean;
SUBC>   StDeviation;
SUBC>   QOne;
SUBC>   Median;
SUBC>   QThree;
SUBC>   Minimum;
SUBC>   Maximum;
SUBC>   Skewness;
SUBC>   Kurtosis;
SUBC>   N;
SUBC>   NMissing.

Descriptive Statistics: calv1serv

Variable   origin   N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median
calv1serv  1       13   0   79.8     10.2   37.0     43.0   53.5    69.0
           2       38   0  91.76     7.58  46.71    41.00  63.00   77.00
           3       73   0  76.93     3.83  32.76    33.00  53.00   70.00

Variable   origin      Q3  Maximum  Skewness  Kurtosis
calv1serv  1        100.0    162.0      1.28      0.75
           2       102.00   221.00      1.31      0.95
           3        91.50   205.00      1.85      4.48

MTB > PPlot 'calv1serv';
SUBC>   Normal;
SUBC>   Symbol;
SUBC>   FitD;
SUBC>   Grid 2;
SUBC>   Grid 1;
SUBC>   MGrid 1;
SUBC>   Panel 'origin'.

Probability Plot of calv1serv

Comments on descriptive statistics:
-----------------------------------
Quite clearly, a normal distribution does not describe these data well at
all, because they are markedly right-skewed (the computed skewness in the
3 groups ranges from 1.28 to 1.85). The normal plots are clearly curved,
with a similar shape in all three groups. The tests for normal distribution
have P-values below 0.015 (and much lower for the two groups of substantial
size). Therefore, the need for a transformation is obvious. Note, finally,
that the standard deviation is largest in the group with the largest mean
(origin 2), something that we can hope to correct by transformation
(although homoscedasticity is not the major concern).
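For readers who want to check the group-wise skewness outside Minitab, the following is a minimal Python sketch. It uses the adjusted Fisher-Pearson formula n/((n-1)(n-2)) * sum(((x - mean)/s)^3); that this agrees digit for digit with Minitab's Skewness output is an assumption, not something shown in this session. The function names and the (origin, calv1serv) pairs are made up for illustration and are not the contents of hs02_5.csv.

```python
import math
from collections import defaultdict


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness: n/((n-1)(n-2)) * sum(((x - mean)/s)^3)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(values) / n
    # s is the usual sample standard deviation (divisor n - 1).
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)


def skewness_by_group(rows):
    """Group (origin, value) pairs by origin and compute the skewness per group."""
    groups = defaultdict(list)
    for origin, value in rows:
        groups[origin].append(value)
    return {origin: sample_skewness(vals) for origin, vals in groups.items()}


# Made-up (origin, calv1serv) pairs, NOT the exercise data.
rows = [(1, 45), (1, 60), (1, 72), (1, 150),
        (2, 41), (2, 70), (2, 85), (2, 220)]
print(skewness_by_group(rows))
```

A positive value indicates a longer right tail, which is the pattern seen in all three origin groups above.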
The easiest way to carry out a Box-Cox transformation analysis in Minitab
is to use the Regression-General Regression menu, in which we must specify
the predictor (origin) as categorical.

MTB > GReg 'calv1serv' = origin;
SUBC>   Categorical 'origin';
SUBC>   Constant;
SUBC>   Confidence 95.0;
SUBC>   Coding -1;
SUBC>   Boxcox;
SUBC>   TEquation;
SUBC>   TCoef;
SUBC>   TSummary;
SUBC>   TANOVA;
SUBC>   TDiag 0.

General Regression Analysis: calv1serv versus origin

Box-Cox transformation of the response with rounded lambda = -0.5
The 95% CI for lambda is (-1.095, -0.265)
...

Comments:
---------
The analysis does not give us the estimated optimal lambda-value, only the
rounded value of -0.5 and the CI. We could evaluate the model on the inverse
square-root transformed scale using the residuals obtained with the General
Regression command, but we will instead use descriptive statistics in the
same way as above. Note that we can do this only because of the simplicity
of the 1-way ANOVA model; in more complex cases we would rely on the
residuals.

MTB > Let c3 = -1/sqrt('calv1serv')
MTB > name c3 'invroot'
MTB > Describe 'invroot';
SUBC>   By 'origin';
SUBC>   Mean;
SUBC>   SEMean;
SUBC>   StDeviation;
SUBC>   QOne;
SUBC>   Median;
SUBC>   QThree;
SUBC>   Minimum;
SUBC>   Maximum;
SUBC>   Skewness;
SUBC>   Kurtosis;
SUBC>   N;
SUBC>   NMissing.

Descriptive Statistics: invroot

Variable  origin   N  N*      Mean  SE Mean    StDev   Minimum        Q1
invroot   1       13   0  -0.11904  0.00647  0.02333  -0.15250  -0.13726
          2       38   0  -0.11285  0.00394  0.02428  -0.15617  -0.12602
          3       73   0  -0.12010  0.00246  0.02098  -0.17408  -0.13736

Variable  origin    Median        Q3   Maximum  Skewness  Kurtosis
invroot   1       -0.12039  -0.10139  -0.07857      0.33     -0.66
          2       -0.11399  -0.09906  -0.06727      0.13     -0.74
          3       -0.11952  -0.10454  -0.06984      0.17     -0.26

MTB > PPlot 'invroot';
SUBC>   Normal;
SUBC>   Symbol;
SUBC>   FitD;
SUBC>   Grid 2;
SUBC>   Grid 1;
SUBC>   MGrid 1;
SUBC>   Panel 'origin'.

Probability Plot of invroot

Comments:
---------
The transformation worked well.
The three groups have almost the same standard deviations (0.021 to 0.024),
and their distributions show no substantial departures from normality
(skewness between 0.13 and 0.33, and non-significant P-values in the
normality tests).
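Since the Minitab output above reports only the rounded lambda and its CI, the following Python sketch shows one way the unrounded optimum could be located for a 1-way ANOVA: a grid search over the Box-Cox profile log-likelihood, with an approximate 95% CI taken as all grid values whose log-likelihood lies within half the chi-square(1) 0.95 quantile of the maximum. This is a standard textbook construction and may differ in detail from how Minitab computes its interval. All function names, the grid limits, and the data are hypothetical; the numbers are made up and are not the contents of hs02_5.csv.

```python
import math


def boxcox(y, lam):
    """Box-Cox transform of one positive value: (y^lam - 1)/lam, or log(y) at lam = 0."""
    return math.log(y) if abs(lam) < 1e-12 else (y ** lam - 1.0) / lam


def within_group_ss(groups, lam):
    """Residual sum of squares of the 1-way ANOVA on the transformed scale."""
    rss = 0.0
    for values in groups.values():
        z = [boxcox(y, lam) for y in values]
        mean = sum(z) / len(z)
        rss += sum((v - mean) ** 2 for v in z)
    return rss


def profile_loglik(groups, lam):
    """Profile log-likelihood of lambda (additive constants dropped)."""
    n = sum(len(v) for v in groups.values())
    sum_log_y = sum(math.log(y) for v in groups.values() for y in v)
    return (-0.5 * n * math.log(within_group_ss(groups, lam) / n)
            + (lam - 1.0) * sum_log_y)


def boxcox_profile(groups, lo=-3.0, hi=3.0, step=0.01):
    """Grid-search estimate of lambda and an approximate 95% likelihood interval."""
    steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(steps + 1)]
    ll = [profile_loglik(groups, lam) for lam in grid]
    best = max(range(len(grid)), key=ll.__getitem__)
    # Half of the 0.95 quantile of the chi-square distribution with 1 df.
    cutoff = ll[best] - 0.5 * 3.841458820694124
    # Assumes the accepted values form one interval; with few observations
    # the interval can be cut off at the grid limits lo and hi.
    inside = [lam for lam, value in zip(grid, ll) if value >= cutoff]
    return grid[best], (min(inside), max(inside))


# Made-up response values per origin group, NOT the exercise data.
groups = {
    1: [45.0, 52.0, 60.0, 72.0, 98.0, 150.0],
    2: [41.0, 58.0, 70.0, 85.0, 110.0, 220.0],
    3: [35.0, 50.0, 66.0, 80.0, 95.0, 200.0],
}
lam_hat, (ci_low, ci_high) = boxcox_profile(groups)
print(f"lambda-hat = {lam_hat:.2f}, approx. 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

For lambda = -0.5 the Box-Cox transform is an increasing linear function of -1/sqrt(y), so an ANOVA on 'invroot' as computed above is equivalent to one on the Box-Cox scale.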