Solution file for additional exercise 2.5:
------------------------------------------

The first part of the solution examines the 1-way ANOVA assumptions by descriptive statistics for each group of origins:

MTB > WOpen "H:\VHM\VHM802\Data_csv\hs02_5.csv";
SUBC>   FType;
SUBC>   CSV;
SUBC>   DecSep;
SUBC>   Period;
SUBC>   Field;
SUBC>   Comma;
SUBC>   TDelimiter;
SUBC>   DoubleQuote.
Retrieving worksheet from file: ‘H:\VHM\VHM802\Data_csv\hs02_5.csv’
Worksheet was saved on 12/01/2011

Results for: hs02_5.csv

MTB > Describe 'calv1serv';
SUBC>   By 'origin';
SUBC>   Mean;
SUBC>   SEMean;
SUBC>   StDeviation;
SUBC>   QOne;
SUBC>   Median;
SUBC>   QThree;
SUBC>   Minimum;
SUBC>   Maximum;
SUBC>   Skewness;
SUBC>   Kurtosis;
SUBC>   N;
SUBC>   NMissing.

Descriptive Statistics: calv1serv

Variable   origin   N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median      Q3  Maximum
calv1serv  1       13   0   79.8     10.2   37.0     43.0   53.5    69.0   100.0    162.0
           2       38   0  91.76     7.58  46.71    41.00  63.00   77.00  102.00   221.00
           3       73   0  76.93     3.83  32.76    33.00  53.00   70.00   91.50   205.00

Variable   origin  Skewness  Kurtosis
calv1serv  1           1.28      0.75
           2           1.31      0.95
           3           1.85      4.48

MTB > PPlot 'calv1serv';
SUBC>   Normal;
SUBC>   Symbol;
SUBC>   FitD;
SUBC>   Grid 2;
SUBC>   Grid 1;
SUBC>   MGrid 1;
SUBC>   Panel 'origin'.

Probability Plot of calv1serv

Comments on descriptive statistics:
-----------------------------------
Quite clearly a normal distribution does not describe these data well at all, because they are markedly right-skewed (the computed skewness in the 3 groups ranges between 1.28 and 1.85). The normal plots are clearly curved, with a similar shape in all three groups. The tests for normal distribution have P-values below 0.019 (and much lower for the two groups of substantial size). Therefore, the need for transformation is obvious. Note finally that the standard deviation is largest in the group with the largest mean (origin 2), something that we can hope to correct by transformation (although homoscedasticity is not the major concern here).
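As a quick arithmetic cross-check of the descriptive statistics, the following Python sketch recomputes the SE Mean column as StDev / sqrt(N) and the ratio of the largest to the smallest group standard deviation. It is not part of the Minitab session: the variable names are illustrative, and the numbers are copied from the Describe output for calv1serv.

```python
import math

# (N, Mean, StDev) per origin, copied from the Minitab Describe output.
groups = {
    1: (13, 79.8, 37.0),
    2: (38, 91.76, 46.71),
    3: (73, 76.93, 32.76),
}

# Standard error of the mean: StDev / sqrt(N).
se_mean = {g: sd / math.sqrt(n) for g, (n, _, sd) in groups.items()}

# Ratio of the largest to the smallest group standard deviation.
sds = [sd for _, _, sd in groups.values()]
sd_ratio = max(sds) / min(sds)

# Which group has the largest mean, and which the largest StDev?
largest_mean = max(groups, key=lambda g: groups[g][1])
largest_sd = max(groups, key=lambda g: groups[g][2])

for g in groups:
    print(f"origin {g}: SE Mean = {se_mean[g]:.2f}")
print(f"largest/smallest StDev = {sd_ratio:.2f}")
print(f"largest mean: origin {largest_mean}, largest StDev: origin {largest_sd}")
```

The recomputed standard errors match the SE Mean column to the precision shown, and origin 2 has both the largest mean and the largest standard deviation, which is the pattern noted in the comments.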
The Box-Cox transformation analysis can be carried out either under Regression in the Regression-Fit Regression Model menu or under ANOVA in the General Linear Model menu; we show the transformation results from the former:

MTB > Regress;
SUBC>   Response 'calv1serv';
SUBC>   Nodefault;
SUBC>   Categorical 'origin';
SUBC>   Terms origin;
SUBC>   Constant;
SUBC>   Boxcox;
SUBC>   Tmethod;
SUBC>   Tanova;
SUBC>   Tsummary;
SUBC>   Tcoefficients;
SUBC>   Tequation;
SUBC>   TDiagnostics 0.

Regression Analysis: calv1serv versus origin

Method

Categorical predictor coding  (1, 0)

Box-Cox transformation

Rounded lambda      -0.5
Estimated lambda    -0.670068
95% CI for lambda   (-1.09157, -0.261568)

...

Comments:
---------
The analysis gives the estimated optimal lambda value, with a 95% CI. In addition, it suggests a "rounded" value of -0.5 and carries out the analysis for this lambda. The estimate for lambda agrees with the value produced by Stata, but the CIs differ slightly. In practice, both are probably acceptable to use as long as the intervals appear sensible. For our data, we could evaluate the model on the inverse square-root transformed scale using the residuals obtained with the General Regression command, but we will instead use descriptive statistics in the same way as above. Note that we can do this only because of the simplicity of the 1-way ANOVA model; in more complex cases we would rely on the residuals.

MTB > Name C3 'invroot'
MTB > Let 'invroot' = -1/sqrt('calv1serv')
MTB > Describe 'invroot';
SUBC>   By 'origin';
SUBC>   Mean;
SUBC>   SEMean;
SUBC>   StDeviation;
SUBC>   QOne;
SUBC>   Median;
SUBC>   QThree;
SUBC>   Minimum;
SUBC>   Maximum;
SUBC>   Skewness;
SUBC>   Kurtosis;
SUBC>   N;
SUBC>   NMissing.
Descriptive Statistics: invroot

Variable  origin   N  N*      Mean  SE Mean    StDev   Minimum        Q1    Median
invroot   1       13   0  -0.11904  0.00647  0.02333  -0.15250  -0.13726  -0.12039
          2       38   0  -0.11285  0.00394  0.02428  -0.15617  -0.12602  -0.11399
          3       73   0  -0.12010  0.00246  0.02098  -0.17408  -0.13736  -0.11952

Variable  origin        Q3   Maximum  Skewness  Kurtosis
invroot   1       -0.10139  -0.07857      0.33     -0.66
          2       -0.09906  -0.06727      0.13     -0.74
          3       -0.10454  -0.06984      0.17     -0.26

MTB > PPlot 'invroot';
SUBC>   Normal;
SUBC>   Symbol;
SUBC>   FitD;
SUBC>   Grid 2;
SUBC>   Grid 1;
SUBC>   MGrid 1;
SUBC>   Panel 'origin'.

Probability Plot of invroot

Comments:
---------
The transformation worked well. The three groups have almost the same standard deviations and distributions with no substantial departures from normality (and non-significant P-values of the normality tests).
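To make the link between the rounded lambda of -0.5 and the 'invroot' column explicit: in the textbook Box-Cox form (y^lambda - 1)/lambda, lambda = -0.5 gives 2 - 2/sqrt(y), an increasing linear function of -1/sqrt(y). The minus sign in 'invroot' therefore keeps the ordering of the original values. The Python sketch below (not Minitab code, and not necessarily the exact form Minitab uses internally) checks this identity, reproduces the group minima and maxima of 'invroot' from those of 'calv1serv', and compares the spread of the standard deviations. Function names are illustrative; all numbers are copied from the two Describe outputs.

```python
import math

def invroot(y):
    """Transformation used in the worksheet: -1/sqrt(y)."""
    return -1.0 / math.sqrt(y)

def boxcox(y, lam):
    """Textbook Box-Cox form (y**lam - 1)/lam, with log(y) at lam == 0."""
    return math.log(y) if lam == 0 else (y**lam - 1.0) / lam

# (Minimum, Maximum) of calv1serv per origin, from the first Describe output.
raw_range = {1: (43.0, 162.0), 2: (41.0, 221.0), 3: (33.0, 205.0)}
# (Minimum, Maximum) of invroot per origin, from the second Describe output.
reported = {
    1: (-0.15250, -0.07857),
    2: (-0.15617, -0.06727),
    3: (-0.17408, -0.06984),
}

# Because -1/sqrt(y) is increasing, minima map to minima and maxima to maxima.
for g, (lo, hi) in raw_range.items():
    print(g, round(invroot(lo), 5), round(invroot(hi), 5), reported[g])

# For lam = -0.5 the Box-Cox form equals 2 + 2*invroot(y).
y = 69.0  # median of calv1serv in origin 1
print(boxcox(y, -0.5), 2.0 + 2.0 * invroot(y))

# StDev of invroot per origin; compare the ratio with 46.71/32.76 on the raw scale.
sd_t = {1: 0.02333, 2: 0.02428, 3: 0.02098}
sd_ratio_t = max(sd_t.values()) / min(sd_t.values())
print(f"largest/smallest StDev (transformed) = {sd_ratio_t:.2f}")
```

On the transformed scale the largest group standard deviation is about 1.16 times the smallest, compared with about 1.43 on the original scale, consistent with the comment that the groups now have almost the same standard deviations.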