Solution file for Problems 3.5 and 4.3 (GO)
-------------------------------------------

Data: measurements of compressive strength of concrete cubes. Fifteen cubes
were produced and randomly assigned to five levels of polypropylene fiber
content, with three cubes per group, and fiber content ranging from 0 to 1%
in equidistant steps of 0.25%. The data constitute 5 independent samples with
a continuous outcome, and the model immediately suggested is a one-way ANOVA.

Problem 3.5:
------------

We run the one-way ANOVA, including checks of the assumptions of normality
and equal standard deviations in the groups.

MTB > WOpen "h:\VHM\VHM802\Data_csv\ch03pr5.csv";
SUBC>   FType;
SUBC>     CSV;
SUBC>   DecSep;
SUBC>     Period;
SUBC>   Field;
SUBC>     Comma;
SUBC>   TDelimiter;
SUBC>     DoubleQuote.
Retrieving worksheet from file: 'h:\VHM\VHM802\Data_csv\ch03pr5.csv'
Worksheet was saved on 11/02/2011

MTB > Oneway 'strength' 'fiber';
SUBC>   GFourpack.

One-way ANOVA: strength versus fiber

Source  DF     SS     MS      F      P
fiber    4  6.263  1.566  12.98  0.001
Error   10  1.207  0.121
Total   14  7.469

S = 0.3474   R-Sq = 83.85%   R-Sq(adj) = 77.38%

                          Individual 95% CIs For Mean Based on
                          Pooled StDev
Level  N    Mean   StDev  ----+---------+---------+---------+-----
0.00   3  7.4667  0.3055                          (------*-----)
0.25   3  7.5667  0.3055                            (-----*-----)
0.50   3  6.8667  0.5508                  (-----*-----)
0.75   3  6.7000  0.3000               (------*-----)
1.00   3  5.7667  0.1528  (-----*------)
                          ----+---------+---------+---------+-----
                            5.60      6.30      7.00      7.70

Pooled StDev = 0.3474

Residual Plots for strength

Comments:
---------

The residual plots and the table of standard deviations do not indicate
serious problems with the model assumptions. The standard deviations differ
somewhat between groups, but that could very well be a result of the small
within-group sample size alone. Tests for homogeneity of variance are clearly
non-significant (not shown).
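As a cross-check on the ANOVA table, the sums of squares can be recovered
from the printed group means and standard deviations alone, because the group
sizes are equal. The following Python sketch is not part of the Minitab
session; the variable names are illustrative, and the inputs are the rounded
summary statistics from the output above, so the results agree with Minitab
only up to rounding.

```python
import math

# Summary statistics copied from the one-way ANOVA output (rounded values).
n_per_group = 3
means = [7.4667, 7.5667, 6.8667, 6.7000, 5.7667]
sds = [0.3055, 0.3055, 0.5508, 0.3000, 0.1528]

k = len(means)
n_total = k * n_per_group
# With equal group sizes the grand mean is the plain average of group means.
grand_mean = sum(means) / k

# Between-group and within-group (error) sums of squares.
ss_between = n_per_group * sum((m - grand_mean) ** 2 for m in means)
ss_error = (n_per_group - 1) * sum(s ** 2 for s in sds)
df_between = k - 1
df_error = n_total - k

ms_between = ss_between / df_between
ms_error = ss_error / df_error
f_stat = ms_between / ms_error

pooled_sd = math.sqrt(ms_error)                 # "S" in the Minitab output
r_squared = ss_between / (ss_between + ss_error)
se_mean = pooled_sd / math.sqrt(n_per_group)    # SE of a group mean, pooled

print(f"fiber  DF={df_between:2d}  SS={ss_between:.4f}  MS={ms_between:.4f}  F={f_stat:.2f}")
print(f"Error  DF={df_error:2d}  SS={ss_error:.4f}  MS={ms_error:.4f}")
print(f"S={pooled_sd:.4f}  R-Sq={100 * r_squared:.2f}%  SE(mean)={se_mean:.4f}")
```

The P-value is left out on purpose, since it requires the F distribution,
which the Python standard library does not provide.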

The ANOVA table shows a strongly significant difference between groups, and
the groups account for more than 80% of the variation in the data (such a
high R^2 is easier to achieve in a small dataset). The estimated means and
the graphical representation of the confidence intervals show that strength
tends to decrease with increasing fiber level. The decline does not seem to
be entirely linear, because the first two means are fairly similar, as are
the third and fourth means.

When the groups are defined by quantitative values (here, the fiber content),
multiple comparisons are often less natural, and one would instead focus on
describing the relationship with the quantitative variable. A first step
would be to produce a table and/or graph with group means and 95% confidence
intervals.

MTB > GLM 'strength' = fiber;
SUBC>   Brief 2 ;
SUBC>   Means fiber.

General Linear Model: strength versus fiber

Factor  Type   Levels  Values
fiber   fixed       5  0.00, 0.25, 0.50, 0.75, 1.00

Analysis of Variance for strength, using Adjusted SS for Tests

Source  DF  Seq SS  Adj SS  Adj MS      F      P
fiber    4  6.2627  6.2627  1.5657  12.98  0.001
Error   10  1.2067  1.2067  0.1207
Total   14  7.4693

S = 0.347371   R-Sq = 83.85%   R-Sq(adj) = 77.38%

Least Squares Means for strength

fiber   Mean  SE Mean
0.00   7.467   0.2006
0.25   7.567   0.2006
0.50   6.867   0.2006
0.75   6.700   0.2006
1.00   5.767   0.2006

MTB > Intplot ( 'strength' ) * 'fiber';
SUBC>   Intbar;
SUBC>   Mean.

Interval Plot of strength

Note that the interval plot needs to be modified to use the residual standard
deviation as the basis for all CIs. This can be achieved by editing the
graph: right-click the interval bars and choose the submenu Edit Interval
Bar.

Problem 4.3:
------------

Coefficients for polynomial contrasts (with equidistant x-values and equal
group sizes) are listed in Appendix D, Table D.6. As Minitab does not allow
easy manipulation of estimates, all calculations need to be done by hand.
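The hand calculation can be sketched in a few lines of Python (not part of
the Minitab session). The coefficient vectors below are the standard
orthogonal polynomial coefficients for five equally spaced levels; they are
assumed here to match Table D.6, which is not reproduced in this file. The
group means, residual standard deviation and group size are taken from the
output of Problem 3.5, and the function name is illustrative.

```python
import math

# Inputs from the one-way ANOVA / GLM output of Problem 3.5.
n = 3                                            # cubes per group
means = [7.4667, 7.5667, 6.8667, 6.7000, 5.7667]  # group means, fiber 0..1%
s = 0.347371                                     # residual SD (10 error DF)

# Standard orthogonal polynomial coefficients for 5 equidistant levels
# (assumed to agree with Appendix D, Table D.6).
contrasts = {
    "linear":    [-2, -1,  0,  1, 2],
    "quadratic": [ 2, -1, -2, -1, 2],
    "cubic":     [-1,  2,  0, -2, 1],
    "quartic":   [ 1, -4,  6, -4, 1],
}

def evaluate_contrast(coefs, means, n, s):
    """Return (estimate, SE, SS, t) for one contrast with equal group sizes."""
    estimate = sum(c * m for c, m in zip(coefs, means))
    sum_c2 = sum(c * c for c in coefs)
    se = s * math.sqrt(sum_c2 / n)        # SE of the contrast estimate
    ss = n * estimate ** 2 / sum_c2       # contrast sum of squares (1 DF)
    t = estimate / se                     # t statistic on the error DF
    return estimate, se, ss, t

results = {name: evaluate_contrast(c, means, n, s) for name, c in contrasts.items()}

# Because the contrasts are orthogonal, their SS add up to the SS for fiber.
ss_total_contrasts = sum(r[2] for r in results.values())

for name, (est, se, ss, t) in results.items():
    share = 100 * ss / ss_total_contrasts
    print(f"{name:<10} est={est:7.3f}  SE={se:6.4f}  SS={ss:5.3f}  SS%={share:4.1f}  t={t:5.2f}")
```

P-values are omitted because they require the t distribution with 10 degrees
of freedom, which is not available in the Python standard library.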

One alternative is to fit the linear, quadratic, cubic and 4th order
regression models directly. In each fit, the test of the highest-order
coefficient is a test of the corresponding polynomial contrast, although for
the lower-order models this test is based on the reduced model instead of the
full one-way model. The table of sequential sums of squares for the fourth
order model also gives the sum of squares for each contrast.

MTB > Regress 'strength' 1 'fiber';
SUBC>   Constant;
SUBC>   Brief 2.

Regression Analysis: strength versus fiber

The regression equation is
strength = 7.73 - 1.71 fiber

Predictor     Coef  SE Coef      T      P
Constant    7.7267   0.1758  43.96  0.000
fiber      -1.7067   0.2870  -5.95  0.000

S = 0.393016   R-Sq = 73.1%   R-Sq(adj) = 71.0%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       1  5.4613  5.4613  35.36  0.000
Residual Error  13  2.0080  0.1545
Total           14  7.4693

MTB > Name C4 'fiber2'
MTB > Let 'fiber2' = fiber**2
MTB > Name C5 'fiber3'
MTB > Let 'fiber3' = fiber**3
MTB > Name C6 'fiber4'
MTB > Let 'fiber4' = fiber**4
MTB > Regress 'strength' 2 'fiber' 'fiber2';
SUBC>   Constant;
SUBC>   Brief 2.

Regression Analysis: strength versus fiber, fiber2

The regression equation is
strength = 7.51 + 0.046 fiber - 1.75 fiber2

Predictor     Coef  SE Coef      T      P
Constant    7.5076   0.1924  39.03  0.000
fiber       0.0457   0.9115   0.05  0.961
fiber2     -1.7524   0.8741  -2.00  0.068

S = 0.354047   R-Sq = 79.9%   R-Sq(adj) = 76.5%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       2  5.9651  2.9826  23.79  0.000
Residual Error  12  1.5042  0.1253
Total           14  7.4693

Source  DF  Seq SS
fiber    1  5.4613
fiber2   1  0.5038

Unusual Observations

Obs  fiber  strength     Fit  SE Fit  Residual  St Resid
 13   0.50    6.3000  7.0924  0.1425   -0.7924    -2.44R

R denotes an observation with a large standardized residual.

MTB > Regress 'strength' 3 'fiber' 'fiber2' 'fiber3';
SUBC>   Constant;
SUBC>   Brief 2.

Regression Analysis: strength versus fiber, fiber2, fiber3

The regression equation is
strength = 7.50 + 0.14 fiber - 2.02 fiber2 + 0.18 fiber3

Predictor     Coef  SE Coef      T      P
Constant    7.5043   0.2119  35.41  0.000
fiber        0.141    2.157   0.07  0.949
fiber2      -2.019    5.477  -0.37  0.719
fiber3       0.178    3.600   0.05  0.962

S = 0.369749   R-Sq = 79.9%   R-Sq(adj) = 74.4%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       3  5.9655  1.9885  14.54  0.000
Residual Error  11  1.5039  0.1367
Total           14  7.4693

Source  DF  Seq SS
fiber    1  5.4613
fiber2   1  0.5038
fiber3   1  0.0003

Unusual Observations

Obs  fiber  strength     Fit  SE Fit  Residual  St Resid
 13   0.50    6.3000  7.0924  0.1488   -0.7924    -2.34R

R denotes an observation with a large standardized residual.

MTB > Regress 'strength' 4 'fiber' 'fiber2' 'fiber3' 'fiber4';
SUBC>   Constant;
SUBC>   Brief 2.

Regression Analysis: strength versus fiber, fiber2, fiber3, fiber4

The regression equation is
strength = 7.47 + 6.41 fiber - 36.4 fiber2 + 56.4 fiber3 - 28.1 fiber4

Predictor     Coef  SE Coef      T      P
Constant    7.4667   0.2006  37.23  0.000
fiber        6.411    4.480   1.43  0.183
fiber2      -36.38    22.49  -1.62  0.137
fiber3       56.36    35.96   1.57  0.148
fiber4      -28.09    17.90  -1.57  0.148

S = 0.347371   R-Sq = 83.8%   R-Sq(adj) = 77.4%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       4  6.2627  1.5657  12.98  0.001
Residual Error  10  1.2067  0.1207
Total           14  7.4693

Source  DF  Seq SS
fiber    1  5.4613
fiber2   1  0.5038
fiber3   1  0.0003
fiber4   1  0.2972

Comments:
---------

The table below summarizes the information that can be extracted from these
analyses.

term       coefficient      SE      t      P     SS
------------------------------------------------------
linear         -1.7067  0.2870  -5.95  0.000  5.461
quadratic      -1.7524  0.8741  -2.00  0.068  0.504
cubic            0.178   3.600   0.05  0.962  0.000
quartic         -28.09   17.90  -1.57  0.148  0.297
------------------------------------------------------

From the table it is clear that the two highest order terms do not improve
the model substantially, and the choice is therefore between the linear and
quadratic model.
The slope is negative, corresponding to the already noted strong declining
trend with fiber content, and the sign of the quadratic term is negative as
well, corresponding to a downward-bending curve. The Fitted Line Plot menu
can be used to get a graph of the estimated quadratic curve. With the P-value
for the quadratic term (0.068) so close to 0.05, the choice between the two
models should involve subject matter considerations (i.e., whether it is
preferable to have a linear or quadratic prediction equation).

Computing and evaluating the polynomial contrasts in the one-way model (using
either a calculator or other software) yields the following table:

Contrast   Estimate      SE     SS  SS(%)      t   P(t)
------------------------------------------------------
linear       -4.267  0.6342  5.461   87.2  -6.73  0.000
quadratic    -1.533  0.7504  0.504    8.0  -2.04  0.068
cubic         0.033  0.6342  0.000    0.0   0.05  0.959
quartic      -2.633   1.678  0.297    4.7  -1.57  0.148
------------------------------------------------------

The differences in t-values and P-values between this table and the
regression-based summary are due to the use of different SEs and DFs, as
explained above.

Note that one would usually not apply the Scheffe adjustment (intended for
contrasts derived from the data) when assessing polynomial contrasts. One
could argue that the idea of splitting the variation between groups into
orthogonal parts based on polynomial terms is universal and hence not derived
from the actual data. If there were concern about the inflated type I error
from assessing 4 contrasts, a Bonferroni or Holm correction could be used.
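
To illustrate what such a correction would do here, the following Python
sketch (not part of the Minitab session) applies Bonferroni and Holm
adjustments to the four contrast P-values from the table above. The linear
contrast is printed as 0.000, so its exact P-value is unknown; the value
0.0005 used below is an assumed upper bound implied by the rounding, and the
function name is illustrative.

```python
# P-values for the polynomial contrasts in the one-way model. The linear
# P-value is printed as 0.000; 0.0005 is an assumed upper bound, not the
# exact value.
p_values = {
    "linear": 0.0005,
    "quadratic": 0.068,
    "cubic": 0.959,
    "quartic": 0.148,
}
alpha = 0.05
m = len(p_values)

# Bonferroni: multiply every P-value by the number of tests (capped at 1).
bonferroni = {name: min(1.0, m * p) for name, p in p_values.items()}

def holm_adjust(p_values):
    """Holm step-down adjusted P-values for a dict of name -> P-value."""
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    count = len(ordered)
    adjusted = {}
    running_max = 0.0
    for rank, (name, p) in enumerate(ordered):
        # The smallest P is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values non-decreasing.
        candidate = min(1.0, (count - rank) * p)
        running_max = max(running_max, candidate)
        adjusted[name] = running_max
    return adjusted

holm = holm_adjust(p_values)
significant = [name for name in p_values if holm[name] < alpha]

for name in p_values:
    print(f"{name:<10} raw={p_values[name]:.4f}  "
          f"Bonferroni={bonferroni[name]:.3f}  Holm={holm[name]:.3f}")
print("Significant after Holm at alpha = 0.05:", significant)
```

Under either adjustment only the linear contrast remains significant at the
5% level, so the correction does not change the choice between the linear
and quadratic models discussed above.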