Solution file for additional exercise 10.2
------------------------------------------

(see additional exercise 10.1 for discussion of model, design and notation)

MTB > WOpen "H:\VHM\VHM802\Data_csv\hs10_2.csv";
SUBC>   FType;
SUBC>   CSV;
SUBC>   DecSep;
SUBC>   Period;
SUBC>   Field;
SUBC>   Comma;
SUBC>   TDelimiter;
SUBC>   DoubleQuote.
Retrieving worksheet from file: ‘H:\VHM\VHM802\Data_csv\hs10_2.csv’
Worksheet was saved on 03/03/2011

MTB > GLM;
SUBC>   Response 'moisture';
SUBC>   Nodefault;
SUBC>   Categorical 'lot';
SUBC>   Random lot;
SUBC>   Terms lot;
SUBC>   TExpand;
SUBC>   TMethod;
SUBC>   TAnova;
SUBC>   TSummary;
SUBC>   TCoefficients;
SUBC>   TEquation;
SUBC>   TFactor;
SUBC>   TEMS;
SUBC>   TVariance;
SUBC>   TDiagnostics 0;
SUBC>   Rtype 2;
SUBC>   GFOURPACK.

General Linear Model: moisture versus lot

Method

Factor coding  (-1, 0, +1)

Factor Information

Factor  Type    Levels  Values
lot     Random       3  1, 2, 3

Analysis of Variance

Source  DF   Seq SS  Contribution   Adj SS   Adj MS  F-Value  P-Value
lot      2  12.8958        98.41%  12.8958  6.44792    92.80    0.002
Error    3   0.2084         1.59%   0.2084  0.06948
Total    5  13.1043       100.00%

Model Summary

       S    R-sq  R-sq(adj)   PRESS  R-sq(pred)
0.263597  98.41%     97.35%  0.8338      93.64%

Coefficients

Term        Coef  SE Coef            95% CI  T-Value  P-Value
Constant  36.898    0.108  (36.556, 37.241)   342.88    0.000
lot
  1        2.042    0.152  ( 1.557,  2.526)    13.42    0.001
  2       -1.333    0.152  (-1.818, -0.849)    -8.76    0.003

Regression Equation

moisture = 36.898 + 2.042 lot_1 - 1.333 lot_2 - 0.708 lot_3

Equation treats random terms as though they are fixed.

Expected Mean Squares, using Adjusted SS

           Expected Mean Square
   Source  for Each Term
1  lot     (2) + 2.0000 (1)
2  Error   (2)
...

Variance Components, using Adjusted SS

Source   Variance  % of Total    StDev  % of Total
lot       3.18922      97.87%  1.78584      98.93%
Error   0.0694833       2.13%  0.26360      14.60%
Total      3.2587              1.80519

Residual Plots for moisture

MTB > Name c3 "ByVar1" c4 "Mean1"
MTB > Statistics 'moisture';
SUBC>   By 'lot';
SUBC>   GValues 'ByVar1';
SUBC>   Mean 'Mean1'.
MTB > PPlot 'Mean1';
SUBC>   Normal;
SUBC>   Symbol;
SUBC>   FitD;
SUBC>   Grid 2;
SUBC>   Grid 1;
SUBC>   MGrid 1.

Probability Plot of Mean1

The P-value of the Anderson-Darling test of normality is 0.227.

MTB > Describe 'moisture';
SUBC>   By 'lot';
SUBC>   Mean;
SUBC>   StDeviation;
SUBC>   Minimum;
SUBC>   Maximum;
SUBC>   N.

Descriptive Statistics: moisture

Variable  lot  N    Mean   StDev  Minimum  Maximum
moisture  1    2  38.940  0.0566   38.900   38.980
          2    2  35.565  0.0212   35.550   35.580
          3    2  36.190   0.453   35.870   36.510


Comments and answers to questions:
----------------------------------

The model assumes the same variability for all lots, but two of the
lots show very small variation between the two replicates, whereas the
third shows much larger variation. With such a small dataset it is
practically impossible to check this assumption, so it is essentially
the analyst's responsibility to judge whether it is reasonable. The
normality assumption for the random effects can hardly be checked
realistically either (even though the procedure was included above).

Variance component parameter estimates are given above:

  sigma^2  : 0.069
  sigma_A^2: 3.19

Obviously, there is much larger variation between lots than within
lots (and accordingly the F-statistic for testing no differences
between lots is clearly significant, despite the small dataset).

The remaining parameter of the model is the overall mean:

  mu: 36.90

Its standard error must be computed using MS(Lot) instead of MSE,
that is,

  SE(mean) = sqrt(MS(Lot)/6) = sqrt(6.44792/6) = 1.04,

and the corresponding 95% confidence interval is

  mu: 36.90 +- t(2,.975)*1.04 = 36.90 +- 4.3027*1.04 = 36.90 +- 4.46.

Analysis of the 1-way ANOVA with fixed effects would give the SAME
ANOVA table, but different mean parameters to estimate: one mean for
each lot.
Parametrizing the model as:

  y_ij = mu + alpha_i + eps_ij,

with the (Minitab) restriction that alpha_1+alpha_2+alpha_3=0, gives
the following estimate and standard error for mu (in the Minitab
listing above):

  mu: 36.90,  SE(mean) = sqrt(MSE/6) = 0.108.

The standard error is much smaller, because it takes into account only
the variation within lots, not the variation between lots. The
parameter mu has an interpretation as the overall mean FOR THESE 3
LOTS, whereas in the random effects model it is the overall mean IN
THE POPULATION OF LOTS, from which the 3 lots are sampled. Due to the
large variation between lots, the latter is determined with much
larger error than the former.

Note that the Minitab print gave the wrong standard error and
confidence interval for mu (those corresponding to the fixed effects
model); there is no way to get the correct values in the software.
Analyses by the mixed command in Stata, proc mixed in SAS, as well as
the nlme and lme4 packages in R give the correct estimate and SE for
mu in the random effects model. The Stata analysis uses a standard
normal distribution as the reference distribution for inference about
mu, whereas all the other methods use a more exact t-distribution with
2 df or 3 df.
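
The hand calculations above can be re-checked with a short script. The
following is only a sketch in Python (not part of the Minitab
session). It assumes that the six observations can be read off the
Minimum and Maximum columns of the descriptive statistics, which holds
because each lot has N = 2. The variable names are chosen for
illustration, and the t quantile 4.3027 is taken from the text rather
than computed.

```python
import math

# Replicates per lot, taken from the Minimum/Maximum columns of the
# descriptive statistics (assumption: with N = 2 these are the data).
data = {
    1: [38.90, 38.98],
    2: [35.55, 35.58],
    3: [35.87, 36.51],
}

a = len(data)   # number of lots
n = 2           # replicates per lot (balanced design)
N = a * n       # total number of observations

grand_mean = sum(sum(obs) for obs in data.values()) / N
lot_means = {lot: sum(obs) / n for lot, obs in data.items()}

# One-way ANOVA sums of squares and mean squares
ss_lot = n * sum((m - grand_mean) ** 2 for m in lot_means.values())
ss_err = sum((y - lot_means[lot]) ** 2
             for lot, obs in data.items() for y in obs)
ms_lot = ss_lot / (a - 1)
ms_err = ss_err / (N - a)

# Variance components from the expected mean squares:
# E[MS(Lot)] = sigma^2 + n * sigma_A^2,  E[MSE] = sigma^2
sigma2 = ms_err
sigma2_lot = (ms_lot - ms_err) / n

# Standard error of the overall mean under the two models
se_random = math.sqrt(ms_lot / N)   # random effects: uses MS(Lot)
se_fixed = math.sqrt(ms_err / N)    # fixed effects: uses MSE

T_2_975 = 4.3027                    # t(2,.975), value quoted in the text
half_width = T_2_975 * se_random

print(f"mu        = {grand_mean:.2f}")
print(f"sigma^2   = {sigma2:.3f}")
print(f"sigma_A^2 = {sigma2_lot:.2f}")
print(f"SE(mean), random effects = {se_random:.2f}")
print(f"SE(mean), fixed effects  = {se_fixed:.3f}")
print(f"95% CI half-width (random effects) = {half_width:.2f}")
```

Comparing se_random with se_fixed shows directly how much the
uncertainty about mu grows once the between-lot variation is taken
into account.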