Solution file for additional exercise 10.2
------------------------------------------

(see additional exercise 10.1 for discussion of model, design and notation)

MTB > WOpen "h:\vhm\vhm802\data_csv\hs10_2.csv";
SUBC>   FType;
SUBC>     CSV;
SUBC>   DecSep;
SUBC>     Period;
SUBC>   Field;
SUBC>     Comma;
SUBC>   TDelimiter;
SUBC>     DoubleQuote.
Retrieving worksheet from file: 'h:\vhm\vhm802\data_csv\hs10_2.csv'
Worksheet was saved on 03/03/2011

MTB > GLM 'moisture' = lot;
SUBC>   Random 'lot';
SUBC>   Brief 3 ;
SUBC>   EMS;
SUBC>   GFourpack;
SUBC>   RType 2 .

General Linear Model: moisture versus lot

Factor  Type    Levels  Values
lot     random       3  1, 2, 3

Analysis of Variance for moisture, using Adjusted SS for Tests

Source  DF   Seq SS   Adj SS  Adj MS      F      P
lot      2  12.8958  12.8958  6.4479  92.80  0.002
Error    3   0.2084   0.2084  0.0695
Total    5  13.1043

S = 0.263597   R-Sq = 98.41%   R-Sq(adj) = 97.35%

Term         Coef  SE Coef       T      P
Constant  36.8983   0.1076  342.88  0.000
lot
 1         2.0417   0.1522   13.42  0.001
 2        -1.3333   0.1522   -8.76  0.003

Expected Mean Squares, using Adjusted SS

           Expected Mean Square
   Source  for Each Term
1  lot     (2) + 2.0000 (1)
2  Error   (2)

...

Variance Components, using Adjusted SS

        Estimated
Source      Value
lot       3.18922
Error     0.06948

Residual Plots for moisture

MTB > Name c3 "ByVar1" c4 "Mean1"
MTB > Statistics 'moisture';
SUBC>   By 'lot';
SUBC>   GValues 'ByVar1';
SUBC>   Mean 'Mean1'.
MTB > NormTest 'Mean1'.

Probability Plot of Mean1

The P-value of the Anderson-Darling test for normality is 0.227

MTB > Oneway 'moisture' 'lot'.

One-way ANOVA: moisture versus lot

Source  DF       SS      MS      F      P
lot      2  12.8958  6.4479  92.80  0.002
Error    3   0.2085  0.0695
Total    5  13.1043

S = 0.2636   R-Sq = 98.41%   R-Sq(adj) = 97.35%

                          Individual 95% CIs For Mean Based on
                          Pooled StDev
Level  N    Mean   StDev  ---------+---------+---------+---------+
1      2  38.940   0.057                               (---*----)
2      2  35.565   0.021  (----*----)
3      2  36.190   0.453        (----*----)
                          ---------+---------+---------+---------+
                                 36.0      37.2      38.4      39.6

Pooled StDev = 0.264


Comments and answers to questions:
----------------------------------

The model assumes the same variability within all lots, but two of the lots
show very small variation between their two replicates whereas the third
shows much larger variation. With such a small dataset it is essentially
impossible to check this assumption, so it is up to the analyst to judge
whether it is reasonable. The normality assumption for the random effects
can hardly be checked realistically either (even though the procedure was
included above).

Variance component parameter estimates are given above:

  sigma^2  : 0.069
  sigma_A^2: 3.19

Clearly, there is much larger variation between lots than within lots (and
accordingly the F-statistic for testing no differences between lots is
clearly significant, despite the small dataset).

The remaining parameter of the model is the overall mean:

  mu: 36.90

In the random effects model its standard error must be computed using
MS(Lot) instead of MSE, that is,

  SE(mean) = sqrt(MS(Lot)/6) = 1.04,

and the corresponding 95% confidence interval is

  mu: 36.90 +- t(2,.975)*1.04 = 36.90 +- 4.3027*1.04 = 36.90 +- 4.46

Note that the Minitab output gives the wrong standard error for mu (0.1076,
which corresponds to a fixed effects model).

Analysis of the 1-way ANOVA with fixed effects would give the SAME ANOVA
table, but different mean parameters to estimate: one mean for each lot.
Parametrizing the model as

  y_ij = mu + alpha_i + eps_ij,

with the (Minitab) restriction that alpha_1 + alpha_2 + alpha_3 = 0, gives
the following estimate and standard error for mu:

  mu: 36.90,  SE(mean) = sqrt(MSE/6) = 0.108.

This standard error is much smaller because it takes into account only the
variation within lots, not the variation between lots. In the fixed effects
model the parameter mu is interpreted as the overall mean FOR THESE 3 LOTS,
whereas in the random effects model it is the overall mean IN THE POPULATION
OF LOTS from which the 3 lots are sampled. Due to the large variation
between lots, the latter is determined with much larger error than the
former.

Note that analyses by the xtmixed command in Stata and proc mixed in SAS
give the correct estimate and SE for mu in the random effects model. The
Stata analysis uses a standard normal distribution as the reference
distribution for inference about mu, whereas SAS uses a t-distribution with
2 df; the latter is preferable.
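
The variance components, the two standard errors for mu and the confidence
interval discussed above can be checked with a short calculation. The
following is a minimal Python sketch and not part of the original Minitab
solution: it takes the mean squares, the grand mean and the t quantile
directly from the output and text above, and all function and variable
names are illustrative assumptions.

```python
import math

# Values copied from the ANOVA table and the comments above.
MS_LOT = 6.4479       # MS(Lot), 2 df
MS_ERROR = 0.06948    # MSE, 3 df
N_LOTS = 3            # number of lots
N_REP = 2             # replicates per lot (balanced design)
GRAND_MEAN = 36.8983  # estimate of mu
T_2_975 = 4.3027      # t(2, .975), as quoted in the text


def variance_components(ms_lot, ms_error, n_rep):
    """Solve the expected mean squares E[MS(Lot)] = sigma^2 + n_rep*sigma_A^2
    and E[MSE] = sigma^2 for (sigma^2, sigma_A^2)."""
    sigma2 = ms_error
    sigma2_a = (ms_lot - ms_error) / n_rep
    return sigma2, sigma2_a


def se_mean_random(ms_lot, n_total):
    """SE of the overall mean in the random effects model (uses MS(Lot))."""
    return math.sqrt(ms_lot / n_total)


def se_mean_fixed(ms_error, n_total):
    """SE of the overall mean in the fixed effects model (uses MSE)."""
    return math.sqrt(ms_error / n_total)


n_total = N_LOTS * N_REP
sigma2, sigma2_a = variance_components(MS_LOT, MS_ERROR, N_REP)
se_random = se_mean_random(MS_LOT, n_total)
se_fixed = se_mean_fixed(MS_ERROR, n_total)

# 95% confidence interval for mu in the random effects model (2 df).
half_width = T_2_975 * se_random
lower, upper = GRAND_MEAN - half_width, GRAND_MEAN + half_width

print(f"sigma^2   = {sigma2:.3f}")
print(f"sigma_A^2 = {sigma2_a:.3f}")
print(f"SE(mean), random effects = {se_random:.2f}")
print(f"SE(mean), fixed effects  = {se_fixed:.3f}")
print(f"95% CI for mu: {GRAND_MEAN:.2f} +- {half_width:.2f}"
      f" = ({lower:.2f}, {upper:.2f})")
```

Up to rounding, this reproduces the values quoted above (sigma_A^2 = 3.19,
SE = 1.04 and 0.108, half-width 4.46). The half-width is computed from the
unrounded standard error, which is why it comes out as 4.46 rather than the
4.47 obtained from 4.3027*1.04.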