Solution file for Additional exercise 7.2
-----------------------------------------

Data: measurements of blood glucose levels of 3 rabbits after injection with 3 different doses of insulin, made on 3 different days.

Notation: y_i = blood glucose concentration (mg per 100 ml blood) for the i'th sample, i = 1,...,9, or y_ijk = blood glucose concentration (mg per 100 ml blood) for rabbit j after receiving treatment i on day k, with i = A, B, C (0, 1, 2 doses), j = 1, 2, 3 and k = 1, 2, 3.

The design is a 3x3 Latin square. Because the measurements under different treatments are obtained from the same rabbits, the order of the treatments is an issue; it is handled by taking day (i.e. the position in the sequence) as a blocking factor in the design. This safeguards against spurious treatment effects caused by day effects, but it does not handle carry-over effects very well (because only certain sequences of treatments occur after each other). It is not possible to do better in such a small design.

The statistical model is

   y_i = mu + alpha_trt(i) + beta_rabbit(i) + gamma_day(i) + eps_i,

or

   y_ijk = mu + alpha_i + beta_j + gamma_k + eps_ijk,

depending on the chosen notation.

MTB > WOpen "H:\VHM\VHM802\Data_csv\hs07_2.csv";
SUBC>   FType;
SUBC>     CSV;
SUBC>   DecSep;
SUBC>     Period;
SUBC>   Field;
SUBC>     Comma;
SUBC>   TDelimiter;
SUBC>     DoubleQuote.

Retrieving worksheet from file: 'H:\VHM\VHM802\Data_csv\hs07_2.csv'
Worksheet was saved on 17/02/2011

MTB > GLM;
SUBC>   Response 'glucose';
SUBC>   Nodefault;
SUBC>   Categorical 'day' 'rabbit' 'insulin';
SUBC>   Terms day rabbit insulin;
SUBC>   Means day rabbit insulin;
SUBC>   TExpand;
SUBC>   TMethod;
SUBC>   TAnova;
SUBC>   TSummary;
SUBC>   TCoefficients;
SUBC>   TEquation;
SUBC>   TFactor;
SUBC>   TMeans;
SUBC>   TDiagnostics 0;
SUBC>   Rtype 2;
SUBC>   GFOURPACK.
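The Latin square structure and the carry-over limitation described above can be illustrated with a short Python sketch. The actual assignment of treatments to rabbits and days is stored in hs07_2.csv and is not shown here, so the sketch assumes a hypothetical cyclic square (rabbit 1 receives A, B, C on days 1-3, rabbit 2 receives B, C, A, and so on); the helper names are illustrative. It checks that each treatment occurs exactly once per rabbit and once per day, and counts which ordered pairs of consecutive treatments occur within a rabbit:

```python
from collections import Counter

# Hypothetical layout: the real treatment assignment is in hs07_2.csv and not shown.
# square[rabbit][day] is the treatment given to that rabbit on that day.
TREATMENTS = "ABC"  # A, B, C = 0, 1, 2 doses of insulin


def cyclic_latin_square(n=3):
    """Illustrative cyclic square: rabbit r gets treatment (r + d) mod n on day d."""
    return [[TREATMENTS[(r + d) % n] for d in range(n)] for r in range(n)]


def is_latin_square(square):
    """True if every treatment occurs exactly once for each rabbit and each day."""
    n = len(square)
    labels = sorted(TREATMENTS[:n])
    rows_ok = all(sorted(row) == labels for row in square)
    cols_ok = all(sorted(square[r][d] for r in range(n)) == labels for d in range(n))
    return rows_ok and cols_ok


def carry_over_pairs(square):
    """Count ordered (previous day, next day) treatment pairs within each rabbit."""
    pairs = Counter()
    for row in square:
        for prev, nxt in zip(row, row[1:]):
            pairs[prev + nxt] += 1
    return pairs


square = cyclic_latin_square()
for rabbit, row in enumerate(square, start=1):
    print(f"rabbit {rabbit}, days 1-3:", " ".join(row))
print("Latin square:", is_latin_square(square))
print("consecutive pairs:", dict(carry_over_pairs(square)))
```

For this square only the pairs AB, BC and CA occur (twice each), while BA, CB and AC never occur. In any single 3x3 Latin square only three of the six possible ordered pairs can occur, which is why carry-over effects cannot be handled well in a design of this size.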
General Linear Model: glucose versus day, rabbit, insulin

Method

Factor coding  (-1, 0, +1)

Factor Information

Factor   Type   Levels  Values
day      Fixed       3  1, 2, 3
rabbit   Fixed       3  1, 2, 3
insulin  Fixed       3  A, B, C

Analysis of Variance

Source   DF  Seq SS  Contribution  Adj SS   Adj MS  F-Value  P-Value
day       2   92.67        14.90%   92.67   46.333     5.56    0.152
rabbit    2   96.00        15.43%   96.00   48.000     5.76    0.148
insulin   2  416.67        66.99%  416.67  208.333    25.00    0.038
Error     2   16.67         2.68%   16.67    8.333
Total     8  622.00       100.00%

Model Summary

      S    R-sq  R-sq(adj)  PRESS  R-sq(pred)
2.88675  97.32%     89.28%  337.5      45.74%

Coefficients

Term        Coef  SE Coef       95% CI        T-Value  P-Value   VIF
Constant  46.000    0.962  (41.860, 50.140)     47.80    0.000
day
  1        -4.33     1.36  (-10.19,  1.52)      -3.18    0.086  1.33
  2         1.00     1.36  ( -4.86,  6.86)       0.73    0.539  1.33
rabbit
  1         0.00     1.36  ( -5.86,  5.86)       0.00    1.000  1.33
  2         4.00     1.36  ( -1.86,  9.86)       2.94    0.099  1.33
insulin
  A         8.33     1.36  (  2.48, 14.19)       6.12    0.026  1.33
  B        -0.00     1.36  ( -5.86,  5.86)      -0.00    1.000  1.33

Regression Equation

glucose = 46.000 - 4.33 day_1 + 1.00 day_2 + 3.33 day_3 + 0.00 rabbit_1 + 4.00 rabbit_2
          - 4.00 rabbit_3 + 8.33 insulin_A - 0.00 insulin_B - 8.33 insulin_C

Means

            Fitted
Term          Mean  SE Mean
day
  1          41.67     1.67
  2          47.00     1.67
  3          49.33     1.67
rabbit
  1          46.00     1.67
  2          50.00     1.67
  3          42.00     1.67
insulin
  A          54.33     1.67
  B          46.00     1.67
  C          37.67     1.67

Residual Plots for glucose

Comments:
---------
The analysis shows a significant effect of treatments and non-significant effects of rabbits and days. Note that the power of the F-tests is rather low with only 2 degrees of freedom for error. Therefore, finding a significant effect is quite satisfactory, even though the P-value is only 0.038. The treatment means (54.33, 46.00 and 37.67 for 0, 1 and 2 doses) indicate that insulin decreases the blood glucose level.

One question is whether to refit the model without the non-significant factors. Usually this is not of interest, because in this balanced design the partial (Adj) and sequential (Seq) sums of squares coincide, so the sums of squares of the remaining factors would be unchanged.
However, with only 2 degrees of freedom for error, one potentially important advantage could be to increase the degrees of freedom for error (pooling). In this situation I would not do that, because the F-statistics for day and rabbit are numerically large (5.56 and 5.76), so pooling their sums of squares into the error would increase our estimate of the error variance considerably.

We carry out a test of linearity ("lack of fit") for the number of doses in the full Latin square model. Insulin is first recoded as 1, 2, 3 so that it can be entered as a continuous covariate; coding 1, 2, 3 instead of 0, 1, 2 changes only the intercept, not the slope.

MTB > Code ("A") 1 ("B") 2 ("C") 3 'insulin';
SUBC>   TSummary;
SUBC>   After.

Code Summary

Original  Recoded  Number
Value     Value    of Rows
A         1        3
B         2        3
C         3        3

Source data column   insulin
Recoded data column  Coded insulin

MTB > GLM;
SUBC>   Response 'glucose';
SUBC>   Nodefault;
SUBC>   Continuous 'Coded insulin';
SUBC>   Categorical 'day' 'rabbit';
SUBC>   Unstandardized;
SUBC>   Terms C5 day rabbit;
SUBC>   Means day rabbit;
SUBC>   TExpand;
SUBC>   TMethod;
SUBC>   TAnova;
SUBC>   TSummary;
SUBC>   TCoefficients;
SUBC>   TEquation;
SUBC>   TFactor;
SUBC>   TMeans;
SUBC>   TDiagnostics 0;
SUBC>   Rtype 2;
SUBC>   GFOURPACK.
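The F- and P-values in the Latin square ANOVA table, and the effect of the pooling discussed above, can be recomputed from the reported sums of squares with a small Python sketch. It uses the closed form P(F > f) = (1 + 2f/v)^(-v/2), which holds only for 2 numerator degrees of freedom (true for day, rabbit and insulin here). The helper names are illustrative, and because the SS values are copied from the rounded output, the results agree with Minitab only to about two decimals:

```python
# Sums of squares and degrees of freedom copied from the Latin square ANOVA table
# (Seq SS = Adj SS because the design is balanced). Helper names are illustrative.
SS = {"day": 92.67, "rabbit": 96.00, "insulin": 416.67, "error": 16.67}
DF = {"day": 2, "rabbit": 2, "insulin": 2, "error": 2}


def f_sf_num2(f, df_den):
    """P(F > f) for an F(2, df_den) variable; closed form valid only for numerator df = 2."""
    return (1.0 + 2.0 * f / df_den) ** (-df_den / 2.0)


def anova_table(ss, df):
    """Return the error mean square and {term: (MS, F, P)} for every non-error term."""
    mse = ss["error"] / df["error"]
    table = {}
    for term in ss:
        if term == "error":
            continue
        assert df[term] == 2, "f_sf_num2 only covers 2 numerator degrees of freedom"
        ms = ss[term] / df[term]
        f = ms / mse
        table[term] = (ms, f, f_sf_num2(f, df["error"]))
    return mse, table


def r_squared(ss, df):
    """R-sq and R-sq(adj) from an ANOVA table whose rows sum to the total."""
    ss_total, df_total = sum(ss.values()), sum(df.values())
    r2 = 1.0 - ss["error"] / ss_total
    r2_adj = 1.0 - (ss["error"] / df["error"]) / (ss_total / df_total)
    return r2, r2_adj


def pool(ss, df, terms):
    """Hypothetical refit without `terms`: their SS and DF are added to the error line."""
    new_ss = {k: v for k, v in ss.items() if k not in terms}
    new_df = {k: v for k, v in df.items() if k not in terms}
    for term in terms:
        new_ss["error"] += ss[term]
        new_df["error"] += df[term]
    return new_ss, new_df


mse, table = anova_table(SS, DF)
for term, (ms, f, p) in table.items():
    print(f"{term:8s} MS = {ms:8.3f}  F = {f:6.2f}  P = {p:.3f}")
r2, r2_adj = r_squared(SS, DF)
print(f"R-sq = {r2:.2%}, R-sq(adj) = {r2_adj:.2%}")

pooled_ss, pooled_df = pool(SS, DF, ("day", "rabbit"))
pooled_mse, pooled_table = anova_table(pooled_ss, pooled_df)
print(f"error MS: {mse:.3f} on {DF['error']} df -> {pooled_mse:.3f} on {pooled_df['error']} df after pooling")
print(f"insulin F after pooling: {pooled_table['insulin'][1]:.2f}")
```

Pooling day and rabbit into the error raises the error mean square from about 8.33 (2 df) to about 34.2 (6 df), and the F-value for insulin falls from 25.0 to about 6.1.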
General Linear Model: glucose versus Coded insulin, day, rabbit

Factor Information

Factor  Type   Levels  Values
day     Fixed       3  1, 2, 3
rabbit  Fixed       3  1, 2, 3

Analysis of Variance

Source          DF  Seq SS  Contribution  Adj SS   Adj MS  F-Value  P-Value
Coded insulin    1  416.67        66.99%  416.67  416.667    75.00    0.003
day              2   92.67        14.90%   92.67   46.333     8.34    0.060
rabbit           2   96.00        15.43%   96.00   48.000     8.64    0.057
Error            3   16.67         2.68%   16.67    5.556
Total            8  622.00       100.00%

Model Summary

      S    R-sq  R-sq(adj)    PRESS  R-sq(pred)
2.35702  97.32%     92.85%  172.125      72.33%

Coefficients

Term              Coef  SE Coef        95% CI         T-Value  P-Value   VIF
Constant         62.67     2.08  ( 56.05,  69.28)       30.15    0.000
Coded insulin   -8.333    0.962  (-11.396, -5.271)      -8.66    0.003  1.00
day
  1              -4.33     1.11  ( -7.87,  -0.80)       -3.90    0.030  1.33
  2               1.00     1.11  ( -2.54,   4.54)        0.90    0.434  1.33
rabbit
  1              -0.00     1.11  ( -3.54,   3.54)       -0.00    1.000  1.33
  2               4.00     1.11  (  0.46,   7.54)        3.60    0.037  1.33

Regression Equation

day  rabbit
1    1       glucose = 58.33 - 8.333 Coded insulin
1    2       glucose = 62.33 - 8.333 Coded insulin
1    3       glucose = 54.33 - 8.333 Coded insulin
2    1       glucose = 63.67 - 8.333 Coded insulin
2    2       glucose = 67.67 - 8.333 Coded insulin
2    3       glucose = 59.67 - 8.333 Coded insulin
3    1       glucose = 66.00 - 8.333 Coded insulin
3    2       glucose = 70.00 - 8.333 Coded insulin
3    3       glucose = 62.00 - 8.333 Coded insulin

Means

            Fitted
Term          Mean  SE Mean
day
  1          41.67     1.36
  2          47.00     1.36
  3          49.33     1.36
rabbit
  1          46.00     1.36
  2          50.00     1.36
  3          42.00     1.36

Covariate      Data Mean  StDev
Coded insulin      2.000  0.866

Residual Plots for glucose

Comments:
---------
The numerator of the lack-of-fit F-test would now be obtained as usual by subtracting the SSEs and the error degrees of freedom of the two models. In this case, however, the SSEs are equal (16.67 in both models), so the lack-of-fit sum of squares is 0 on 3 - 2 = 1 degree of freedom. This is because the 3 treatment means (54.33, 46.00, 37.67) lie exactly on a straight line. Obviously, the linear model is well supported by the data in this case.

A 95% confidence interval for the effect of one additional insulin dose on the glucose level is given in the Coefficients table above, and can be computed manually as -8.33 +- 3.1824*0.9623 = -8.33 +- 3.06 = (-11.4, -5.3), where 3.1824 is the 97.5% quantile of the t-distribution with 3 degrees of freedom (the error DF of the linear model) and 0.9623 is the standard error of the slope.
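The slope, its standard error, the t-quantile, the confidence interval and the lack-of-fit sum of squares can all be recomputed from quantities in the output. The Python sketch below assumes the balanced layout (each dose used 3 times), so that the least-squares slope adjusted for day and rabbit equals the slope through the three fitted insulin means. The t-distribution with 3 degrees of freedom has a closed-form CDF, which is inverted here by bisection instead of calling a statistics library; the helper names are illustrative:

```python
import math

# Quantities taken from the two Minitab fits (insulin coded A=1, B=2, C=3).
# Helper names below are illustrative, not part of any library.
DOSES = (1, 2, 3)
MEANS = (54.33, 46.00, 37.67)      # fitted insulin means, Latin square model
REPLICATES = 3                     # each dose occurs 3 times (once per rabbit and day)
SSE_FULL, DFE_FULL = 16.67, 2      # Latin square model with insulin as a factor
SSE_LINEAR, DFE_LINEAR = 16.67, 3  # model with a linear dose effect


def t3_cdf(t):
    """CDF of Student's t-distribution with 3 degrees of freedom (closed form)."""
    u = t / math.sqrt(3.0)
    return 0.5 + (u / (1.0 + u * u) + math.atan(u)) / math.pi


def t3_quantile(prob, lo=0.0, hi=100.0, iters=200):
    """Invert t3_cdf by bisection; assumes 0.5 < prob < 1."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if t3_cdf(mid) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def slope_from_means(doses, means, reps):
    """Least-squares slope for a balanced design, computed from the group means."""
    xbar = sum(doses) / len(doses)
    sxx = reps * sum((x - xbar) ** 2 for x in doses)
    sxy = reps * sum((x - xbar) * m for x, m in zip(doses, means))
    return sxy / sxx, sxx


slope, sxx = slope_from_means(DOSES, MEANS, REPLICATES)
se = math.sqrt((SSE_LINEAR / DFE_LINEAR) / sxx)  # SE of the slope: sqrt(MSE / Sxx)
t_crit = t3_quantile(0.975)                       # 97.5% quantile, 3 df
ci = (slope - t_crit * se, slope + t_crit * se)

# Lack of fit: difference in SSE between the two models, on DFE_LINEAR - DFE_FULL df
ss_lof_models = SSE_LINEAR - SSE_FULL
# Same quantity from the quadratic contrast (1, -2, 1) of the means
quad = MEANS[0] - 2 * MEANS[1] + MEANS[2]
ss_lof_contrast = REPLICATES * quad ** 2 / 6      # 6 = 1^2 + (-2)^2 + 1^2

print(f"slope = {slope:.3f}, SE = {se:.4f}, t = {t_crit:.4f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"lack-of-fit SS: {ss_lof_models:.2f} (models), {ss_lof_contrast:.2e} (contrast) "
      f"on {DFE_LINEAR - DFE_FULL} df")
```

The quadratic contrast 54.33 - 2*46.00 + 37.67 is zero, which expresses the same fact as the two models having equal SSEs.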