Exercise 28.14 of PSLS 3e:
--------------------------

Data on the taste and chemical composition of cheese: 30 cheddar cheese samples were analyzed. The outcome of interest is Taste, a score obtained by combining the scores of several tasters. The predictors of interest are Acetic, H2S and Lactic. All three were measured as responses in the study, but since interest is in predicting taste as a function of them, they are treated as fixed.

(a) Minitab commands and output:
---
WOpen "R:\Chapter 13\ex13_015.mtw".
Correlation 'Taste'-'Lactic'.

         Taste  Acetic    H2S
Acetic   0.550
         0.002

H2S      0.756   0.618
         0.000   0.000

Lactic   0.704   0.604  0.645
         0.000   0.000  0.000

Cell Contents: Pearson correlation
               P-Value

Plot 'Taste'*'Acetic' 'Taste'*'H2S' 'Taste'*'Lactic' 'Acetic'*'H2S' 'Acetic'*'Lactic' 'H2S'*'Lactic';
  Symbol;
  ScFrame;
  ScAnnotation.

Comments and answers to questions:
----------------------------------
All scatterplots show quite noisy patterns, with a positive association between each pair of variables. Because of the noise it is difficult to assess whether the patterns are linear, but none is clearly non-linear. The correlations are shown above and range from 0.550 to 0.756. All correlations are significantly different from zero (P=0.002 or less).

(b) The statistical model is

  Taste_i = beta0 + beta1*Acetic_i + beta2*H2S_i + beta3*Lactic_i + eps_i,

where the errors eps_i are i.i.d. and ~ N(0,sigma).

Minitab commands and output:
---
Regress 'Taste' 3 'Acetic' 'H2S' 'Lactic';
  GNormalplot;
  GFits;
  RType 2;
  Constant;
  Brief 2.
The regression equation is
Taste = - 28.9 + 0.33 Acetic + 3.91 H2S + 19.7 Lactic

Predictor    Coef  SE Coef      T      P
Constant   -28.88    19.74  -1.46  0.155
Acetic      0.328    4.460   0.07  0.942
H2S         3.912    1.248   3.13  0.004
Lactic     19.671    8.629   2.28  0.031

S = 10.13   R-Sq = 65.2%   R-Sq(adj) = 61.2%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       3  4994.5  1664.8  16.22  0.000
Residual Error  26  2668.4   102.6
Total           29  7662.9

Source  DF  Seq SS
Acetic   1  2314.1
H2S      1  2147.0
Lactic   1   533.3

Unusual Observations

Obs  Acetic  Taste    Fit  SE Fit  Residual  St Resid
 15    6.15  54.90  29.45    3.04     25.45     2.63R

R denotes an observation with a large standardized residual

Comments and answers to questions:
----------------------------------
The estimated equation is EstimTaste = - 28.9 + 0.33*Acetic + 3.91*H2S + 19.7*Lactic, with an estimated standard deviation of s=10.13 about the fitted values. All regression coefficients are positive, meaning that each predictor (when the others are accounted for) is positively associated with taste. However, the regression coefficient for Acetic is far from significant: t=0.07, P=0.94. Acetic therefore adds essentially nothing to a model in which H2S and Lactic are already present. The coefficient for Acetic is also by far the smallest numerically, but coefficients cannot be compared directly without taking the scale of each predictor into account. One possibility is to look at the impact of a change of one standard deviation in the predictor, but we do not go into details here. The R2 of the model is 65.2%, indicating reasonably good predictive ability.

(c) The non-significance of Acetic was already mentioned above. We proceed with a model without this predictor. The resulting statistical model is

  Taste_i = beta0 + beta1*H2S_i + beta2*Lactic_i + eps_i,

where the errors eps_i are i.i.d. and ~ N(0,sigma).

Minitab commands and output:
---
Regress 'Taste' 2 'H2S' 'Lactic';
  GNormalplot;
  GFits;
  RType 2;
  Constant;
  Brief 2.
The regression equation is
Taste = - 27.6 + 3.95 H2S + 19.9 Lactic

Predictor     Coef  SE Coef      T      P
Constant   -27.592    8.982  -3.07  0.005
H2S          3.946    1.136   3.47  0.002
Lactic      19.887    7.959   2.50  0.019

S = 9.942   R-Sq = 65.2%   R-Sq(adj) = 62.6%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       2  4993.9  2497.0  25.26  0.000
Residual Error  27  2669.0    98.9
Total           29  7662.9

Source  DF  Seq SS
H2S      1  4376.7
Lactic   1   617.2

Unusual Observations

Obs  H2S  Taste    Fit  SE Fit  Residual  St Resid
 15  6.8  54.90  29.28    1.95     25.62     2.63R

R denotes an observation with a large standardized residual

Comments and answers to questions:
----------------------------------
The estimated equation is EstimTaste = -27.6 + 3.95*H2S + 19.9*Lactic, with an estimated standard deviation of s=9.94 about the fitted values. Both regression coefficients are significant (P=0.002 and P=0.019, respectively). Their values have changed little from the previous model, which is not surprising considering that Acetic had so little impact in that model. The standard errors are somewhat smaller, reflecting the slight reduction in s and the removal of Acetic, which is correlated with both remaining predictors. The R2 has dropped so little that the change does not show in the first decimal: R2=65.2%.

(d) Both predictors were significant in the model of (c), but since the exercise asks us to drop the least significant predictor, we remove Lactic and arrive at the statistical model

  Taste_i = beta0 + beta1*H2S_i + eps_i,

where the errors eps_i are i.i.d. and ~ N(0,sigma).

Minitab commands and output:
---
Regress 'Taste' 1 'H2S';
  GNormalplot;
  GFits;
  RType 2;
  Constant;
  Brief 2.
The regression equation is
Taste = - 9.79 + 5.78 H2S

Predictor    Coef  SE Coef      T      P
Constant   -9.787    5.958  -1.64  0.112
H2S        5.7761   0.9458   6.11  0.000

S = 10.83   R-Sq = 57.1%   R-Sq(adj) = 55.6%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       1  4376.7  4376.7  37.29  0.000
Residual Error  28  3286.1   117.4
Total           29  7662.9

Unusual Observations

Obs  H2S  Taste    Fit  SE Fit  Residual  St Resid
 12  7.9  57.20  35.89    2.71     21.31     2.03R
 15  6.8  54.90  29.21    2.12     25.69     2.42R

R denotes an observation with a large standardized residual

Comments and answers to questions:
----------------------------------
The estimated equation, EstimTaste = -9.79 + 5.78*H2S, is clearly significant, but s has gone up to 10.83 and R2 has dropped to 57.1%. The coefficient for H2S has increased to 5.78, and its standard error has actually gone down. This reflects the collinearity between H2S and Lactic in the previous model. That model nevertheless explained a fair bit more of the variation and seems preferable for describing the variation in taste.

(e) Model checks should preferably be done in the full model, i.e. the one with all three predictors. The residuals of that model look reasonably good, perhaps with some indication of increasing variance for larger fitted values. The normal plot is a little curved, but not enough to cause concern. There is one standardized residual outside (-2,2), for obs. no. 15 (see the table of Unusual Observations in part (b)), and the value of 2.63 should prompt an investigation of whether there was something unusual about this observation. It is, however, not extreme enough to count as evidence of an outlier or to require dropping the observation. Also, nothing indicates that this observation is particularly influential.

The preferred model is the one with two predictors: H2S and Lactic.
We conclude that there is evidence that both of these predictors contribute significantly to explaining the variation in taste, whereas Acetic contributes essentially nothing once the other two predictors are included.
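
The summary statistics quoted in parts (b) to (d) can be cross-checked from the ANOVA tables alone. The following is a minimal Python sketch, not part of the Minitab session. It uses only the sums of squares and degrees of freedom printed above; the names models, summarize and partial_f are illustrative choices made here. It recomputes s, R-Sq, R-Sq(adj) and the overall F for each model, and then the partial F statistics for dropping Acetic from model (b) and Lactic from model (c). When a single predictor is dropped, the partial F should equal the square of that predictor's t statistic, up to rounding in the printed values.

```python
import math

# Sums of squares and degrees of freedom copied from the Minitab ANOVA tables
# in parts (b), (c) and (d). The dictionary layout and names are illustrative.
SS_TOTAL, DF_TOTAL = 7662.9, 29

models = {
    "b: Acetic + H2S + Lactic": {"sse": 2668.4, "df_error": 26},
    "c: H2S + Lactic": {"sse": 2669.0, "df_error": 27},
    "d: H2S": {"sse": 3286.1, "df_error": 28},
}


def summarize(sse, df_error):
    """Return (s, R-Sq in %, R-Sq(adj) in %, overall F) for one fitted model."""
    mse = sse / df_error
    df_model = DF_TOTAL - df_error
    s = math.sqrt(mse)
    r_sq = 100 * (1 - sse / SS_TOTAL)
    r_sq_adj = 100 * (1 - mse / (SS_TOTAL / DF_TOTAL))
    f_overall = ((SS_TOTAL - sse) / df_model) / mse
    return s, r_sq, r_sq_adj, f_overall


def partial_f(sse_reduced, sse_full, df_error_full, n_dropped=1):
    """F statistic for dropping n_dropped predictors from the full model."""
    return ((sse_reduced - sse_full) / n_dropped) / (sse_full / df_error_full)


summaries = {name: summarize(**m) for name, m in models.items()}
for name, (s, r_sq, r_sq_adj, f_overall) in summaries.items():
    print(f"{name}: s={s:.2f}  R-Sq={r_sq:.1f}%  "
          f"R-Sq(adj)={r_sq_adj:.1f}%  F={f_overall:.2f}")

# Dropping a single predictor: compare each partial F with the squared t
# statistic of the dropped predictor in the larger model.
f_drop_acetic = partial_f(2669.0, 2668.4, 26)  # compare with t = 0.07 in (b)
f_drop_lactic = partial_f(3286.1, 2669.0, 27)  # compare with t = 2.50 in (c)
print(f"drop Acetic: F={f_drop_acetic:.3f}   drop Lactic: F={f_drop_lactic:.2f}")
```

The tiny partial F for Acetic and the clearly larger one for Lactic restate the conclusion above in terms of sums of squares: removing Acetic leaves the residual sum of squares almost unchanged, while removing Lactic increases it noticeably.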