Supplementary Exercise 11.17 of IPS7e:
--------------------------------------

Minitab commands and output:
---

WOpen "R:\Chapter 11\ex11_015.mtw".
Regress 'GPA' 3 'IQ' 'C1' 'C5';
  GNormalplot;
  GFits;
  RType 2;
  Constant;
  Brief 2.

The regression equation is
GPA = - 4.94 + 0.0815 IQ + 0.183 C1 + 0.142 C5

Predictor      Coef  SE Coef      T      P
Constant     -4.937    1.491  -3.31  0.001
IQ          0.08145  0.01367   5.96  0.000
C1          0.18308  0.06475   2.83  0.006
C5          0.14205  0.06663   2.13  0.036

S = 1.475   R-Sq = 52.5%   R-Sq(adj) = 50.6%

Analysis of Variance

Source          DF       SS      MS      F      P
Regression       3  178.340  59.447  27.31  0.000
Residual Error  74  161.087   2.177
Total           77  339.427

Source  DF   Seq SS
IQ       1  136.319
C1       1   32.129
C5       1    9.893

Unusual Observations

Obs   IQ    GPA    Fit  SE Fit  Residual  St Resid
  8   97  2.412  5.504   0.282    -3.092    -2.14R
 22  109  1.760  5.140   0.555    -3.380    -2.47R
 48  102  3.936  7.395   0.241    -3.459    -2.38R
 51  103  0.530  6.542   0.252    -6.012    -4.14R
 54   72  7.295  5.378   0.662     1.917     1.45 X
 72   90  3.820  4.546   0.710    -0.726    -0.56 X

R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.

[Graph: Normplot of Residuals for GPA]
[Graph: Residuals vs Fits for GPA]

Comments and answers to questions:
----------------------------------

(a) The fitted regression model is
    GPAhat = -4.94 + 0.0815*IQ + 0.183*C1 + 0.142*C5,
with R^2 = 52.5%. The standard deviation about the fitted model is s = 1.475.
For a student with IQ = 109, C1 = 13 and C5 = 8, the predicted value is
    GPAhat = -4.94 + 0.0815*109 + 0.183*13 + 0.142*8 = 7.46.

(b) Holding C1 and C5 constant, the predicted GPA increases by 0.0815 for each one-unit increase in IQ; this is the regression coefficient for IQ. A 95% confidence interval for this parameter is
    beta_IQ: 0.0815 +- t(.975,74)*0.0137 = 0.0815 +- 0.0273,
using t(.975,74) = 1.99, i.e. the interval from 0.0542 to 0.1088.

(c) The most extreme residual is for observation 51. A standardized residual of -4.14 in a model fitted to 78 observations can be considered seriously outlying.
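The arithmetic in (a) to (c) can be checked with a short Python sketch. It uses only the rounded coefficients and summary values printed in the Minitab output, plus the quantile t(.975,74) = 1.99 quoted in (b); the helper function names are hypothetical. The standardized residual is computed as the residual divided by sqrt(s^2 - (SE Fit)^2). This is the usual internally studentized residual, since SE Fit = s*sqrt(h) for leverage h, and it reproduces the St Resid column of the output.

```python
import math

# Rounded coefficients from the fitted equation in (a)
COEF = {"const": -4.94, "IQ": 0.0815, "C1": 0.183, "C5": 0.142}

def predict(coef, iq, c1, c5):
    """Predicted GPA from the linear model in IQ, C1 and C5."""
    return coef["const"] + coef["IQ"] * iq + coef["C1"] * c1 + coef["C5"] * c5

def conf_int(estimate, se, t_crit):
    """Two-sided confidence interval: estimate +- t_crit * se."""
    margin = t_crit * se
    return estimate - margin, estimate + margin

def standardized_residual(residual, s, se_fit):
    """Residual / (s * sqrt(1 - h)), using SE Fit = s * sqrt(h)."""
    return residual / math.sqrt(s**2 - se_fit**2)

# (a) prediction for IQ = 109, C1 = 13, C5 = 8
print(f"(a) predicted GPA: {predict(COEF, 109, 13, 8):.2f}")

# (b) 95% CI for the IQ coefficient, with t(.975, 74) taken as 1.99
low, high = conf_int(0.0815, 0.0137, 1.99)
print(f"(b) 95% CI for beta_IQ: ({low:.4f}, {high:.4f})")

# (c) observation 51: residual -6.012, SE Fit 0.252, s = 1.475
z = standardized_residual(-6.012, 1.475, 0.252)
print(f"(c) standardized residual for obs 51: {z:.2f}")
```

Using the unrounded coefficients (-4.937, 0.08145, 0.18308, 0.14205) gives about 7.457 for the prediction, which still rounds to 7.46.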
More advanced methods for outlier detection than those covered in VHM 801 show that it is a strongly significant outlier. The observation appears to belong to the only student aged 15. This alone would be a good reason to consider the student outside the target population. In addition, this student's GPA is the lowest in the dataset.

(d) Minitab commands and output:
---

Copy c2 c14
Name c14 'GPA-1'
Let c14(51)='*'
Regress 'GPA-1' 3 'IQ' 'C1' 'C5';
  GNormalplot;
  GFits;
  RType 2;
  Constant;
  Brief 2.

Regression Analysis: GPA-1 versus IQ, C1, C5

The regression equation is
GPA-1 = - 4.68 + 0.0805 IQ + 0.197 C1 + 0.109 C5

77 cases used, 1 cases contain missing values

Predictor      Coef  SE Coef      T      P
Constant     -4.678    1.318  -3.55  0.001
IQ          0.08050  0.01207   6.67  0.000
C1          0.19707  0.05724   3.44  0.001
C5          0.10950  0.05923   1.85  0.069

S = 1.303   R-Sq = 57.4%   R-Sq(adj) = 55.7%

Analysis of Variance

Source          DF       SS      MS      F      P
Regression       3  167.112  55.704  32.83  0.000
Residual Error  73  123.855   1.697
Total           76  290.967

Source  DF   Seq SS
IQ       1  128.405
C1       1   32.910
C5       1    5.798

Unusual Observations

Obs   IQ  GPA-1    Fit  SE Fit  Residual  St Resid
  8   97  2.412  5.648   0.250    -3.236    -2.53R
 22  109  1.760  5.301   0.491    -3.541    -2.93R
 46   97  3.647  6.371   0.212    -2.724    -2.12R
 48  102  3.936  7.474   0.213    -3.538    -2.75R
 54   72  7.295  5.388   0.585     1.907     1.64 X
 72   90  3.820  4.450   0.627    -0.630    -0.55 X

R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.

[Graph: Normplot of Residuals for GPA-1]
[Graph: Residuals vs Fits for GPA-1]

Comments and answers to questions:
----------------------------------

The regression equation for the reduced dataset is
    GPAhat = -4.68 + 0.0805*IQ + 0.197*C1 + 0.109*C5.
The regression coefficients for IQ and C1 change only slightly. The coefficient for C5, however, drops from 0.142 to 0.109 and is no longer significant at the 5% level (P = 0.069).
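The loss of significance for C5 can be seen directly from the t-ratios, T = Coef / SE Coef, in the two coefficient tables. The following Python sketch assumes that the approximate two-sided 5% critical value 1.99 used in (b) is adequate for both 74 and 73 residual degrees of freedom, since the exact quantiles differ only slightly. The names are hypothetical.

```python
# (coefficient, SE) for C5, taken from the two Minitab coefficient tables
C5_FULL = (0.14205, 0.06663)     # all 78 observations
C5_REDUCED = (0.10950, 0.05923)  # observation 51 set to missing

T_CRIT = 1.99  # approximate t(.975, df) for df = 73 or 74

def t_ratio(coef, se):
    """Minitab's T column: coefficient divided by its standard error."""
    return coef / se

for label, (coef, se) in [("full", C5_FULL), ("reduced", C5_REDUCED)]:
    t = t_ratio(coef, se)
    verdict = "significant" if abs(t) > T_CRIT else "not significant"
    print(f"C5, {label} model: T = {t:.2f} ({verdict} at the 5% level)")
```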
R^2 is now 57.4%, noticeably higher than the 52.5% of the previous analysis. The standard deviation about the fitted model is s = 1.303, about 12% lower than the previous 1.475. The prediction for the specific student is
    GPAhat = -4.68 + 0.0805*109 + 0.197*13 + 0.109*8 = 7.53,
slightly higher than the 7.46 from the previous model.

In conclusion, the outlying observation affected the regression equation only moderately, but it changed the significance of the predictor C5. Note that the residual plots still do not look good. There is some indication of larger variation at small fitted values, and the distribution of the residuals is quite asymmetric (left-skewed). Further improvement of the model seems necessary before we can make precise inferences about the effects of the variables.
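The summary figures compared in (d) can be recomputed from the two ANOVA tables, using R^2 = SS(Regression)/SS(Total) and s = sqrt(SS(Residual Error)/DF). The Python sketch below uses only numbers printed in the Minitab output, and its helper names are hypothetical. It also makes the relative drop in s explicit.

```python
import math

# (SS Regression, SS Residual Error, residual DF) from the two ANOVA tables
FULL = (178.340, 161.087, 74)
REDUCED = (167.112, 123.855, 73)

def r_squared(ss_reg, ss_res):
    """Proportion of the total variation explained by the regression."""
    return ss_reg / (ss_reg + ss_res)

def residual_sd(ss_res, df):
    """Standard deviation about the fitted model, sqrt(MSE)."""
    return math.sqrt(ss_res / df)

def predict(b0, b_iq, b_c1, b_c5, iq=109, c1=13, c5=8):
    """Predicted GPA; the defaults are the student from (a)."""
    return b0 + b_iq * iq + b_c1 * c1 + b_c5 * c5

r2_full, r2_red = r_squared(*FULL[:2]), r_squared(*REDUCED[:2])
s_full, s_red = residual_sd(*FULL[1:]), residual_sd(*REDUCED[1:])
print(f"R^2: {r2_full:.1%} -> {r2_red:.1%}")
print(f"s:   {s_full:.3f} -> {s_red:.3f} ({1 - s_red / s_full:.0%} lower)")
print(f"prediction: {predict(-4.94, 0.0815, 0.183, 0.142):.2f} -> "
      f"{predict(-4.68, 0.0805, 0.197, 0.109):.2f}")
```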