Solution file for additional exercise 4.3
-----------------------------------------

Data: presence or absence of disease (liver cirrhosis) of patients in two
cities and classified as alcoholics/non-alcoholics.

Notation:
  y_ijk = presence (1) or absence (0) of disease, for k'th person
          in city i, i=1,0 (New York, Philadelphia), and
          in alcoholic group j, j=1,0 (alcoholic, non-alcoholic),
          k=1,...,n_ij.
  Alternative notation: y_i, with i referring to patient number 1,...,4600.

Full model for grouped data:
  Pr(y_ijk=1)=p_ij, with no restrictions on the p_ij's
  all y_ijk's are independent.

Logistic regression model:
  Pr(y_ijk=1)=p_ij
  logit(p_ij) = mu + alpha_i + beta_j
  ( or logit(p_i) = mu + alpha_city(i) + beta_alcohol(i) )
  all y_ijk's are independent.
  Effectively, this model assumes no interaction on logistic scale between
  effects of city and alcoholic groups.

MTB > WOpen "h:\VHM\VHM802\Data_csv\hs04_3.csv";
SUBC>   FType;
SUBC>     CSV;
SUBC>   DecSep;
SUBC>     Period;
SUBC>   Field;
SUBC>     Comma;
SUBC>   TDelimiter;
SUBC>     DoubleQuote.
Retrieving worksheet from file: 'h:\VHM\VHM802\Data_csv\hs04_3.csv'
Worksheet was saved on 27/01/2011

MTB > Blogistic 'disease' = city alcoholic;
SUBC>   Frequency 'n';
SUBC>   Factors 'city' 'alcoholic';
SUBC>   Logit;
SUBC>   Brief 2.

Binary Logistic Regression: disease versus city, alcoholic

Link Function: Logit

Response Information

Variable  Value  Count
disease   1        210  (Event)
          0       4390
          Total   4600

Frequency: n

Logistic Regression Table

                                                   Odds     95% CI
Predictor           Coef   SE Coef       Z      P  Ratio  Lower  Upper
Constant        -2.88590  0.163839  -17.61  0.000
city
 Philadelphia  -0.681352  0.171601   -3.97  0.000   0.51   0.36   0.71
alcoholic
 1               2.20349  0.160458   13.73  0.000   9.06   6.61  12.40

Log-Likelihood = -756.954
Test that all slopes are zero: G = 192.772, DF = 2, P-Value = 0.000

Goodness-of-Fit Tests

Method           Chi-Square  DF      P
Pearson            0.248164   1  0.618
Deviance           0.249304   1  0.618
Hosmer-Lemeshow    0.101655   1  0.750

Table of Observed and Expected Frequencies:
(See Hosmer-Lemeshow Test for the Pearson Chi-Square Statistic)

                         Value
                 1                   0
Group  Observed  Expected  Observed  Expected  Total
1           105     103.6      3667    3668.4   3772
2            25      26.4       475     473.6    500
3            80      80.0       248     248.0    328

Measures of Association:
(Between the Response Variable and Predicted Probabilities)

Pairs       Number  Percent  Summary Measures
Concordant  429440     46.6  Somers' D              0.37
Discordant   85040      9.2  Goodman-Kruskal Gamma  0.67
Ties        407420     44.2  Kendall's Tau-a        0.03
Total       921900    100.0

Comments:
---------
The deviance and Pearson goodness-of-fit statistics are non-significant
(P = 0.618), indicating no evidence of interaction between the effects of
city and alcohol group. With four city-by-alcohol cells and three
parameters in the model, the single degree of freedom of these tests
corresponds to the interaction term. The Hosmer-Lemeshow test is less
useful here: with only four covariate patterns its grouping is coarse
(three groups in the table above).

The z-tests for city and alcohol are both clearly significant, so we do not
bother calculating the likelihood ratio statistics (by fitting the relevant
submodels and computing the differences in deviance).

The odds ratio for city (Philadelphia vs. New York) is 0.51, which means
that the risk (odds) of being diseased is about twice as high
(1/0.51 = 1.96) in the patient group in New York as in Philadelphia. It is
not clear what that really means, because it may very well relate to the
selection of the patients. Maybe cities should be considered as blocks, and
not be given any particular interpretation.

The odds ratio for alcohol group (alcoholic vs. non-alcoholic) is 9.06
(95% CI 6.61 to 12.40), which means that the risk (odds) for disease is
much higher in the alcoholic group. Maybe not surprising, with today's
understanding, but these data are from before 1942.
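
Check of the odds ratios:
-------------------------
The odds ratios and 95% confidence limits quoted above can be recovered by
hand from the Coef and SE Coef columns of the Logistic Regression Table, as
exp(coef) and exp(coef +/- z*SE). The following is a minimal sketch in
Python (not part of the Minitab session). It assumes that the limits are
Wald intervals on the logit scale, which is consistent with the printed
values; the function name odds_ratio_ci is only illustrative.

```python
from math import exp
from statistics import NormalDist


def odds_ratio_ci(coef, se, level=0.95):
    """Return (odds ratio, lower, upper) for a logit-scale coefficient.

    Assumes a Wald interval: coef +/- z * se, back-transformed with exp().
    """
    # Two-sided normal quantile, e.g. about 1.96 for level = 0.95
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return exp(coef), exp(coef - z * se), exp(coef + z * se)


# Coefficients and standard errors copied from the Logistic Regression Table
estimates = {
    "city (Philadelphia vs. New York)": (-0.681352, 0.171601),
    "alcoholic (1 vs. 0)": (2.20349, 0.160458),
}

for name, (coef, se) in estimates.items():
    ratio, lower, upper = odds_ratio_ci(coef, se)
    print(f"{name}: OR = {ratio:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Rounded to two decimals, this agrees with the Odds Ratio, Lower and Upper
columns of the table (0.51, 0.36, 0.71 for city; 9.06, 6.61, 12.40 for
alcoholic).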