Extra Exercise 21
-----------------
(continuation of Supplementary Exercises 10.33 and 10.7)

Consider the CRP and retinol data from Supplementary Exercise 10.33. In that exercise we used a linear regression to predict retinol from CRP. We noted, however, that both variables are responses, so it is also meaningful to compute a correlation coefficient. The descriptive analysis of the two variables made it clear, however, that assuming normal distributions is quite unreasonable. Therefore, the nonparametric rank correlation (Spearman's) is suggested.

Minitab commands and output for Spearman's rank correlation, obtained both from the Stat > Basic Statistics > Correlation menu and by computing the ranks manually:

MTB > WOpen "R:\Chapter 10\ex10_033.mtw".
Retrieving worksheet from file: ‘R:\Chapter 10\ex10_033.mtw’
Worksheet was saved on 07/11/2014

MTB > Correlation 'retinol' 'crp'.

Correlation: retinol, crp

Pearson correlation of retinol and crp = -0.327
P-Value = 0.039

MTB > Correlation 'retinol' 'crp';
SUBC> Spearman.

Spearman Rho: retinol, crp

Spearman rho for retinol and crp = -0.348
P-Value = 0.028

MTB > name c3 'rankret'
MTB > Rank 'retinol' 'rankret'.
MTB > name c4 'rankcrp'
MTB > Rank 'crp' 'rankcrp'.
MTB > Correlation 'rankret' 'rankcrp'.

Correlations: rankret, rankcrp

Pearson correlation of rankret and rankcrp = -0.348
P-Value = 0.028

MTB > Correlation 'retinol' 'crp'.

Correlations: retinol, crp

Pearson correlation of retinol and crp = -0.327
P-Value = 0.039

Comments:
---------
The Spearman rank correlation is -0.348. Correlating the manually computed ranks with the ordinary Pearson formula reproduces exactly this value, which confirms that Spearman's rho is simply the Pearson correlation of the ranks. Since n > 30, its significance can be assessed by the usual t-test for a correlation coefficient applied to the ranks, for which Minitab gives P = 0.028. This is exactly what the Minitab Spearman option does, but it does so regardless of n, and the t-approximation is poor for small n. By comparison, the Pearson correlation is -0.327 with P = 0.039. The two estimates are thus quite similar, despite the concerns about violations of the model assumptions noted in Supplementary Exercise 10.33.
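The manual Minitab run above reflects two facts: Spearman's rho is the Pearson correlation of the ranks, and Minitab's P-value for it is the usual t-test, t = r * sqrt((n - 2) / (1 - r^2)) on n - 2 degrees of freedom. The following Python sketch reproduces both steps using only the standard library. The function names are hypothetical helpers, tied values receive average ranks (assumed, not verified here, to match Minitab's Rank command), and the t tail probability is obtained by numerically integrating the Student t density rather than from a statistics package.

```python
import math


def average_ranks(values):
    """Rank values 1..n; tied values share the average of their positions (assumed to mirror Minitab's Rank)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Ordinary Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def t_two_sided_p(t, df, steps=2000):
    """Two-sided P-value P(|T| >= |t|) for Student's t with df degrees of freedom.

    Integrates the t density over [0, |t|] with Simpson's rule (steps must be even).
    """
    a = abs(t)
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def density(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = a / steps
    total = density(0.0) + density(a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * density(k * h)
    central = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central)


def correlation_t_test(r, n):
    """Usual t-test of rho = 0: t = r * sqrt((n - 2) / (1 - r^2)) on n - 2 df."""
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return t, t_two_sided_p(t, n - 2)
```

Calling `correlation_t_test(spearman_rho(x, y), len(x))` mirrors what the Minitab Spearman option reports. Because rho depends only on the ranks, any increasing transformation of x or y leaves it unchanged; as the comments note, the t-based P-value is trustworthy only for reasonably large n.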
------

Consider now the golf data from Supplementary Exercises 2.2 and 10.7.

Minitab commands and output:

MTB > WOpen "R:\Chapter 2\ex02_002.mtw".
Retrieving worksheet from file: ‘R:\Chapter 2\ex02_002.mtw’
Worksheet was saved on 15/11/2014

MTB > Correlation 'round1' 'round2'.

Correlation: round1, round2

Pearson correlation of round1 and round2 = 0.687
P-Value = 0.014

MTB > Correlation 'round1' 'round2';
SUBC> Spearman.

Spearman Rho: round1, round2

Spearman rho for round1 and round2 = 0.669
P-Value = 0.017

MTB > name c4 'rank1'
MTB > Rank 'round1' 'rank1'.
MTB > name c5 'rank2'
MTB > Rank 'round2' 'rank2'.
MTB > Correlation 'rank1' 'rank2';
SUBC> NoPValues.

Correlations: rank1, rank2

Pearson correlation of rank1 and rank2 = 0.669

Comments:
---------
The Pearson (0.687) and Spearman (0.669) correlations are quite similar. With only n = 12, the P-value for testing rho = 0 based on the Spearman correlation should be obtained from the table of critical values mentioned in the text (and *not* from the Minitab P-value, which relies on the t-approximation regardless of n). The observed r_s = 0.67 for n = 12 corresponds to a P-value between 0.02 and 0.05, somewhat higher than the P = 0.014 for the Pearson correlation. The rank test is less powerful than the t-test for the Pearson correlation when the normality assumptions hold, but it also requires fewer assumptions.

We explore the impact of the two extreme observations (Players 7 and 8) by computing the Pearson and Spearman correlations for different subsets of the data:

dataset                Pearson corr   Spearman corr
---------------------------------------------------
full                   0.687          0.669
without player 8       0.842          0.715
without player 7       0.550          0.606
without players 7&8    0.661          0.620

Across these subsets the Pearson correlation ranges from 0.550 to 0.842, whereas the Spearman correlation stays between 0.606 and 0.715. The Spearman correlation is thus indeed less sensitive to the omission of extreme observations.
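For small samples, critical-value tables for Spearman's rho are generally based on the permutation distribution of the ranks under H0, in which every reordering of one set of ranks is equally likely. As a hedged illustration of that idea (not the textbook's table itself), the sketch below computes a two-sided permutation P-value: by exact enumeration when n is small, and by a seeded Monte Carlo approximation otherwise, since the 12! orderings for the golf data are too many to enumerate in plain Python. The helper names, the cutoff `exact_max_n=8`, the number of replicates and the seed are illustrative assumptions.

```python
import itertools
import math
import random


def average_ranks(values):
    """Rank values 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 0-based positions -> 1-based rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Ordinary Pearson correlation (applied to ranks it gives Spearman's rho)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_permutation_p(x, y, exact_max_n=8, reps=20000, seed=1):
    """Two-sided P-value for H0: rho = 0 from the permutation distribution of Spearman's rho.

    Hypothetical helper: exact enumeration for n <= exact_max_n,
    otherwise a Monte Carlo approximation with a fixed seed.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    observed = abs(pearson(rx, ry))
    tol = 1e-12  # guard against floating-point ties with the observed value
    n = len(rx)
    if n <= exact_max_n:
        # Exact: all n! reorderings of the y-ranks are equally likely under H0
        hits = total = 0
        for perm in itertools.permutations(ry):
            total += 1
            if abs(pearson(rx, perm)) >= observed - tol:
                hits += 1
        return hits / total
    # Monte Carlo approximation; the +1 counts the observed arrangement itself
    rng = random.Random(seed)
    shuffled = list(ry)
    hits = 0
    for _ in range(reps):
        rng.shuffle(shuffled)
        if abs(pearson(rx, shuffled)) >= observed - tol:
            hits += 1
    return (hits + 1) / (reps + 1)
```

For instance, with n = 5 and perfectly concordant ranks, only the identity and the full reversal reach |rho| = 1, so the exact P-value is 2/120. For the golf data the Monte Carlo branch would be used; such a result is an approximation to the exact null distribution and should be read alongside the table-based bracket given in the text.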