Supplementary Exercises 1.42 and 1.72 of IPS7e
----------------------------------------------

1.42:
-----
For this exercise, we will work with the stemplot provided in the
problem. We will use the actual data for the follow-up questions in
1.72.

(a) Because the stems are whole percents and the leaves are tenths of
    a percent, we can read the values off the stemplot to one decimal
    place. A stem value of 17 and a leaf value of 6 corresponds to
    17.6% (Florida). In the same way, the approximate value for Alaska
    is 5.7%.

(b) The distribution appears unimodal and left-skewed, centered at
    about 13% and with a fairly narrow spread: almost all the states
    have values between 11% and 15%.

1.72:
-----
We prefer to use statistical software to compute numerical summaries
for the distribution, even though the problem suggests computing the
5-number summary from the stemplot.

MTB > WOpen "H:\VHM\VHM801\Datasets\Minitab\Chapter 1\ex01_042.mtw".
Retrieving worksheet from file: ‘H:\VHM\VHM801\Datasets\Minitab\Chapter 1\ex01_042.mtw’
Worksheet was saved on 28/08/2014

MTB > Describe 'over65'.

Descriptive Statistics: over65

Variable   N  N*    Mean  SE Mean  StDev  Minimum      Q1  Median      Q3  Maximum
over65    50   0  12.538    0.269  1.905    5.700  11.675  12.750  13.500   17.600

Comments:
---------
The 5-number summary is 5.7 - 11.7 - 12.8 - 13.5 - 17.6. For the
1.5*IQR rule we further calculate, using the quartiles rounded to one
decimal:

   IQR          = 13.5 - 11.7 = 1.8
   Q1 - 1.5*IQR = 11.7 - 2.7  = 9.0
   Q3 + 1.5*IQR = 13.5 + 2.7  = 16.2

Both extreme states (Alaska and Florida) fall outside the range from
9.0% to 16.2%, so they would indeed be identified as suspected
outliers. However, another state with a value around 8.5% would also
be identified as a suspected outlier. This may seem unnatural because
that value does not differ much from several values between 9% and
10%. In a left-skewed distribution, the rule will tend to identify too
many outliers in the left tail, with the converse happening in a
right-skewed distribution.
This is because the rule is symmetric and therefore does not account
for one tail being naturally longer than the other. In practice we
always have to use our judgement to decide whether suspected outliers
should be regarded as real outliers.
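
The fence arithmetic in the comments can also be checked outside Minitab. The following is a minimal Python sketch and is not part of the Minitab session. The names lower_fence, upper_fence and is_suspected_outlier are illustrative, the quartiles are the rounded values from the 5-number summary, and 8.5 stands in for the state described only as "around 8.5%".

```python
# Quartiles rounded to one decimal, taken from the 5-number summary.
q1, q3 = 11.7, 13.5

# 1.5*IQR rule: values beyond these fences are suspected outliers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr


def is_suspected_outlier(value):
    """Return True if value lies outside the 1.5*IQR fences."""
    return value < lower_fence or value > upper_fence


# Values mentioned in the text; 8.5 is the approximate value quoted there.
checks = [
    ("Alaska", 5.7),
    ("Florida", 17.6),
    ("state near 8.5%", 8.5),
    ("median", 12.75),
]

print(f"IQR = {iqr:.1f}, fences = [{lower_fence:.1f}, {upper_fence:.1f}]")
for label, value in checks:
    print(f"{label}: {value} -> suspected outlier: {is_suspected_outlier(value)}")
```

Using the unrounded quartile Q1 = 11.675 from the Minitab output shifts the fences only slightly (to roughly 8.94 and 16.24), so the same three values are flagged either way.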