Supplementary Exercise 1.145 of IPS7e
-------------------------------------

Minitab commands and output:

MTB > WOpen "H:\VHM\VHM801\Datasets\Minitab\Chapter 1\ex01_145.mtw".
Retrieving worksheet from file: 'H:\VHM\VHM801\Datasets\Minitab\Chapter 1\ex01_145.mtw'
Worksheet was saved on 20/09/2014

MTB > Describe 'satm' 'gpa';
SUBC>   By 'sex';
SUBC>   Mean;
SUBC>   SEMean;
SUBC>   StDeviation;
SUBC>   QOne;
SUBC>   Median;
SUBC>   QThree;
SUBC>   Minimum;
SUBC>   Maximum;
SUBC>   Skewness;
SUBC>   Kurtosis;
SUBC>   N;
SUBC>   NMissing;
SUBC>   GBoxplot.

Descriptive Statistics: satm, gpa

Variable  sex    N  N*    Mean  SE Mean   StDev  Minimum      Q1  Median      Q3  Maximum
satm      1    145   0  611.77     6.98   84.02   400.00  550.00  620.00  670.00   800.00
          2     79   0  565.03     9.33   82.93   300.00  510.00  570.00  630.00   740.00
gpa       1    145   0  2.6077   0.0670  0.8068   0.1200  2.1350  2.7500  3.1900   4.0000
          2     79   0  2.6857   0.0820  0.7288   0.3900  2.2500  2.7200  3.3300   4.0000

Variable  sex  Skewness  Kurtosis
satm      1       -0.06     -0.48
          2       -0.47      0.55
gpa       1       -0.71      0.35
          2       -0.59      0.31

Boxplot of satm
Boxplot of gpa

Comments:
---------
The boxplots of SAT math scores show distributions of similar shape, but the distribution for men (sex=1) appears to be centred at a higher value than that for women (sex=2). The difference in means is approximately 47 points (611.77 versus 565.03). Both distributions appear fairly symmetric, and their spreads are almost the same when measured by the standard deviation (84.02 versus 82.93).

In contrast, the two distributions of gpa values appear to be centred at approximately the same value (around 2.7). Both appear somewhat left-skewed, with longer left than right tails, and the left skewness is more pronounced for the men.

Next, we produce the normal probability plots. It is probably easier to look at the plots in separate panels for men and women. This is most easily achieved in Minitab by choosing Probability Plot-Single and adding the panels under the Multiple Graphs-By Variables tab.

MTB > PPlot 'satm' 'gpa';
SUBC>   Normal;
SUBC>   Symbol;
SUBC>   FitD;
SUBC>   Grid 2;
SUBC>   Grid 1;
SUBC>   MGrid 1;
SUBC>   Panel 'sex'.

Probability Plot of satm
Probability Plot of gpa

Comments:
---------
The normal probability plots for satm are nicely straight, and the normality tests are clearly non-significant (P>>0.05), leading us to the conclusion that these values can be viewed as approximately normally distributed. The boxplot for women shows one low suspected outlier, and this point also falls outside the limits shown on the normal probability plot, but overall the distribution (with this observation included) seems well approximated by a normal distribution.

The normal probability plots for gpa look less straight because of some upward curvature at the lower end, reflecting the left skewness. The normality test is clearly significant (P<0.005) for the men and non-significant for the women. The two distributions do not appear that different, so the difference in conclusion from the normality test may be mostly a reflection of the different sample sizes (almost twice as many men as women). With a larger sample size, a given deviation from a normal distribution is less likely to be the result of random fluctuation, so the test detects it more readily. In conclusion, we should be cautious about assuming a normal distribution for either of the gpa distributions.

In response to the question asked in the exercise, ignoring outliers does not seem to substantially alter our impression of the distributions. Generally speaking, it is not recommended to "ignore" outliers for the sole purpose of achieving distributions that seem closer to normal. Later in the course we will discuss how to deal with non-normal distributions in the context of statistical inference.
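
Supplementary check (not part of the Minitab session):
-------------------------------------------------------
The suspected outliers and the SE Mean column can be cross-checked by hand from the summary statistics alone. The following is a minimal Python sketch, assuming that boxplot outliers are flagged with the usual 1.5 x IQR rule and that SE Mean = StDev / sqrt(N). The dictionary layout and the function names (se_mean, fences, check) are illustrative choices and are not taken from the exercise or from Minitab. Because only the minimum and maximum are available, the check can tell whether at least one point lies beyond a fence, not how many.

```python
from math import sqrt

# Values copied from the Descriptive Statistics table.
# Keys are (variable, sex); sex 1 = men, 2 = women, as in the comments.
summary = {
    ("satm", 1): {"n": 145, "stdev": 84.02, "minimum": 400.0,
                  "q1": 550.0, "q3": 670.0, "maximum": 800.0},
    ("satm", 2): {"n": 79, "stdev": 82.93, "minimum": 300.0,
                  "q1": 510.0, "q3": 630.0, "maximum": 740.0},
    ("gpa", 1): {"n": 145, "stdev": 0.8068, "minimum": 0.12,
                 "q1": 2.135, "q3": 3.19, "maximum": 4.0},
    ("gpa", 2): {"n": 79, "stdev": 0.7288, "minimum": 0.39,
                 "q1": 2.25, "q3": 3.33, "maximum": 4.0},
}


def se_mean(stdev, n):
    """Standard error of the mean: StDev / sqrt(N)."""
    return stdev / sqrt(n)


def fences(q1, q3, k=1.5):
    """Lower and upper fences of the k x IQR rule (k = 1.5 is assumed)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def check(stats):
    """Recompute SE Mean and test the extremes against the fences."""
    lower, upper = fences(stats["q1"], stats["q3"])
    return {
        "se_mean": se_mean(stats["stdev"], stats["n"]),
        "lower_fence": lower,
        "upper_fence": upper,
        "low_outlier": stats["minimum"] < lower,
        "high_outlier": stats["maximum"] > upper,
    }


if __name__ == "__main__":
    for (variable, sex), stats in summary.items():
        result = check(stats)
        print(f"{variable} sex={sex}: "
              f"SE Mean={result['se_mean']:.4f}, "
              f"fences=({result['lower_fence']:.4f}, "
              f"{result['upper_fence']:.4f}), "
              f"low outlier={result['low_outlier']}, "
              f"high outlier={result['high_outlier']}")
```

For satm among women, for example, the IQR is 630 - 510 = 120, so the lower fence is 510 - 1.5 x 120 = 330; the minimum of 300 lies below it, which agrees with the low suspected outlier seen in the boxplot.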