Extra exercise 2
----------------

Minitab commands:

    WOpen "R:\Chapter 1\ex01_051.mtw".
    Histogram 'days';
      Bar.

(a) The default histogram has 10 bins, each of width 60, centered at the values 60, 120, ..., 600. One of the bins contains no observations (but it still counts as a bin). The distribution is clearly right-skewed, as discussed in Exercise 1.51.

(b) The smallest number of bins allowed in Minitab is 2. The corresponding histogram is misleading because the first bin is centered at 0, which gives the impression that the distribution includes negative values. It would be better to specify cutpoints instead of midpoints (in Minitab terminology), but in any case one would want more than 2 bins.

The largest number of bins allowed in Minitab is 1000 (according to the help information). There are only 72 observations, and some of them share the same value (such repeated values are called ties), so with a very large number of intervals most of the bins will be empty. With 555 intervals, the interval length is 555/555 = 1.0, because the range of the data is 555. With so many bins, the histogram is very noisy and gives a poor impression of the distribution.

(c) The Minitab default histogram looks quite good, and increasing or decreasing the number of bins slightly does not change the shape much. With 20 bins the histogram is perhaps becoming too noisy. On the other hand, the guidelines from the lecture give

    sqrt(72) ~= 8.5
    1 + log2(72) = 1 + ln(72)/ln(2) ~= 7.2

bins, and histograms with so few bins seem too coarse to me. The shape of the distribution clearly affects the best number of bins. With equal-width bins, a long right tail stretches the range of the data, so in a right-skewed distribution such as this one a fairly large number of bins is needed: several bins are spent covering the sparse tail, and enough must remain to show the bulk of the data. This may explain why the guidelines seem to underestimate the best number of bins here.
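The arithmetic in (b) and (c) can be checked with a few lines of Python. This is only a sketch outside the Minitab workflow: the function names are hypothetical, the formula 1 + log2(n) is commonly known as Sturges' rule, and the only inputs taken from the exercise are the sample size n = 72 and the range 555. The data themselves are not used.

```python
import math


def sqrt_rule(n: int) -> float:
    """Square-root guideline: roughly sqrt(n) bins (hypothetical helper)."""
    return math.sqrt(n)


def sturges_rule(n: int) -> float:
    """Sturges-type guideline: roughly 1 + log2(n) bins (hypothetical helper)."""
    return 1 + math.log2(n)


def bin_width(data_range: float, n_bins: int) -> float:
    """Width of each of n_bins equal-width intervals spanning data_range."""
    return data_range / n_bins


n = 72            # number of observations in the exercise
data_range = 555  # range of 'days' as stated in part (b)

print(f"sqrt rule:   {sqrt_rule(n):.1f}")     # prints 8.5
print(f"1 + log2(n): {sturges_rule(n):.1f}")  # prints 7.2
print(f"width with 555 bins: {bin_width(data_range, 555)}")  # prints 1.0
```

Both guidelines depend only on n, not on the shape of the distribution, which is consistent with the remark in (c) that they can underestimate the number of bins needed for skewed data.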