Supplementary Exercises 4.75 and 4.76 of IPS7e
----------------------------------------------

Typographical errors of two types (non-word or word) in 250-word essays. Let X denote the number of non-word errors and Y the number of word errors in an essay. Their probability distributions are:

                     Number of errors
  Type           0     1     2     3     4
  non-word (X)  0.1   0.2   0.3   0.3   0.1
  word (Y)      0.4   0.3   0.2   0.1   0.0

Both distributions are valid because all probabilities are >= 0 and each row sums to 1 over the possible outcomes.

4.75:
-----

Mean number of non-word errors:
  EX = 0*0.1 + 1*0.2 + 2*0.3 + 3*0.3 + 4*0.1 = 2.1

Mean number of word errors:
  EY = 0*0.4 + 1*0.3 + 2*0.2 + 3*0.1 + 4*0.0 = 1.0

On average, an essay has about twice as many non-word errors as word errors.

4.76:
-----

With the notation of the previous exercise, the total number of errors is T = X + Y. We first answer the question about the mean.

(a)+(b): The addition rule for means holds whether the two error counts are independent or dependent, so the mean total number of errors is
  ET = EX + EY = 2.1 + 1.0 = 3.1

We next turn to the question about the standard deviation (optional).

(a): To compute the standard deviation of the total number of errors T, we first need the standard deviation of each error count. We do this in two steps: first the variance, then the standard deviation.

  VarX = 0.1*(0-2.1)^2 + 0.2*(1-2.1)^2 + 0.3*(2-2.1)^2 + 0.3*(3-2.1)^2 + 0.1*(4-2.1)^2
       = 0.441 + 0.242 + 0.003 + 0.243 + 0.361
       = 1.29
  sdX  = sqrt(VarX) = sqrt(1.29) = 1.1358

  VarY = 0.4*(0-1)^2 + 0.3*(1-1)^2 + 0.2*(2-1)^2 + 0.1*(3-1)^2 + 0.0*(4-1)^2
       = 0.4 + 0 + 0.2 + 0.4 + 0
       = 1.0
  sdY  = sqrt(VarY) = sqrt(1.0) = 1.0

When X and Y are independent, the addition rule for variances gives the variance of the sum (alternatively, the rule for standard deviations can be applied directly):

  Var(X+Y) = VarX + VarY = 1.29 + 1 = 2.29
  sd(X+Y)  = sqrt(Var(X+Y)) = sqrt(2.29) = 1.513
or directly
  sd(X+Y)  = sqrt(1.1358^2 + 1^2) = sqrt(2.29) = 1.513

Note that it is incorrect to add up the standard deviations: sdX + sdY = 1.1358 + 1.0 = 2.1358, which overstates sd(X+Y) = 1.513.
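As an optional check on the addition rules (not part of the textbook solution), the sketch below assumes X and Y are independent, builds the exact distribution of T = X + Y by convolving the two distributions, and then computes the mean and variance of T directly from that distribution. The helper names convolve, mean, and var are illustrative only.

```python
# Exact check of the addition rules for T = X + Y, ASSUMING X and Y are independent.
# Under independence, P(T = t) = sum over x of P(X = x) * P(Y = t - x).

p_x = [0.1, 0.2, 0.3, 0.3, 0.1]  # P(X = 0), ..., P(X = 4): non-word errors
p_y = [0.4, 0.3, 0.2, 0.1, 0.0]  # P(Y = 0), ..., P(Y = 4): word errors


def convolve(p, q):
    """Distribution of the sum of two independent counts with pmfs p and q."""
    r = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] += pi * qj
    return r


def mean(p):
    """Mean of a count whose value k has probability p[k]."""
    return sum(k * pk for k, pk in enumerate(p))


def var(p):
    """Variance of a count whose value k has probability p[k]."""
    m = mean(p)
    return sum((k - m) ** 2 * pk for k, pk in enumerate(p))


p_t = convolve(p_x, p_y)  # P(T = 0), ..., P(T = 8)

print(f"ET        = {mean(p_t):.4f}")  # should agree with EX + EY = 3.1
print(f"Var(T)    = {var(p_t):.4f}")   # should agree with VarX + VarY = 2.29
print(f"sd(T)     = {var(p_t) ** 0.5:.4f}")
print(f"sdX + sdY = {var(p_x) ** 0.5 + var(p_y) ** 0.5:.4f}  (not equal to sd(T))")
```

Computing ET and Var(T) from the full distribution of T, rather than from the rules, is what makes this an independent check: the agreement with 3.1 and 2.29 relies on independence only through the convolution step.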
(b): When X and Y are dependent, the simple addition rule for variances does not apply, because the correlation between the two variables needs to be taken into account. We will discuss the extended rule in Session 10.

---

Finally, here are Minitab commands that carry out some of these calculations once the distributions have been entered in suitable columns labeled 'x', 'p(x)', 'y', and 'p(y)'. If you use the provided data file, it may help to rename the variables accordingly, to simplify the expressions. The output of each Sum command is shown indented beneath it.

  Name C5 'meanx'
  Let 'meanx' = 'x'*'p(x)'
  Sum 'meanx'.
      Sum of meanx = 2.1

  Name C6 'varx'
  Let 'varx' = ('x'-2.1)^2*'p(x)'
  Sum 'varx'.
      Sum of varx = 1.29

  Name C7 'meany'
  Let 'meany' = 'y'*'p(y)'
  Sum 'meany'.
      Sum of meany = 1

  Name C8 'vary'
  Let 'vary' = ('y'-1)^2*'p(y)'
  Sum 'vary'.
      Sum of vary = 1
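For readers without Minitab, the same column arithmetic can be sketched in Python. This is an illustrative translation, not course-provided code: the list names mirror the Minitab columns, and the means are computed first and reused rather than typed in as 2.1 and 1.

```python
# Python analogue of the Minitab column calculations (illustrative sketch).
# Each list plays the role of one Minitab column.

x = [0, 1, 2, 3, 4]
p_x = [0.1, 0.2, 0.3, 0.3, 0.1]
y = [0, 1, 2, 3, 4]
p_y = [0.4, 0.3, 0.2, 0.1, 0.0]

# Like Let 'meanx' = 'x'*'p(x)' followed by Sum 'meanx'.
meanx = [xi * pi for xi, pi in zip(x, p_x)]
mean_x = sum(meanx)

# Like Let 'varx' = ('x'-2.1)^2*'p(x)', but reusing the computed mean.
varx = [(xi - mean_x) ** 2 * pi for xi, pi in zip(x, p_x)]
var_x = sum(varx)

# Same two steps for the word errors.
meany = [yi * pi for yi, pi in zip(y, p_y)]
mean_y = sum(meany)

vary = [(yi - mean_y) ** 2 * pi for yi, pi in zip(y, p_y)]
var_y = sum(vary)

print(f"Sum of meanx = {mean_x:.4g}")
print(f"Sum of varx = {var_x:.4g}")
print(f"Sum of meany = {mean_y:.4g}")
print(f"Sum of vary = {var_y:.4g}")
```

Reusing mean_x and mean_y avoids a mismatch if the probabilities are later edited but the hard-coded means are not.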