Extra exercise 9
----------------

Sampling of n=25 students from a population of size N=100, in which the proportion of individuals meeting a certain condition (owning electronics of brand A) is p=0.7. Let X denote the number of students in the sample who meet the condition.

(a) Sampling with replacement leads to a binomial distribution for X, that is, X ~ Bin(25,p) with p=0.7. The Minitab listing below includes the probabilities P(X=x) for x=0,...,25. Both the listing and the histogram show that these probabilities are very small up to about x=10 (each is below 0.002).

The mean and variance of X (i.e., in the binomial distribution) are

   EX   = n*p       = 25*0.7     = 17.5
   VarX = n*p*(1-p) = 25*0.7*0.3 = 5.25

(b) Sampling without replacement leads to a hypergeometric distribution for X, with parameters (N,n,p) or, equivalently, (N,n,M), where M=N*p=100*0.7=70 is the total number of individuals in the population meeting the condition. The Minitab listing below also includes the probabilities in this hypergeometric distribution.

i) The hypergeometric distribution appears narrower than the binomial, with smaller probabilities in the tails and larger probabilities in the centre.

ii) The biggest difference in probability between the two distributions occurs at x=17, where the hypergeometric probability exceeds the binomial probability by about 0.025.

iii) The mean is computed by adding up the terms "x*pxhyp", and the calculation in Minitab shows the sum of these terms to equal 17.5. Therefore the binomial and hypergeometric distributions have the same mean. It is indeed always true that the means from sampling with and without replacement are the same.

iv) The variance is computed by adding up the terms "((x-mean)^2)*pxhyp", and the calculation in Minitab shows the sum of these terms to equal 3.98 (compared with 5.25 for the binomial). Therefore the hypergeometric distribution has lower spread than the binomial distribution. This is indeed always true when comparing sampling with and without replacement, for any sample size n>=2.
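The numbers quoted in (a) and (b) can be cross-checked outside Minitab. The following is only a minimal sketch in Python (standard library only), not part of the original solution. The function names binom_pmf and hyper_pmf and the list names are illustrative choices, the latter mirroring the Minitab column names. It evaluates both probability functions for x=0,...,25 and then recomputes the hypergeometric mean and variance and the location of the largest difference.

```python
from math import comb

# Values taken from the exercise text.
N, M, n, p = 100, 70, 25, 0.7  # population size, owners of brand A, sample size, proportion


def binom_pmf(x):
    """P(X = x) for sampling with replacement, X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def hyper_pmf(x):
    """P(X = x) for sampling without replacement (hypergeometric, parameters N, M, n)."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)


xs = range(n + 1)
pxbin = [binom_pmf(x) for x in xs]
pxhyp = [hyper_pmf(x) for x in xs]

# Same sums as the 'mterm' and 'vterm' columns in the Minitab session.
mean_hyp = sum(x * q for x, q in zip(xs, pxhyp))
var_hyp = sum((x - mean_hyp) ** 2 * q for x, q in zip(xs, pxhyp))

# Value of x at which the two distributions differ most in absolute terms.
x_maxdiff = max(xs, key=lambda x: abs(pxbin[x] - pxhyp[x]))

print(f"hypergeometric mean     = {mean_hyp:.4f}")
print(f"hypergeometric variance = {var_hyp:.5f}")
print(f"largest |pxbin - pxhyp| at x = {x_maxdiff}")
```

If the sketch is correct, its output should agree with the Minitab sums and with the answer to ii) up to rounding.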
(c) According to the guideline for when it is acceptable to use a binomial distribution for sampling without replacement from a finite population (slide 4L-10), we must have N>=20*n. This is clearly violated in the scenario discussed, since 20*n = 20*25 = 500 exceeds N = 100. The condition would only be met if

   n <= N/20 = 100/20 = 5

So by this rule a binomial distribution would be acceptable only for a sample size of up to 5 individuals.


Minitab commands and output:

Name c1 "x"
Set 'x'
  1( 0 : 25 / 1 )1
  End.
Name c2 "pxbin"
PDF 'x' 'pxbin';
  Binomial 25 .7.
DPlot;
  Distribution;
  Binomial 25 .7.
Name c3 "pxhyp"
PDF 'x' 'pxhyp';
  Hypergeometric 100 70 25.
DPlot;
  Distribution;
  Hypergeometric 100 70 25.
Name C4 'diffp'
Let 'diffp' = 'pxbin'-'pxhyp'
Name C5 'mterm'
Let 'mterm' = 'x'*'pxhyp'
Sum 'mterm'.

Sum of mterm = 17.5

Name C6 'vterm'
Let 'vterm' = (('x'-17.5)^2)*'pxhyp'
Sum 'vterm'.

Sum of vterm = 3.97727

Print 'x'-'vterm'.

Data Display

Row   x     pxbin     pxhyp       diffp    mterm     vterm
  1   0  0.000000  0.000000   0.0000000  0.00000  0.000000
  2   1  0.000000  0.000000   0.0000000  0.00000  0.000000
  3   2  0.000000  0.000000   0.0000000  0.00000  0.000000
  4   3  0.000000  0.000000   0.0000000  0.00000  0.000000
  5   4  0.000000  0.000000   0.0000000  0.00000  0.000000
  6   5  0.000000  0.000000   0.0000003  0.00000  0.000000
  7   6  0.000002  0.000000   0.0000024  0.00000  0.000004
  8   7  0.000015  0.000000   0.0000149  0.00000  0.000047
  9   8  0.000081  0.000005   0.0000759  0.00004  0.000421
 10   9  0.000355  0.000039   0.0003159  0.00035  0.002817
 11  10  0.001325  0.000254   0.0010712  0.00254  0.014273
 12  11  0.004216  0.001298   0.0029181  0.01427  0.054820
 13  12  0.011476  0.005254   0.0062221  0.06304  0.158923
 14  13  0.026777  0.016928   0.0098483  0.22007  0.342801
 15  14  0.053554  0.043530   0.0100232  0.60942  0.533246
 16  15  0.091636  0.089382   0.0022538  1.34073  0.558639
 17  16  0.133636  0.146310  -0.0126742  2.34096  0.329198
 18  17  0.165080  0.190125  -0.0250453  3.23212  0.047531
 19  18  0.171194  0.194717  -0.0235236  3.50491  0.048679
 20  19  0.147166  0.155432  -0.0082657  2.95321  0.349722
 21  20  0.103017  0.095125   0.0078920  1.90249  0.594528
 22  21  0.057231  0.043555   0.0136762  0.91466  0.533551
 23  22  0.024280  0.014372   0.0099082  0.31618  0.291028
 24  23  0.007390  0.003214   0.0041760  0.07391  0.097210
 25  24  0.001437  0.000434   0.0010028  0.01042  0.018337
 26  25  0.000134  0.000027   0.0001075  0.00067  0.001497
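
The guideline in part (c) and the variance comparison in (b) iv) can also be written out as a short calculation. The sketch below is an illustration only, in Python. The helper names binomial_ok and fpc are hypothetical and do not come from the course material. The factor 20 is the one quoted from slide 4L-10, and (N-n)/(N-1) is the standard finite population correction that links the hypergeometric variance to the binomial variance.

```python
def binomial_ok(N, n, factor=20):
    """Rule of thumb from part (c): binomial is acceptable if N >= factor * n."""
    return N >= factor * n


def fpc(N, n):
    """Finite population correction (N - n) / (N - 1) for sampling without replacement."""
    return (N - n) / (N - 1)


N, n, p = 100, 25, 0.7  # values from the exercise text

max_n = N // 20                  # largest sample size allowed by the rule
var_bin = n * p * (1 - p)        # binomial variance
var_hyp = var_bin * fpc(N, n)    # hypergeometric variance via the correction factor

print("rule satisfied for n=25:", binomial_ok(N, n))
print("rule satisfied for n=5: ", binomial_ok(N, max_n))
print(f"binomial variance       = {var_bin:.5f}")
print(f"hypergeometric variance = {var_hyp:.5f}")
```

With N=100 and n=25 the correction factor is 75/99, so the closed-form variance should agree with the value 3.97727 that Minitab obtains by summing the 'vterm' column.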