Assignment II

Assignment II for Biostats Course VHM 801 at AVC - Fall semester 2019

The assignment is worth 15% of the final course mark.

The home assignment has six parts (or questions) with equal weight; to achieve a 100% mark you are required to answer five of the six questions correctly. You may answer all six questions; your score will then be the total score for your five highest-scoring questions. All answers should include text explaining the procedures used; in particular, all statistical assumptions should be stated and justified. Recall that by handing in the home assignment you implicitly acknowledge that you have read and accepted the instructions for home assignments as described on the VHM 801 homepage.

The characteristics of diagnostic tests for diseases or conditions in subjects (e.g. animals or humans) are important to aid the interpretation of the test results. The most commonly used characteristics are the test sensitivity and test specificity. The test sensitivity is the chance (probability) of detecting a diseased subject by the test, whereas the test specificity is the chance (probability) of getting a negative test result for a non-diseased subject. Ideally a test should have high values of both the sensitivity and specificity, but this may be difficult to achieve in practice. Thus it is important to know these characteristics; for example, if a test has a relatively low specificity (say 0.80) it means that the test will give a quite high (in the example, 20%) rate of false positives.
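As a minimal sketch of these two definitions, the snippet below computes sensitivity and specificity from the four cells of a 2x2 table of true status against test result. The function name and the counts are hypothetical and serve only as an illustration.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from the counts of a 2x2 table.

    tp: diseased subjects testing positive (true positives)
    fn: diseased subjects testing negative (false negatives)
    tn: non-diseased subjects testing negative (true negatives)
    fp: non-diseased subjects testing positive (false positives)
    """
    sensitivity = tp / (tp + fn)  # P(test positive | diseased)
    specificity = tn / (tn + fp)  # P(test negative | non-diseased)
    return sensitivity, specificity


# Hypothetical counts, chosen only for illustration.
se, sp = sensitivity_specificity(tp=90, fn=10, tn=160, fp=40)
print(f"sensitivity = {se:.2f}")
print(f"specificity = {sp:.2f}")
print(f"false positive rate = {1 - sp:.2f}")
```

With these counts the specificity is 0.80, so 20% of the non-diseased subjects test positive, which matches the example in the text.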

Sensitivity and specificity are defined for tests with a dichotomous outcome (negative or positive). However, many tests produce results on a quantitative scale. Examples include different types of ELISA tests and real-time PCR tests. A dichotomous result may then be obtained by comparing the quantitative test result obtained to a cut-off value. For example, if large values of a test are associated with disease and low values with non-disease, a cut-off value of 1.5 would imply that a test result is considered positive if the result is greater than 1.5, whereas it is considered negative for results less than or equal to 1.5. Although the situation described in the following is based on a real test, we will omit the details of the test and refer to the test results as percentages on a scale that allows for values below zero and above 100%.
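The cut-off rule described above can be sketched in a few lines. The function name `classify` and the test values are hypothetical; the rule assumes, as in the text, that large values are associated with disease and that a result exactly equal to the cut-off counts as negative.

```python
def classify(result, cutoff):
    """Dichotomize a quantitative test result at the given cut-off.

    Results strictly greater than the cut-off are positive;
    results less than or equal to the cut-off are negative.
    """
    return "positive" if result > cutoff else "negative"


cutoff = 1.5
# Hypothetical test results, including one exactly at the cut-off.
for value in (0.7, 1.5, 1.6):
    print(value, classify(value, cutoff))
```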

Part 1.
A large number of animals from a region that was considered disease-free were tested, and their test results could be approximated well by a normal distribution with mean 9% and standard deviation 17%. Similarly, a large number of diseased animals had been tested, and their test results could be approximated well by a normal distribution with mean 75% and standard deviation 12%. Use these distributions to compute the sensitivity and specificity of the test at a cut-off value (as defined above) of 50%.
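The type of calculation involved can be sketched with Python's standard library. The means, standard deviations and cut-off below are illustrative values and deliberately not those of the assignment; the sketch assumes that both groups are exactly normally distributed and that results above the cut-off are positive.

```python
from statistics import NormalDist

# Illustrative (hypothetical) distributions of test results, in percent.
non_diseased = NormalDist(mu=10, sigma=20)
diseased = NormalDist(mu=80, sigma=10)
cutoff = 60

# Sensitivity: probability that a diseased animal exceeds the cut-off.
sensitivity = 1 - diseased.cdf(cutoff)

# Specificity: probability that a non-diseased animal is at or below the cut-off.
specificity = non_diseased.cdf(cutoff)

print(f"sensitivity = {sensitivity:.4f}")
print(f"specificity = {specificity:.4f}")
```

The same two lines of arithmetic apply to any pair of normal distributions and any cut-off; only the three inputs change.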

Part 2.
A researcher wants to improve the test's ability to detect diseased animals. At what cut-off value would the test achieve a sensitivity of 0.99? Sometimes the cut-off is chosen so that the sensitivity and specificity become equal. Determine also (approximately) the cut-off value to achieve this.
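Finding a cut-off from a target sensitivity is the inverse of the tail-area calculation, and can be sketched with the quantile function of the normal distribution. The distributions and the target below are hypothetical, not those of the assignment. The closed-form expression for the cut-off with equal sensitivity and specificity is an assumption that holds only when both groups are normally distributed; it follows from equating the two standardized distances to the cut-off.

```python
from statistics import NormalDist

# Illustrative (hypothetical) distributions of test results, in percent.
non_diseased = NormalDist(mu=10, sigma=20)
diseased = NormalDist(mu=80, sigma=10)

# Cut-off giving a target sensitivity: P(result > c | diseased) = target,
# so c is the (1 - target) quantile of the diseased distribution.
target_se = 0.95
cutoff_for_target = diseased.inv_cdf(1 - target_se)

# Cut-off where sensitivity equals specificity: solve
# (mu_D - c) / sd_D = (c - mu_N) / sd_N for c.
cutoff_equal = (
    diseased.mean * non_diseased.stdev + non_diseased.mean * diseased.stdev
) / (diseased.stdev + non_diseased.stdev)

se_equal = 1 - diseased.cdf(cutoff_equal)
sp_equal = non_diseased.cdf(cutoff_equal)

print(f"cut-off for sensitivity {target_se}: {cutoff_for_target:.2f}")
print(f"cut-off with equal Se and Sp: {cutoff_equal:.2f}")
print(f"Se = {se_equal:.4f}, Sp = {sp_equal:.4f}")
```

A numerical search over candidate cut-offs (for example by bisection) would give the same answer approximately and does not rely on the closed form.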

Part 3.
A batch of 35 animals with unknown disease status was tested, and the following values were obtained (Minitab worksheet, comma-separated file):

83.4 74.9 82.8 50.3 -3.8 77.0 56.9 18.3 62.1 9.3 87.4 85.6 
81.6 73.1 76.1 71.2 11.8 65.6 84.9 6.0 75.3 68.4 61.7 82.8 
76.4 78.2 78.9 -19.5 31.1 0.2 65.8 73.7 85.8 21.0 78.9

Use the 50% cut-off to classify the animals as diseased or non-diseased, and estimate (with a 95% confidence interval) the proportion of diseased animals, as determined by the results of this test, in the population the sample was taken from. Do any of these 35 test results give rise to questions or concerns with the classification? If yes, discuss how this (or these) result(s) can be interpreted most sensibly, and how you could possibly use additional information to improve the interpretation.
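One possible way to obtain a confidence interval for a proportion is the Wilson score interval, sketched below. The choice of the Wilson interval is an assumption made for this illustration (the exact Clopper-Pearson interval, or whatever your statistical software reports, are equally valid choices), and the counts are hypothetical rather than taken from the data above.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(positives, n, level=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    p_hat = positives / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical counts: 12 test-positive animals in a sample of 40.
lower, upper = wilson_interval(positives=12, n=40)
print(f"estimate = {12 / 40:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

The interval describes the proportion of test-positive animals; it does not by itself account for misclassification by the test.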

Part 4.
In addition to using the test results to classify animals, it is also of interest to describe the distribution of test results in the sample. Use descriptive and inferential statistical methods to describe the distribution, with particular focus on comparing the distribution with what was expected from previous data for the test (as described above). (Hint: Your statistical inference should include at least one confidence interval for a relevant parameter and one statistical test of relevance to answer the question.)

Part 5.
One way of improving the chance of correctly classifying an animal as diseased or non-diseased is to apply an additional test. Assume for this part that we have at our disposal two tests A and B, where test A has sensitivity 0.9 and specificity 0.99, and test B has sensitivity 0.95 and specificity 0.95. If one of the two tests is much more expensive than the other test, one would often want to use the expensive test only for samples (animals) that already tested positive on a first test; this second test is then called a confirmatory test (it is used to confirm the result of the first test). If test A is the initial test and test B is the confirmatory test, we consider a combined test with the following steps: (i) apply test A to all samples, (ii) apply test B to all samples that tested positive with test A, and (iii) consider only samples that tested positive on both tests as positive. Compute the sensitivity and specificity of this combined test. Answer also the same question with the roles of tests A and B reversed. Interpret your findings, and their implication for the choice of initial and confirmatory tests.
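The three steps of the combined test can be sketched as follows. The sketch assumes that the two tests are conditionally independent given the true disease status, which the text does not state and which should be justified in an answer. The function name and the characteristics of the two tests are hypothetical, not those of tests A and B.

```python
def confirmatory_scheme(se_first, sp_first, se_second, sp_second):
    """Sensitivity and specificity of 'positive only if both tests positive'.

    Assumes the two tests are conditionally independent given disease status.
    """
    # A diseased sample is positive only if both tests detect it.
    se_combined = se_first * se_second
    # A non-diseased sample is (falsely) positive only if both tests err.
    sp_combined = 1 - (1 - sp_first) * (1 - sp_second)
    return se_combined, sp_combined


# Hypothetical test characteristics, for illustration only.
se_c, sp_c = confirmatory_scheme(se_first=0.8, sp_first=0.9,
                                 se_second=0.9, sp_second=0.8)
print(f"combined sensitivity = {se_c:.2f}")
print(f"combined specificity = {sp_c:.2f}")
```

The proportion of samples that go on to the second test depends on the first test's characteristics and on the disease prevalence, which is where cost considerations enter.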

Part 6.
When animals are held or housed in groups corresponding to physical confinements (e.g., pigs in pens, cattle in feedlots, fish in cages), it is often of more interest to correctly identify the disease status of the group than of the animals themselves. Consider a fish cage holding 10,000 fish, and a sample of 50 fish from the cage (in aquaculture, these values are not unrealistic). If no fish in the cage is diseased, compute the probability that at least one fish in the sample will give a (false) positive test result for test A of Part 5. Repeat the calculation for test B, and interpret your findings. Finally, discuss the steps in computing the probability of getting at least one positive test result among a sample of 50 fish when the cage includes both diseased and non-diseased fish. It is not required to carry out calculations, but your discussion should cover the important considerations for the computation.
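For a disease-free group, the probability of at least one false positive can be sketched as the complement of all sampled animals testing negative. The sketch assumes that test results of different fish are independent; the specificity and sample size below are hypothetical and not those of the assignment.

```python
def prob_at_least_one_false_positive(specificity, n):
    """P(at least one positive among n non-diseased animals).

    Assumes independent test results, each negative with
    probability equal to the specificity.
    """
    return 1 - specificity**n


# Hypothetical values, for illustration only.
p_any = prob_at_least_one_false_positive(specificity=0.98, n=30)
print(f"P(at least one false positive) = {p_any:.3f}")
```

When the group contains both diseased and non-diseased animals, the number of diseased animals in the sample is itself random, so this simple formula no longer applies directly.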

Henrik Stryhn (hstryhn@upei.ca) 2019-10-14