Assignment II for Biostats Course VHM 801 at AVC - Winter semester 2026
The assignment is worth 10% of the final course mark. Please be aware that by handing
in the home assignment you implicitly acknowledge that you have read and accepted
the instructions for home assignments as described
on the VHM 801 homepage.
The assignment is a continuation of the first home
assignment on Bovine leukemia virus (BLV) and dairy production
in the Maritime provinces of Canada. Among the production parameters, we will
here focus on milk production only, and the dataset for this assignment
(Minitab format; comma-separated format)
includes the herd, province, 305-day milk production
and bulk-tank elisa variables for the same herds as in the first assignment, as well
as the additional variables:
- herd_blv: the herd BLV status (0=negative/non-infected, 1=positive/infected),
- milk305_blv0: the average lactation (305-day) milk production for the BLV-negative
cows in the herd (kg),
- milk305_blv1: the average lactation (305-day) milk production
for the BLV-positive cows in the herd (kg).
The new variables above were constructed from more extensive data collected for the herds, whereby
each individual cow was tested for BLV and classified as non-infected/negative
or infected/positive. This allowed the average milk production to be calculated separately for the
negative and positive cows within each herd, and it also led to a
classification (status) of the
herd as BLV-negative if all the cows were negative and as BLV-positive if at
least one cow was BLV-positive. Missing values occur for the milk production averages if
no cows in the herd matched the classification as either non-infected or infected.
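As an illustration only, the construction described above could be sketched in Python as follows. The cow-level records, their layout and the function name `summarize_herd` are hypothetical (the cow-level data are not part of the assignment dataset), and `None` is used here to mark a missing value.

```python
from statistics import mean

# Hypothetical cow-level records: (herd id, cow BLV status, 305-day milk in kg).
# All values are invented for illustration.
cows = [
    (1, 0, 10000.0),
    (1, 1, 9800.0),
    (1, 0, 10400.0),
    (2, 0, 9500.0),
    (2, 0, 9700.0),
]


def summarize_herd(records):
    """Return herd_blv, milk305_blv0 and milk305_blv1 for the cows of one herd."""
    neg = [milk for _, status, milk in records if status == 0]
    pos = [milk for _, status, milk in records if status == 1]
    return {
        # The herd is positive as soon as at least one cow is positive.
        "herd_blv": 1 if pos else 0,
        # An average is missing (None) when no cow falls in that class.
        "milk305_blv0": mean(neg) if neg else None,
        "milk305_blv1": mean(pos) if pos else None,
    }


# Group the cow records by herd, then summarize each herd.
by_herd = {}
for record in cows:
    by_herd.setdefault(record[0], []).append(record)

summary = {herd: summarize_herd(records) for herd, records in by_herd.items()}
```

In this sketch herd 2 has no positive cows, so its `milk305_blv1` entry is missing, which mirrors how missing values arise in the dataset.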
This home assignment has four questions (a)-(d) which should all be
answered using the variables in the new dataset. For all
questions it is critical that you state your statistical
model/assumptions explicitly, describe your statistical methods, and discuss (to the best of your ability based on
the information provided) whether you think these assumptions are met to
a reasonable degree. You are allowed (and expected) to use statistical software for the
calculations, but you must explain the important steps involved in the
calculations. If you are not sure about the level of detail required for
the explanations, follow the general principle that your explanation
should enable a person who knows statistics to replicate your results if they
had access to the data.
(a)
The researchers conducting the study wanted to assess whether the milk production
in the 70 herds was typical for milk production in Holstein herds. A relevant comparison
would be the Canadian average 305-day milk production for Holstein cows for 2012, reported
by the official Canadian Dairy Information Centre
as 9979 kg. This average was based on records
for several hundred thousand cows and can, for the purpose of comparison with the averages
for the 70 herds, be treated as a fixed value with no (negligible) uncertainty.
Use statistical inference to assess whether the milk production from the
70 herds agreed with this national average, and draw conclusions.
Include a brief discussion of your findings and your thoughts about
the implications they might have for the study. (Note: The
discussion is expected to be from a layman's perspective; you are (probably) not
an expert on the Canadian dairy industry!)
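A minimal sketch of one possible approach, a one-sample t-test of herd averages against the fixed reference value of 9979 kg, is given below in Python. The function name, the five milk values and the tabulated critical value are illustrative assumptions and not course data; whether a t-test is appropriate depends on the assumptions you state and check.

```python
from math import sqrt
from statistics import mean, stdev

NATIONAL_MEAN = 9979.0  # kg, the reference value quoted in the question


def one_sample_t(values, mu0, t_crit):
    """One-sample t statistic and confidence interval for the mean.

    t_crit is the two-sided critical value of the t distribution with
    len(values) - 1 degrees of freedom, looked up from a table or software.
    """
    n = len(values)
    xbar = mean(values)
    se = stdev(values) / sqrt(n)  # standard error of the sample mean
    return {
        "mean": xbar,
        "se": se,
        "t": (xbar - mu0) / se,
        "df": n - 1,
        "ci": (xbar - t_crit * se, xbar + t_crit * se),
    }


# Invented herd averages (kg); NOT the assignment data.
milk = [9800.0, 10100.0, 10400.0, 9900.0, 10300.0]

# 2.776 is the tabulated 97.5% point of the t distribution with 4 df.
result = one_sample_t(milk, NATIONAL_MEAN, t_crit=2.776)
```

The t statistic is compared with the t distribution on `df` degrees of freedom; equivalently, the reference value lying inside or outside the confidence interval gives the same conclusion at the corresponding level.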
(b)
One of the objectives stated for the first home assignment was to explore associations
between milk production and BLV infection, and part d) of the assignment included a comparison
between BLV-infected and non-infected herds (as defined from the elisa variable).
Biologically one might suspect BLV-infected cows to have lower milk
production than non-infected cows, but the literature does not consistently support this
supposition. Continue your exploration of this question by a
statistical assessment of whether the present data provide any evidence that
BLV-infected herds (as defined by the herd_blv variable) had a different average milk production than non-infected herds.
Draw conclusions and discuss the implications of your findings.
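One way such a two-group comparison might be set up is sketched below in Python, using the unequal-variance (Welch) form of the two-sample t statistic. This is only one candidate method (a pooled-variance t-test is another), and the function name and milk values are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch two-sample t statistic and its approximate degrees of freedom."""
    nx, ny = len(x), len(y)
    vx = variance(x) / nx  # squared standard error of mean(x)
    vy = variance(y) / ny  # squared standard error of mean(y)
    diff = mean(x) - mean(y)
    se = sqrt(vx + vy)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return {"diff": diff, "se": se, "t": diff / se, "df": df}


# Invented herd milk productions (kg), split by herd_blv; NOT the assignment data.
milk_negative_herds = [10200.0, 9900.0, 10500.0, 10000.0]
milk_positive_herds = [9800.0, 10100.0, 9700.0, 10000.0, 9900.0]

comparison = welch_t(milk_negative_herds, milk_positive_herds)
```

The Welch form does not require equal variances in the two groups, which is why it is used in this sketch; the choice between the two forms should follow from your own assessment of the assumptions.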
(c)
Another way of exploring the question from b) would be to compare the milk
production in BLV-infected and non-infected cows. The data obtained from these herds
allow such a comparison, using the herd average 305-day milk productions among infected and
non-infected cows in each herd. For example, in herd 1 the average milk production for non-infected cows
was 10234.8 kg, and the average production for the infected cows was 10198.2 kg.
Use statistical inference to quantify the difference in average milk
production between BLV infected and non-infected cows, and draw
conclusions. Combine the results and conclusions from b) and c) in a
discussion of the information obtained about the research question from the
data.
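Because each herd contributes both a negative-cow and a positive-cow average, the within-herd differences are a natural starting point. The Python sketch below assumes a paired analysis and drops herds with a missing average; apart from the herd 1 values quoted above, all numbers are invented, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_differences(pairs):
    """Within-herd differences (negative minus positive), skipping missing values."""
    return [neg - pos for neg, pos in pairs if neg is not None and pos is not None]


# (milk305_blv0, milk305_blv1) per herd. Only the first pair (herd 1) comes
# from the assignment text; the remaining pairs are invented for illustration.
pairs = [
    (10234.8, 10198.2),
    (9800.0, None),      # herd without infected cows
    (10050.0, 9900.0),
    (None, 9500.0),      # herd without non-infected cows
    (9600.0, 9650.0),
    (10400.0, 10150.0),
]

diffs = paired_differences(pairs)
n = len(diffs)
mean_diff = mean(diffs)
se_diff = stdev(diffs) / sqrt(n)  # standard error of the mean difference
t_stat = mean_diff / se_diff      # compare with a t distribution on n - 1 df
```

Note that only herds with both averages present contribute a difference, so the paired analysis is restricted to herds containing both infected and non-infected cows.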
(d)
This question is about the sensitivity and specificity of the
classification of herds into BLV-negative and BLV-positive
by the bulk-tank ELISA threshold of 5 from Home assignment 1. The
sensitivity and specificity of diagnostic tests were introduced briefly
in Session 4 (4L-15), but for the sake of
completeness a brief introduction is included here as well.
- The characteristics of diagnostic tests for diseases or conditions in subjects
(e.g. animals or humans) are important to aid the
interpretation of the test results. The most commonly used
characteristics are the test sensitivity and test specificity. The test
sensitivity is the chance (probability) of detecting a truly diseased
subject by the test, whereas the test specificity is the chance
(probability) of getting a negative test result for a truly non-diseased
subject. Ideally a test should have high values of both the sensitivity
and the specificity, but this may be difficult to achieve in practice. Thus
it is important to know these characteristics; for example, a test with a relatively low specificity
(say 0.80) will give a fairly high rate of false positives (in the
example, 20%). Similarly, if a test has low sensitivity (say 0.60)
it will detect only a limited fraction (in the example, 60%) of the positives and have a
high false negative rate (40%).
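The definitions above can be written out as a short Python sketch. The function name and the counts are hypothetical; the counts were chosen so that the result reproduces the 0.60 and 0.80 values used in the example.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity and specificity from the four cells of a 2x2 table.

    tp: truly diseased, test positive     fn: truly diseased, test negative
    tn: truly non-diseased, test negative fp: truly non-diseased, test positive
    """
    sensitivity = tp / (tp + fn)  # share of truly diseased subjects detected
    specificity = tn / (tn + fp)  # share of truly non-diseased subjects testing negative
    return sensitivity, specificity


# Hypothetical counts: 100 truly diseased and 100 truly non-diseased subjects.
sens, spec = sensitivity_specificity(tp=60, fn=40, tn=80, fp=20)
false_negative_rate = 1 - sens  # 40 of the 100 diseased subjects are missed
false_positive_rate = 1 - spec  # 20 of the 100 non-diseased subjects test positive
```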
We will assume that the variable
herd_blv gives a "correct" classification of the herd's BLV status because it is based on
detailed records for all cows in the herd. It was one of the major objectives
of the study to develop a tool to predict herd BLV status from less extensive
data, and our focus here is on the classification obtained by the bulk-tank
ELISA at threshold 5. Therefore, the task is to use the present data to estimate the sensitivity and
specificity of this classification rule ("test"). Include a 95% confidence
interval for at least one of the two parameters. (Caution: Several methods exist, so you need
to choose your method carefully and justify your choice.) As your conclusion, describe how confident your analysis allows you to be about the
performance of the ELISA rule (considered as a "test"). (Minitab
hint: A cross-tabulation of two categorical variables can be
obtained from the Stat-Tables-Cross Tabulation and Chi Square
menu.)
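For readers working outside Minitab, the cross-tabulation and one of the several available interval methods (the Wilson score interval, chosen here purely as an example) might be sketched in Python as below. The function names and the data pairs are invented. It is also an assumption of this sketch that a herd tests positive when its ELISA value is strictly above the threshold; check this against the rule used in Home assignment 1.

```python
from math import sqrt
from statistics import NormalDist

THRESHOLD = 5  # bulk-tank ELISA threshold from Home assignment 1


def cross_tabulate(rows, threshold=THRESHOLD):
    """Count herds by true status (herd_blv) and ELISA test result.

    ASSUMPTION: test positive means ELISA value strictly above the threshold.
    """
    counts = {"tp": 0, "fn": 0, "tn": 0, "fp": 0}
    for herd_blv, elisa in rows:
        test_positive = elisa > threshold
        if herd_blv == 1:
            counts["tp" if test_positive else "fn"] += 1
        else:
            counts["fp" if test_positive else "tn"] += 1
    return counts


def wilson_interval(successes, n, level=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# Invented (herd_blv, elisa) pairs; NOT the assignment data.
rows = [(1, 12.0), (1, 7.5), (1, 3.0), (1, 9.1),
        (0, 1.2), (0, 6.0), (0, 0.4), (0, 2.2)]

counts = cross_tabulate(rows)
n_pos = counts["tp"] + counts["fn"]  # truly BLV-positive herds
sensitivity = counts["tp"] / n_pos
sensitivity_ci = wilson_interval(counts["tp"], n_pos)
```

Other interval methods (for example the exact Clopper-Pearson interval or the simple normal approximation) can give noticeably different limits when the number of herds in a group is small, which is the reason for the caution above.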
Henrik Stryhn
(hstryhn@upei.ca) 2026-02-25