Assignment II for Biostats Course VHM 801 at AVC - Winter semester 2026

The assignment is worth 10% of the final course mark. Please be aware that by handing in the home assignment you implicitly acknowledge that you have read and accepted the instructions for home assignments as described on the VHM 801 homepage.

The assignment is a continuation of the first home assignment on Bovine leukemia virus (BLV) and dairy production in the Maritime provinces of Canada. Among the production parameters, we will here focus on milk production only, and the dataset for this assignment (Minitab format; comma-separated format) includes the herd, province, 305-day milk production and bulk-tank elisa variables for the same herds as in the first assignment, as well as the additional variables:

The new variables above were constructed from more extensive data collected for the herds, whereby each individual cow was tested for BLV and classified as non-infected/negative or infected/positive. This allowed calculation of the average milk production for the negative and for the positive cows within each herd, and it also led to a classification/status of the herd as BLV-negative if all the cows were negative and as BLV-positive if at least one cow was BLV-positive. Missing values occur for the milk production averages if no cows in the herd matched the classification as either non-infected or infected.

This home assignment has four questions (a)-(d), which should all be answered using the variables in the new dataset. For all questions it is critical that you state your statistical model/assumptions explicitly, describe your statistical methods, and discuss (to the best of your ability based on the information provided) whether you think these assumptions are met to a reasonable degree. You are allowed, and expected, to use statistical software for the calculations, but you must explain the important steps involved in the calculations. If you are not sure about the level of detail required for the explanations, follow the general principle that your explanation should enable a person who knows about statistics to replicate your results if they had access to the data.

  1. The researchers conducting the study wanted to assess whether the milk production in the 70 herds was typical for milk production in Holstein herds. A relevant comparison would be the Canadian average 305-day milk production for Holstein cows for 2012, reported by the official Canadian Dairy Information Centre as 9979 kg. This average milk production was based on records for several hundred thousand cows and can, for the purpose of comparison with the averages for the 70 herds, be considered as a value with no (negligible) uncertainty. Use statistical inference to assess whether the milk production from the 70 herds agreed with this national average, and draw conclusions. Include a brief discussion of your findings and your thoughts about the implications they might have for the study. (Note: The discussion is expected to be from a layman's perspective; you are (probably) not an expert on the Canadian dairy industry!)

  2. One of the objectives stated for the first home assignment was to explore associations between milk production and BLV infection, and part d) of the assignment included a comparison between BLV-infected and non-infected herds (as defined from the elisa variable). Biologically one might suspect BLV-infected cows to have lower milk production than non-infected cows, but the literature does not consistently support this supposition. Continue your exploration of this question by a statistical assessment of whether the present data provide any evidence that BLV-infected herds (as defined by the herd_blv variable) had different average milk production than non-infected herds. Draw conclusions and discuss the implications of your findings.

  3. Another way of exploring the question from b) would be to compare the milk production in BLV-infected and non-infected cows. The data obtained from these herds allow such a comparison, using the herd average 305-day milk production among infected and among non-infected cows in each herd. For example, in herd 1 the average milk production for non-infected cows was 10234.8 kg, and the average production for the infected cows was 10198.2 kg. Use statistical inference to quantify the difference in average milk production between BLV-infected and non-infected cows, and draw conclusions. Combine the results and conclusions from b) and c) in a discussion of the information obtained about the research question from the data.

  4. This question is about the sensitivity and specificity of the classification of herds into BLV-negative and BLV-positive by the bulk-tank ELISA threshold of 5 from Home assignment 1. The sensitivity and specificity of diagnostic tests were introduced briefly in Session 4 (4L-15), but for the sake of completeness a brief introduction is included here as well.

    We will assume that the variable herd_blv gives a "correct" classification of the herd's BLV status because it is based on detailed records for all cows in the herd. It was one of the major objectives of the study to develop a tool to predict herd BLV status from less extensive data, and our focus here is on the classification obtained by the bulk-tank ELISA at threshold 5. Therefore, the task is to use the present data to estimate the sensitivity and specificity of this classification rule ("test"). Include a 95% confidence interval for at least one of the two parameters. (Caution: Several methods exist so you need to choose your method carefully and justify your choice.) As your conclusion, describe the confidence you obtain about the performance of the ELISA rule (considered as a "test") from your analysis. (Minitab hint: A cross-tabulation of two categorical variables can be obtained from the Stat-Tables-Cross Tabulation and Chi Square menu.)
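
As an illustration of the kind of calculation that questions (a)-(c) may involve, the following Python sketch computes t statistics and degrees of freedom for one sample against a fixed reference value, for two independent groups, and for paired observations. It is a minimal sketch under stated assumptions, not a model answer: the function names (one_sample_t, welch_t, paired_t) are hypothetical, the two-group version uses Welch's unequal-variance form as one possible choice (a pooled-variance test is another), and all numbers except the national average and the herd 1 values quoted above are invented. Choosing the method, stating the model, and checking its assumptions remain part of each question.

```python
import math
import statistics

NATIONAL_MEAN_KG = 9979  # national 305-day average quoted in question (a)


def one_sample_t(values, mu0):
    """Return (t, df) for H0: the population mean equals mu0."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return (statistics.mean(values) - mu0) / se, n - 1


def welch_t(group1, group2):
    """Return (t, df) for two independent groups, not assuming equal variances."""
    n1, n2 = len(group1), len(group2)
    v1 = statistics.variance(group1) / n1
    v2 = statistics.variance(group2) / n2
    t = (statistics.mean(group1) - statistics.mean(group2)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def paired_t(first, second):
    """Return (t, df) for paired data; pairs with a missing (None) value are dropped."""
    diffs = [a - b for a, b in zip(first, second)
             if a is not None and b is not None]
    return one_sample_t(diffs, 0.0)


# Invented herd averages for illustration only; these are NOT the assignment data.
herd_milk = [9650.0, 10120.0, 9875.0, 10340.0, 9510.0, 9990.0]
print("one-sample:", one_sample_t(herd_milk, NATIONAL_MEAN_KG))

# Invented split of herds by BLV status (hypothetical grouping).
positive_herds = [9650.0, 10120.0, 9875.0]
negative_herds = [10340.0, 9510.0, 9990.0]
print("two-sample (Welch):", welch_t(positive_herds, negative_herds))

# Within-herd averages; only the first pair (herd 1) comes from the text above.
non_infected_cows = [10234.8, 9800.0, None, 10110.0]
infected_cows = [10198.2, 9650.0, 9900.0, 10150.0]
print("paired:", paired_t(non_infected_cows, infected_cows))
```

Each function returns only the test statistic and its degrees of freedom; the corresponding P-value and confidence interval would be obtained from the t distribution in your statistical software.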
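
For the sensitivity and specificity question, the calculation can be sketched as follows: cross-tabulate the reference status against the test result, take the proportion of reference-positive herds that test positive (sensitivity) and the proportion of reference-negative herds that test negative (specificity), and attach a confidence interval to each proportion. The sketch below uses the Wilson score interval purely as one example among the several available methods; it is not presented as the method expected in the assignment. The function names are hypothetical, the data are invented, and the test results are supplied as ready-made true/false values because the direction of the threshold-5 rule is defined in Home assignment 1 and is not restated here.

```python
import math


def two_by_two(truth, test):
    """Count (TP, FN, FP, TN) from parallel lists of booleans.

    truth: reference status (True = positive according to the reference).
    test:  result of the rule being evaluated (True = classified positive).
    """
    tp = sum(1 for t, x in zip(truth, test) if t and x)
    fn = sum(1 for t, x in zip(truth, test) if t and not x)
    fp = sum(1 for t, x in zip(truth, test) if not t and x)
    tn = sum(1 for t, x in zip(truth, test) if not t and not x)
    return tp, fn, fp, tn


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (default z gives 95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# Invented classifications for eight herds; these are NOT the assignment data.
reference_status = [True, True, True, True, False, False, False, True]
rule_positive = [True, True, False, True, False, True, False, True]

tp, fn, fp, tn = two_by_two(reference_status, rule_positive)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print("sensitivity:", sensitivity, wilson_interval(tp, tp + fn))
print("specificity:", specificity, wilson_interval(tn, tn + fp))
```

With small counts in either row of the table, different interval methods can give noticeably different limits, which is why the choice of method needs to be justified.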

Henrik Stryhn (hstryhn@upei.ca) 2026-02-25