Assignment I for Biostats Course VHM 801 at AVC - Fall semester 2018

The assignment is worth 10% of the final course mark. Please be aware that by handing in the home assignment you implicitly acknowledge that you have read and accepted the instructions for home assignments as described on the VHM 801 homepage.

The topic of the assignment is milk production parameters for heifers in early lactation (that is, just after the birth of the cow's first calf). Studies based on data from herds in Belgium have shown the utility of such early milk production recordings for predicting the heifer's performance throughout the entire first lactation (see e.g. De Vliegher et al., 2005). As a starting point, it would be of interest to describe these early lactation parameters on their own. A subdataset of the data collected for the Belgian studies included a single observation in early lactation for a total of 482 heifers. The following variables were obtained for this single record per heifer in early lactation:

The dataset is available in Minitab format and as a comma-separated file, for import into Stata and other statistical software.

The home assignment has four questions (1-4), all of which should be answered.

  1. Characterize the study type (e.g., experimental or another type), and describe the variable type for each of the variables in the dataset. (Hint: you may use some of the following descriptors for variables: categorical, nominal, ordinal, quantitative, discrete, continuous.)

  2. Carry out a descriptive analysis for each of the variables, including both a graphical representation and descriptive statistics. Choose the graphical representation and the statistics you find most useful to show each distribution, in consideration of the variable's type and range of values. Where appropriate, comment specifically on the distribution's center, spread and shape, as well as any outlying observations (make sure to distinguish between truly outlying observations, in the sense that they do not really belong to the distribution, and observations that in your view should be considered as part of the distribution). As part of the descriptive analysis, discuss whether it would seem reasonable to assume the values to be (approximately) normally distributed. Describe carefully how you assess the agreement of the variable with a normal distribution; if you think a variable cannot meaningfully be compared with a normal distribution, state that and explain your reasoning.

  3. Somatic cell counts above 200 (thousand) may be taken to indicate a subclinical infection of the udder. Use the data to estimate the proportion of heifers with such elevated somatic cell counts in early lactation. This can be done by assuming a normal distribution for the somatic cell counts or the logarithmic cell counts (or possibly cell counts on some other scale), or entirely without assuming a normal distribution. Justify your choice of method, even if you do not carry out the alternative computations.

  4. Further explore whether the distributions of somatic cell counts (at either original or log-transformed scale, as per your own choice) and milk yields are similar across breeds (based on the breed categories in the data), using descriptive statistical tools. Describe your findings and try to draw conclusions. Note that you are not expected to compute any statistical tests to compare the distributions. In view of your findings, do you want to modify any of your analyses and/or conclusions from the previous questions?
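
As an illustration of the three approaches mentioned in question 3, the following is a minimal Python sketch (the course itself uses Minitab or Stata). It is not a model answer: the data are synthetic stand-ins generated inside the script, the function names are hypothetical, and the choice of natural logarithms is an assumption. It computes the proportion above a threshold (i) directly from the sample, (ii) from a normal distribution fitted to the raw values, and (iii) from a normal distribution fitted to the log-transformed values.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def proportion_above(values, threshold):
    """Empirical proportion: share of observations strictly above the threshold."""
    return sum(v > threshold for v in values) / len(values)


def normal_tail(values, threshold):
    """Upper-tail probability of a normal distribution fitted by sample mean and SD."""
    fitted = NormalDist(mean(values), stdev(values))
    return 1.0 - fitted.cdf(threshold)


def lognormal_tail(values, threshold):
    """As normal_tail, but fitted to the natural logs (all values must be > 0)."""
    logs = [math.log(v) for v in values]
    return normal_tail(logs, math.log(threshold))


# Synthetic, right-skewed stand-in for cell counts (in thousands).
# This is NOT the assignment dataset; the parameters are arbitrary.
rng = random.Random(801)
scc = [rng.lognormvariate(4.0, 1.0) for _ in range(482)]

p_emp = proportion_above(scc, 200)
p_norm = normal_tail(scc, 200)
p_log = lognormal_tail(scc, 200)
print(f"empirical: {p_emp:.3f}  normal: {p_norm:.3f}  log-normal: {p_log:.3f}")
```

With skewed data the three estimates can differ noticeably, which is why the question asks for the choice of method to be justified rather than just computed.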

Reference:
De Vliegher S, Barkema HW, Stryhn H, Opsomer G, de Kruif A (2005), Impact in dairy heifers of early lactation somatic cell count on milk yield over the first lactation. Journal of Dairy Science 88, 938-947.


Henrik Stryhn (hstryhn@upei.ca) 2018-09-26