Assignment I for Biostats Course VHM 801 at AVC - Fall semester 2018
The assignment is worth 10% of the final course mark. Please be aware that by handing
in the home assignment you implicitly acknowledge that you have read and accepted
the instructions for home assignments as described
on the VHM 801 homepage.
The topic of the assignment is milk production parameters for heifers in
early lactation (that is, just after the birth of the cow's first calf).
Studies based on data from herds in Belgium have shown
the utility of such early milk production recordings for
predicting the heifer's performance throughout the entire first
lactation (see e.g. De Vliegher et al., 2005). As a starting point, it would be of
interest to describe these early lactation parameters on their own. A subset of
the data collected for the Belgian studies includes a single observation in early lactation for
each of a total of 482 heifers. The following variables
were recorded for each heifer in this single early lactation record:
- dim_el: days in milk (since calving); by the study design, these
are between 5 and 14 days,
- scc_el: somatic cell count (in thousands),
- lnscc_el: natural log of somatic cell count,
- kg_el: milk production (in kg),
- breed: breed of heifer (1=red Holstein; 2=black Holstein;
3=red-white, MRY (Meuse-Rhine-Yssel), East-Flemish; 0=BWB (Belgian White Blue), others and unknown).
The dataset is available in Minitab format
and as a comma-separated file, for import into Stata and other statistical software.
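For software other than Minitab or Stata, the comma-separated file can be read with a few lines of code. The following is a minimal Python sketch, assuming the file has a header row with the variable names listed above. The three data rows are made up for illustration and are not taken from the dataset, and the file name mentioned in the comment is hypothetical.

```python
import csv
import io
import math

# Made-up rows for illustration only; NOT values from the actual dataset.
# With the real file one would use open("heifers.csv", newline="") instead
# (the file name is hypothetical).
sample = io.StringIO(
    "dim_el,scc_el,lnscc_el,kg_el,breed\n"
    "7,55,4.007,24.1,2\n"
    "12,310,5.737,19.8,1\n"
    "9,120,4.787,27.3,0\n"
)

# Breed codes as given in the variable list.
BREED_LABELS = {
    1: "red Holstein",
    2: "black Holstein",
    3: "red-white, MRY, East-Flemish",
    0: "BWB, others and unknown",
}


def read_heifers(handle):
    """Return a list of dicts with the text fields converted to numbers."""
    rows = []
    for rec in csv.DictReader(handle):
        rows.append({
            "dim_el": int(rec["dim_el"]),
            "scc_el": float(rec["scc_el"]),
            "lnscc_el": float(rec["lnscc_el"]),
            "kg_el": float(rec["kg_el"]),
            "breed": int(rec["breed"]),
        })
    return rows


heifers = read_heifers(sample)

# Sanity checks that follow from the variable descriptions.
for h in heifers:
    assert 5 <= h["dim_el"] <= 14  # study design: 5 to 14 days in milk
    assert abs(math.log(h["scc_el"]) - h["lnscc_el"]) < 0.01  # natural log
    assert h["breed"] in BREED_LABELS
```

Checks of this kind are a quick way to catch import problems, such as a column read as text or a misinterpreted decimal separator, before starting the descriptive analysis.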
The home assignment has four questions, (a)-(d), all of which should be answered.
- Characterize the study type (e.g., experimental or another type),
and describe the variable type for each of the variables in the dataset.
(Hint: you may use some of the following descriptors for
variables: categorical, nominal, ordinal, quantitative, discrete, continuous.)
- Carry out a descriptive analysis for each of the variables, including both a graphical representation and
descriptive statistics. Choose the graphical representation and the statistics you find most
useful to show each distribution, taking into account the variable's type and range of values. Where appropriate,
comment specifically on the distribution's center, spread and shape, as
well as any outlying observations (make sure to distinguish between
truly outlying observations, in the sense that they
do not really belong to the distribution, and observations that in your view
should be considered as part of the distribution).
As part of the descriptive analysis, discuss whether
it seems reasonable to assume that the values are (approximately) normally distributed.
Describe carefully how you assess
the agreement of the variable with a normal distribution; if you think a variable cannot
meaningfully be compared with a normal distribution, state that and explain your reasoning.
- Somatic cell counts above 200 (thousand) may be taken to indicate a subclinical infection
of the udder. Use the data to estimate
the proportion of heifers with such elevated somatic cell counts in early lactation.
This can be done by assuming a normal distribution for the somatic cell
counts or the logarithmic cell counts (or possibly cell counts on some other scale),
or entirely without assuming a normal
distribution. Justify your choice of method, even if you do not carry out
the alternative computations.
- Further explore whether the distributions of somatic cell counts
(on either the original or the log-transformed scale, as you prefer)
and milk yields are similar across breeds (based on the breed categories
in the data), using descriptive statistical tools.
Describe your findings and try to draw conclusions.
Note that you are not expected to carry out any statistical tests to compare the distributions.
In view of your findings, do you want to modify any of your analyses and/or conclusions
from the previous questions?
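To illustrate the alternative routes mentioned in the question on elevated somatic cell counts, the following minimal Python sketch computes the proportion above 200 (thousand) in three ways: directly from the sample, from a normal model fitted on the natural-log scale, and from a normal model fitted on the original scale. The cell counts in the sketch are made up for illustration and are not from the dataset; the sketch shows only the mechanics of the computations, and which method is appropriate for the real data remains for you to argue.

```python
import math
from statistics import NormalDist, mean, stdev

# Made-up somatic cell counts (in thousands), for illustration only;
# these are NOT values from the actual dataset.
scc = [35, 48, 52, 60, 75, 90, 110, 150, 240, 820]
THRESHOLD = 200  # thousand

# 1) No distributional assumption: sample proportion above the threshold.
p_empirical = sum(x > THRESHOLD for x in scc) / len(scc)

# 2) Normal model on the natural-log scale: P(ln SCC > ln 200),
#    using the sample mean and sample standard deviation of ln SCC.
ln_scc = [math.log(x) for x in scc]
log_model = NormalDist(mu=mean(ln_scc), sigma=stdev(ln_scc))
p_log_normal = 1 - log_model.cdf(math.log(THRESHOLD))

# 3) Normal model on the original scale, shown only for comparison.
raw_model = NormalDist(mu=mean(scc), sigma=stdev(scc))
p_raw_normal = 1 - raw_model.cdf(THRESHOLD)

print(f"sample proportion:        {p_empirical:.3f}")
print(f"normal on log scale:      {p_log_normal:.3f}")
print(f"normal on original scale: {p_raw_normal:.3f}")
```

The three estimates generally differ, and the size of the difference depends on how well the fitted normal curve follows the data on the chosen scale.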
Reference:
De Vliegher S, Barkema HW, Stryhn H, Opsomer G, de
Kruif A (2005). Impact in dairy heifers of early lactation somatic
cell count on milk yield over the first lactation. Journal of
Dairy Science 88, 938-947.
Henrik Stryhn
(hstryhn@upei.ca) 2018-09-26