Assignment III for Biostats Course VHM 801 at AVC - Fall semester 2016

The assignment is worth 10% of the final course mark. By handing in the home assignment, you implicitly acknowledge that you have read and accepted the instructions for home assignments described on the VHM 801 homepage.

The assignment is a continuation of the first home assignment on a study of women's back pain during pregnancy. You may want to revisit the first home assignment in preparation for this one, and you will use the dataset described and supplied there. In your work here, you should generally pay attention to all issues and problems with the dataset identified in the first home assignment (such as those described in the solution posted on the VHM 801 homepage). Because the issues around generalizability of the results to a suitable population cannot be resolved analytically, we will for the purpose of this assignment assume the existence of a suitable population of women (pregnancies) for which the sample is representative.

A full mark requires satisfactory answers to the three questions below.

  1. The focus of the study is on the back pain severity scores. Give a statistical model for these pain scores that does not differentiate between the characteristics contained in the additional variables recorded; that is, a model for the pain score in the absence of any additional information about the woman and pregnancy. Estimate the model's parameters, with corresponding 95% confidence intervals. (Hint: You may, but need not, do this in several steps, each focusing on a single response category.)

  2. Carry out a statistical analysis of the association between the back pain severity score and a categorical variable of your choice. Your chosen categorical variable should involve either a plausible biological association with back pain (for which an argument should be given) or an association for which "interesting" findings are obtained. You may generate a categorical variable from a continuous variable or modify the categories of an existing categorical variable, as long as such data modifications are explained and justified. Irrespective of your choice of categorical variable, state your statistical model and hypotheses carefully, explain your choice of statistical procedure, draw conclusions from your analysis and interpret your results.

  3. Carry out a statistical analysis of the association between the back pain severity score and a quantitative, continuous variable of your choice. Your chosen continuous variable should involve a plausible biological association with back pain (for which an argument should be given). If you categorized a continuous variable in part 2, you should choose another continuous variable for this part. Irrespective of your choice of continuous variable, state your statistical model and hypotheses carefully, explain your choice of statistical procedure, draw conclusions from your analysis and interpret your results.

    (Hint: As the pain scores are categorical, it is not straightforward to model them as the outcome variable. The following approach is valid, especially for exploratory purposes: use the categorical outcome variable to define groups across which the continuous variable can be compared. As a general example, to assess an association between "age" and a dichotomous outcome taking the values "yes" and "no", compare the age distribution among subjects with outcome "yes" to that among subjects with outcome "no"; a difference between the two distributions then reflects an association between "age" and the outcome. For the four categories of pain scores, you may either use methods for comparing multiple samples (covered in Session 10 of VHM 801) or create a dichotomous outcome. In the latter case, the data modification should again be explained and justified.)
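
As an illustration of the hint to question 1, the sketch below treats each pain category in turn as a binomial "in this category or not" outcome and computes the estimated proportion with a 95% Wilson score interval. This is only one possible approach, not a model solution: the choice of the Wilson interval, the category labels and the counts are all assumptions made for illustration, and the counts are invented rather than taken from the course dataset.

```python
import math


def wilson_interval(count, total, z=1.959964):
    """Wilson score interval for a binomial proportion count/total.

    z = 1.959964 is the standard normal quantile for a two-sided 95% interval.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= count <= total:
        raise ValueError("count must lie between 0 and total")
    p_hat = count / total
    denom = 1 + z ** 2 / total
    centre = (p_hat + z ** 2 / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / total + z ** 2 / (4 * total ** 2)) / denom
    )
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# HYPOTHETICAL counts for four pain categories (NOT from the course dataset;
# the labels "cat1".."cat4" are placeholders as well).
counts = {"cat1": 12, "cat2": 20, "cat3": 10, "cat4": 8}
total = sum(counts.values())

estimates = {}
for category, count in counts.items():
    lower, upper = wilson_interval(count, total)
    estimates[category] = (count / total, lower, upper)
    print(f"{category}: p = {count / total:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Note that the four intervals are computed one category at a time, so each has 95% coverage on its own; they are not simultaneous intervals for the whole set of category probabilities.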
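
As an illustration of the hint to question 3, the sketch below uses the pain categories to define groups and compares a continuous variable across them with a Kruskal-Wallis rank test. This is an assumption about one suitable multi-sample method; it is not necessarily the method intended from Session 10, and other procedures may fit your data better. The function names are made up for this sketch, and the group values are invented rather than taken from the course dataset.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def chi2_sf(x, df):
    """Upper tail probability of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    term, total = math.sqrt(2 * x / math.pi), 0.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.exp(-x / 2) * total


def kruskal_wallis(groups):
    """Tie-corrected Kruskal-Wallis H and its chi-square approximate p-value."""
    pooled = [value for group in groups for value in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted, pos = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[pos:pos + len(group)])
        weighted += rank_sum ** 2 / len(group)
        pos += len(group)
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
    tie_sizes = Counter(pooled).values()
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)


# HYPOTHETICAL values of a continuous variable (say, age in years) in four
# pain-score groups; these numbers are NOT from the course dataset.
groups = [
    [22.0, 25.5, 27.0, 30.5, 24.0],
    [26.0, 29.0, 31.5, 28.5, 33.0],
    [27.5, 32.0, 34.5, 30.0, 35.0],
    [29.5, 36.0, 33.5, 38.0, 31.0],
]
h_stat, p_value = kruskal_wallis(groups)
print(f"H = {h_stat:.3f}, df = {len(groups) - 1}, p = {p_value:.4f}")
```

The p-value relies on the chi-square approximation, which can be poor for very small groups. A small p-value indicates only that the distributions differ between some of the groups; it does not show which groups differ or in which direction.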


Henrik Stryhn (hstryhn@upei.ca) 2016-11-02