Assignment I for Biostats Course VHM 801 at AVC - Fall semester 2016
The assignment is worth 10% of the final course mark.
The data for the assignment are from a study on women's back pain during pregnancy.
It was carried out at a single hospital during four summer months
several years ago, and comprised all women giving birth in that period.
The data included here for 174 women contain information
about the perceived back pain during pregnancy, as well as other
characteristics, as summarized in the list below:
- id: woman id (1-174),
- sev: back pain severity score (0=none; 1=minor; 2=troublesome; 3=severe),
- month: month of pregnancy the back pain started (0-9),
- age: woman's age (years),
- height: woman's height (m),
- weight0: woman's weight at start of pregnancy (kg),
- weight1: woman's weight at end of pregancy, post-delivery (kg),
- weightb: weight of newborn child (kg),
- nkids: number of kids from previous pregnancies,
- prevsev: back pain severity score in previous pregnancy(ies)
(1=n/a; 2=no; 3=yes, mild; 4=yes, severe),
- r1: relief of back pain by tablets such as aspirin? (0=no; 1=yes),
- r2: relief of back pain by standing? (0=no; 1=yes),
- r3: relief of back pain by sitting? (0=no; 1=yes),
- r4: relief of back pain by lying? (0=no; 1=yes),
- r5: relief of back pain by walking? (0=no; 1=yes),
- a1: aggravation of back pain by fatigue? (0=no; 1=yes),
- a2: aggravation of back pain by standing? (0=no; 1=yes),
- a3: aggravation of back pain by sitting? (0=no; 1=yes),
- a4: aggravation of back pain by lying? (0=no; 1=yes),
- a5: aggravation of back pain by walking? (0=no; 1=yes).
All variables were obtained from questionnaires that
the women were asked to fill out, assisted by a physiotherapist, within 24 hours of
delivery; the response rate was 100%. The dataset is available
in Minitab format and as a comma-separated file,
for import into Stata and other statistical software.
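For those working outside Stata or Minitab, the import step could look roughly like the following Python sketch. The column names are taken from the variable list above; that the comma-separated file has a header row with exactly these names is an assumption, and the two data rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io

# Variable names from the list above. That the CSV header uses exactly
# these names is an assumption.
EXPECTED = ["id", "sev", "month", "age", "height", "weight0", "weight1",
            "weightb", "nkids", "prevsev",
            "r1", "r2", "r3", "r4", "r5", "a1", "a2", "a3", "a4", "a5"]

def to_number(cell):
    """Convert a CSV cell to float; empty or absent cells become None."""
    text = (cell or "").strip()
    return float(text) if text else None

def read_backpain(handle):
    """Read the data from an open text handle into a list of dicts."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED if name not in header]
    if missing:
        raise ValueError(f"columns not found: {missing}")
    return [{name: to_number(record[name]) for name in EXPECTED}
            for record in reader]

# Made-up two-row stand-in for the real file (hypothetical values).
sample = io.StringIO(
    ",".join(EXPECTED) + "\n"
    "1,2,5,26,1.60,55,64,3.2,0,1,0,0,1,1,0,1,1,0,0,0\n"
    "2,0,0,31,1.68,62,,3.6,2,2,0,0,0,0,0,0,0,0,0,0\n"
)
rows = read_backpain(sample)
print(len(rows), rows[1]["weight1"])
```

With the real data, an open file handle would be passed in place of `sample`. Keeping empty cells as `None` rather than dropping the row leaves missing values visible for later checks.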
The home assignment has five questions, all of which should be answered.
- Use the above description of the data collection to characterize the study
type as well as the selection of subjects for the study. Next, discuss whether the data collection
procedures used imply any restrictions on the group of women (pregnancies) these data may be considered
representative of. Alternatively, you may simply describe the group of women (pregnancies)
these data in your view could be considered representative of.
- Select four variables in the dataset: two quantitative variables, one categorical variable (with more than
two categories), and one dichotomous (or binary) variable. Apart from this restriction on the variable types
you are free to select the variables as you like. Carry out a descriptive analysis of
your four selected variables including both a graphical representation and
descriptive statistics. Choose the graphical representation and the statistics you find most
useful for showing each distribution,
taking into account the variable's type and range of values. Where appropriate,
comment specifically on the distribution's center, spread and shape.
Discussion of 'outliers' may be deferred to the next question.
- Find and discuss at least one observation which you think
is an outlier (in the sense of a 'real outlier', as opposed to a
'potential outlier' detected by automated data screening procedures).
Make it clear why you think the observation(s) could be truly different
from the others.
- Find and discuss at least two (different) instances of errors or inconsistencies
in the data. Describe carefully why you think an observation is most likely an error,
or why a set of observations is, in your view, inconsistent.
- Select a continuous variable (possibly one of the variables previously described) and examine whether
it seems reasonable to assume that its values are normally distributed.
Describe carefully the tools you use for this, and how you arrive at
your conclusions.
If you conclude that the variable is not normally distributed, describe how its distribution seems to differ
from a normal distribution. Additionally, compute each woman's weight gain
during pregnancy, and carry out a similar analysis for this variable;
make sure to include any supplementary analysis that would seem relevant to
support your interpretation of the results.
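As a rough illustration of the mechanics in the last question, the sketch below computes weight gain and the coordinates of a normal quantile plot in Python. It assumes weight gain is taken as weight1 minus weight0, uses plotting positions (i - 0.5)/n (one common convention among several), and runs on made-up values that are not from the dataset. It is one possible tool, not a prescribed method.

```python
from statistics import NormalDist, mean, stdev

def weight_gain(rows):
    """Return weight1 - weight0 for rows where both weights are recorded."""
    return [r["weight1"] - r["weight0"] for r in rows
            if r["weight0"] is not None and r["weight1"] is not None]

def normal_qq(values):
    """Return (theoretical, observed) pairs for a normal quantile plot.

    Theoretical quantiles come from a normal distribution with the
    sample mean and standard deviation.
    """
    xs = sorted(values)
    n = len(xs)
    ref = NormalDist(mean(xs), stdev(xs))
    # i runs from 0 to n-1, so (i + 0.5)/n equals (rank - 0.5)/n.
    return [(ref.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(xs)]

# Made-up rows (hypothetical values), one with a missing end weight.
rows = [
    {"weight0": 55.0, "weight1": 64.0},
    {"weight0": 62.0, "weight1": None},
    {"weight0": 70.0, "weight1": 81.5},
    {"weight0": 48.0, "weight1": 60.0},
    {"weight0": 66.0, "weight1": 72.0},
]
gains = weight_gain(rows)
qq = normal_qq(gains)
for theoretical, observed in qq:
    print(f"{theoretical:7.2f} {observed:7.2f}")
```

Points lying close to the line where observed equals theoretical would be compatible with normality; systematic curvature or isolated points far from the line would call for a closer look at the corresponding observations.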
Henrik Stryhn
(hstryhn@upei.ca) 2016-09-28