Assignment III for Biostats Course VHM 801 at AVC - Fall semester 2018

The assignment is worth 10% of the final course mark. Please be aware that by handing in the home assignment you implicitly acknowledge that you have read and accepted the instructions for home assignments as described on the VHM 801 homepage.

Twin pair studies are frequently used to explore genetic hypotheses and effects. For this assignment, the question of interest is whether tobacco smoking habits have a genetic component. For this purpose, information was retrieved from a database on twins, and a number of twin pairs were classified as having the same or different smoking habits. Here smoking habits cover both whether each individual was a smoker or not and, for smokers, the type of smoking (e.g., cigarettes, cigars or pipe). Additionally, the twin pairs were classified genetically as dizygotic or monozygotic, and the monozygotic pairs were further classified according to whether the twins were separated at birth or brought up together.

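Counts of twin pairs, by smoking habits (rows) and type of twin pair (columns):

| Smoking habits | dizygotic | monozygotic, separated | monozygotic, joint |
|----------------|-----------|------------------------|--------------------|
| same           | 9         | 23                     | 21                 |
| different      | 9         | 4                      | 5                  |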

The counts in the table are available as a data set in Minitab format and as a comma-separated file, for import into Stata and other statistical software. (Hint: The data format is suited for the first question below, but for subsequent questions you may need to modify the data format or enter the data anew.)
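To illustrate the hint about data formats, here is a minimal Python sketch of two alternative layouts: a table with the two monozygotic columns merged, and a "long" layout with one record per twin pair. It assumes the table counts read as 9, 23, 21 (same) and 9, 4, 5 (different); the names `counts`, `collapse_monozygotic` and `to_long` are illustrative only and are not part of the course material.

```python
# Counts as read from the table (assumed reading: same = 9, 23, 21; different = 9, 4, 5).
counts = {
    "dizygotic": {"same": 9, "different": 9},
    "monozygotic, separated": {"same": 23, "different": 4},
    "monozygotic, joint": {"same": 21, "different": 5},
}


def collapse_monozygotic(table):
    """Merge all monozygotic columns into one; keep the dizygotic column as is."""
    merged = {
        "dizygotic": dict(table["dizygotic"]),
        "monozygotic": {"same": 0, "different": 0},
    }
    for twin_type, cells in table.items():
        if twin_type.startswith("monozygotic"):
            for habit, n in cells.items():
                merged["monozygotic"][habit] += n
    return merged


def to_long(table):
    """Expand a table of counts into one (twin_type, habit) record per twin pair."""
    return [
        (twin_type, habit)
        for twin_type, cells in table.items()
        for habit, n in cells.items()
        for _ in range(n)
    ]


collapsed = collapse_monozygotic(counts)
records = to_long(counts)
print(collapsed)
print(len(records), "twin pairs in total")
```

Which layout is convenient depends on the software and the procedure used; most packages accept either a table of counts or one row per observation.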

The home assignment has four questions (1-4), all of which should be answered. In general, the assumptions of every statistical procedure used should be stated (formally or informally) and checked (where possible), and every statistical analysis should be summarized in a conclusion.
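As one example of checking an assumption: if a chi-square test is applied to a table of counts, the large-sample approximation is commonly judged by the expected cell counts under independence (a frequent rule of thumb asks for expected counts of about 5 or more). The sketch below computes these expected counts in Python. It assumes the table counts read as 9, 23, 21 (same) and 9, 4, 5 (different); the function name `expected_counts` is hypothetical, and the choice of test is an assumption, not a prescription.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: same, different. Columns: dizygotic, monozygotic separated, monozygotic joint.
observed = [[9, 23, 21], [9, 4, 5]]
expected = expected_counts(observed)
smallest = min(min(row) for row in expected)

for row in expected:
    print(["%.2f" % e for e in row])
print("smallest expected count: %.2f" % smallest)
```

If some expected counts turn out small, an exact procedure (for example Fisher's exact test) may be a more defensible choice than the chi-square approximation.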

  1. As a first analysis, use these data to investigate whether any association (or dependence) seems to exist between the likeness of smoking habits within a twin pair and the type of twin pair, as classified above. Carry out a statistical test and draw conclusions. (Note: As we will further discuss the findings in the context of genetics in the following questions, you may limit your present conclusion to a statement in statistical terms.)

  2. Regardless of the results of your first analysis, we will proceed by focusing on whether smoking habits are equally likely to be the same in dizygotic and monozygotic twin pairs (combining all monozygotic pairs, irrespective of whether the twins were separated or not). For both dizygotic and monozygotic twin pairs, estimate the probability of the twins having the same smoking habits, and supplement these estimates with suitable 95% confidence intervals. Do your results indicate that smoking habits are more alike in twin pairs with a stronger genetic similarity? Use a statistical test to provide a measure of the evidence offered by the data for this claim. Draw (first) conclusions about any genetic impact on smoking habits.

  3. Continuing from the previous question, it is of interest to explore further whether the likeness of smoking habits could be linked with the environment in which the children were brought up. Use the data for (only) monozygotic twin pairs to estimate a (statistical) parameter that could describe a potential environmental effect, with an associated 95% confidence interval, and use a statistical test to assess whether the data show evidence of this parameter being relevant (i.e., non-zero). Draw conclusions about any environmental impact on smoking habits.

  4. Finally, discuss how the results from these data could be used to form an argument against the "hypothesis" that smoking causes detrimental health effects (in particular regarding lung cancer), a hypothesis based on observational studies that have shown associations between smoking and such detrimental health effects. (Hint: It may be helpful to explore this in terms of the diagrams/schematics introduced in Session 2 to describe confounding.)
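For the computational parts of the first three questions, the following Python sketch shows one possible set of tools, using only the standard library: a Pearson chi-square statistic for a table of counts, a Wilson confidence interval for a single proportion, and a difference in proportions with a Wald interval and a pooled z-test. These particular procedures are assumptions chosen for illustration (other choices, such as exact tests or odds ratios, are equally legitimate), all function names are hypothetical, and the counts assume the table reads 9, 23, 21 (same) and 9, 4, 5 (different).

```python
import math
from statistics import NormalDist


def pearson_chi2(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            stat += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi2_upper_tail_df2(x):
    """Upper-tail probability of a chi-square variable with exactly 2 degrees of freedom."""
    return math.exp(-x / 2)


def wilson_interval(successes, n, level=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def difference_in_proportions(x1, n1, x2, n2, level=0.95):
    """Estimate p1 - p2 with a Wald interval and a two-sided pooled z-test p-value."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    se_unpooled = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)
    pooled = (x1 + x2) / (n1 + n2)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se_pooled)))
    return diff, ci, p_value


# Full 2 x 3 table: rows same/different; columns dizygotic, MZ separated, MZ joint.
stat, df = pearson_chi2([[9, 23, 21], [9, 4, 5]])
p_assoc = chi2_upper_tail_df2(stat)  # closed form is valid here because df == 2

# Proportion of pairs with the same habits: dizygotic (9 of 18), monozygotic (44 of 53).
ci_dz = wilson_interval(9, 18)
ci_mz = wilson_interval(44, 53)

# Monozygotic pairs only: separated (23 of 27) versus brought up together (21 of 26).
diff, ci_diff, p_diff = difference_in_proportions(23, 27, 21, 26)

print("chi-square = %.3f on %d df, p = %.4f" % (stat, df, p_assoc))
print("dizygotic:   95%% CI (%.3f, %.3f)" % ci_dz)
print("monozygotic: 95%% CI (%.3f, %.3f)" % ci_mz)
print("difference = %.3f, 95%% CI (%.3f, %.3f), p = %.3f" % (diff, *ci_diff, p_diff))
```

With counts this small, the large-sample approximations behind the chi-square and z-tests may be rough, so results from a sketch like this are best compared with an exact method before drawing conclusions.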

Henrik Stryhn (hstryhn@upei.ca) 2018-11-01