. * do-file for lecture 9 of VHM 802, Winter 2015 . version 13 /* use xtmixed instead of mixed in Stata 12 */ . set more off . cd "h:\vhm\vhm802\data_csv" h:\vhm\vhm802\data_csv . . * Floral scent example . import delimited florallong.csv, clear (4 vars, 42 obs) . encode tx, gen(Tx) . anova time id per Tx Number of obs = 42 R-squared = 0.8970 Root MSE = 6.46827 Adj R-squared = 0.7778 Source | Partial SS df MS F Prob > F -----------+---------------------------------------------------- Model | 6923.93171 22 314.724169 7.52 0.0000 | id | 6135.31058 20 306.765529 7.33 0.0000 per | 779.001833 1 779.001833 18.62 0.0004 Tx | 3.1290194 1 3.1290194 0.07 0.7874 | Residual | 794.931705 19 41.8385108 -----------+---------------------------------------------------- Total | 7718.86341 41 188.264961 . . * Example 13.12 . import delimited ch13ta4.csv, clear (9 vars, 54 obs) . xi: boxcox yield i.tx i.period i.cow i.tx _Itx_1-3 (naturally coded; _Itx_1 omitted) i.period _Iperiod_1-3 (naturally coded; _Iperiod_1 omitted) i.cow _Icow_1-18 (naturally coded; _Icow_1 omitted) Fitting comparison model Iteration 0: log likelihood = -385.98591 Iteration 1: log likelihood = -384.22867 Iteration 2: log likelihood = -384.22712 Iteration 3: log likelihood = -384.22712 Fitting full model Iteration 0: log likelihood = -314.69985 Iteration 1: log likelihood = -309.99214 Iteration 2: log likelihood = -309.52468 Iteration 3: log likelihood = -309.52451 Iteration 4: log likelihood = -309.52451 Number of obs = 54 LR chi2(21) = 149.41 Log likelihood = -309.52451 Prob > chi2 = 0.000 ------------------------------------------------------------------------------ yield | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- /theta | -.0033826 .3020123 -0.01 0.991 -.5953159 .5885506 ------------------------------------------------------------------------------ Estimates of scale-variant parameters ---------------------------- | Coef. -------------+-------------- Notrans | _Itx_2 | .1283452 _Itx_3 | .20625 _Iperiod_2 | -.1368166 _Iperiod_3 | -.3236702 _Icow_2 | .3283877 _Icow_3 | .2962175 _Icow_4 | .28772 _Icow_5 | .15384 _Icow_6 | .1675437 _Icow_7 | .179774 _Icow_8 | .1756407 _Icow_9 | .1328141 _Icow_10 | .1065849 _Icow_11 | .1235255 _Icow_12 | .0969087 _Icow_13 | .0610728 _Icow_14 | -.0698756 _Icow_15 | .0134546 _Icow_16 | -.0703826 _Icow_17 | -.1146751 _Icow_18 | -.0336909 _cons | 7.090896 -------------+-------------- /sigma | .0523218 ---------------------------- --------------------------------------------------------- Test Restricted LR statistic P-value H0: log likelihood chi2 Prob > chi2 --------------------------------------------------------- theta = -1 -314.64566 10.24 0.001 theta = 0 -309.52458 0.00 0.991 theta = 1 -314.69985 10.35 0.001 --------------------------------------------------------- . gen lnyield=ln(yield) . anova lnyield tx period cow Number of obs = 54 R-squared = 0.9372 Root MSE = .069653 Adj R-squared = 0.8959 Source | Partial SS df MS F Prob > F -----------+---------------------------------------------------- Model | 2.31533305 21 .110253955 22.73 0.0000 | tx | .409994766 2 .204997383 42.25 0.0000 period | .998067965 2 .499033983 102.86 0.0000 cow | .907270319 17 .053368842 11.00 0.0000 | Residual | .155249211 32 .004851538 -----------+---------------------------------------------------- Total | 2.47058226 53 .04661476 . * period and cow effects nested in squares . anova lnyield square tx period|square cow|square Number of obs = 54 R-squared = 0.9584 Root MSE = .06831 Adj R-squared = 0.8999 Source | Partial SS df MS F Prob > F --------------+---------------------------------------------------- Model | 2.36792386 31 .076384641 16.37 0.0000 | square | .623029268 5 .124605854 26.70 0.0000 tx | .409994766 2 .204997383 43.93 0.0000 period|square | 1.05065877 12 .087554898 18.76 0.0000 cow|square | .28424105 12 .023686754 5.08 0.0005 | Residual | .102658402 22 .004666291 --------------+---------------------------------------------------- Total | 2.47058226 53 .04661476 . anova lnyield square tx period##square cow Number of obs = 54 R-squared = 0.9584 Root MSE = .06831 Adj R-squared = 0.8999 Source | Partial SS df MS F Prob > F --------------+---------------------------------------------------- Model | 2.36792386 31 .076384641 16.37 0.0000 | square | .103134251 5 .02062685 4.42 0.0061 tx | .409994766 2 .204997383 43.93 0.0000 period | .998067965 2 .499033983 106.94 0.0000 period#square | .052590808 10 .005259081 1.13 0.3868 cow | .28424105 12 .023686754 5.08 0.0005 | Residual | .102658402 22 .004666291 --------------+---------------------------------------------------- Total | 2.47058226 53 .04661476 . * note that omitting cow nesting in squares "works", but gives wrong result for squares . anova lnyield square tx period|square cow Number of obs = 54 R-squared = 0.9584 Root MSE = .06831 Adj R-squared = 0.8999 Source | Partial SS df MS F Prob > F --------------+---------------------------------------------------- Model | 2.36792386 31 .076384641 16.37 0.0000 | square | .103134251 5 .02062685 4.42 0.0061 tx | .409994766 2 .204997383 43.93 0.0000 period|square | 1.05065877 12 .087554898 18.76 0.0000 cow | .28424105 12 .023686754 5.08 0.0005 | Residual | .102658402 22 .004666291 --------------+---------------------------------------------------- Total | 2.47058226 53 .04661476 . . * Carton example . import delimited ch11ex2.csv, clear (4 vars, 400 obs) . keep if glue==1 & operator==1 (380 observations deleted) . anova strength machine Number of obs = 20 R-squared = 0.6296 Root MSE = 3.85669 Adj R-squared = 0.2963 Source | Partial SS df MS F Prob > F -----------+---------------------------------------------------- Model | 252.877706 9 28.0975229 1.89 0.1679 | machine | 252.877706 9 28.0975229 1.89 0.1679 | Residual | 148.740577 10 14.8740577 -----------+---------------------------------------------------- Total | 401.618283 19 21.1378043 . * likelihood-based analysis, with diagnostics . mixed strength || machine:, reml Performing EM optimization: Performing gradient-based optimization: Iteration 0: log restricted-likelihood = -56.967725 Iteration 1: log restricted-likelihood = -56.966358 Iteration 2: log restricted-likelihood = -56.966358 Computing standard errors: Mixed-effects REML regression Number of obs = 20 Group variable: machine Number of groups = 10 Obs per group: min = 2 avg = 2.0 max = 2 Wald chi2(0) = . Log restricted-likelihood = -56.966358 Prob > chi2 = . ------------------------------------------------------------------------------ strength | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- _cons | 156.743 1.185275 132.24 0.000 154.4199 159.0661 ------------------------------------------------------------------------------ ------------------------------------------------------------------------------ Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval] -----------------------------+------------------------------------------------ machine: Identity | var(_cons) | 6.611732 7.410896 .7349005 59.48425 -----------------------------+------------------------------------------------ var(Residual) | 14.87406 6.651881 6.190996 35.73538 ------------------------------------------------------------------------------ LR test vs. linear regression: chibar2(01) = 0.95 Prob >= chibar2 = 0.1645 . predict pred, fitted . predict stdres, rstandard . scatter stdres pred . qnorm stdres . predict pred_ref, reffects . bysort machine: generate within_n=_n . qnorm pred_ref if within_n==1 . swilk pred_ref if within_n==1 Shapiro-Wilk W test for normal data Variable | Obs W V z Prob>z -------------+-------------------------------------------------- pred_ref | 10 0.96555 0.531 -1.023 0.84677 . . * Lab example, dilution 1 . import delimited add9ex1_2.csv, clear (3 vars, 30 obs) . encode lab, gen(Lab) . anova phenol Lab if dilu==1 Number of obs = 10 R-squared = 0.9144 Root MSE = .371484 Adj R-squared = 0.8459 Source | Partial SS df MS F Prob > F -----------+---------------------------------------------------- Model | 7.37000025 4 1.84250006 13.35 0.0070 | Lab | 7.37000025 4 1.84250006 13.35 0.0070 | Residual | .690000114 5 .138000023 -----------+---------------------------------------------------- Total | 8.06000036 9 .895555596 . * likelihood-based analysis . mixed phenol || lab: if dilu==1, reml Performing EM optimization: Performing gradient-based optimization: Iteration 0: log restricted-likelihood = -10.192733 Iteration 1: log restricted-likelihood = -10.192733 Computing standard errors: Mixed-effects REML regression Number of obs = 10 Group variable: lab Number of groups = 5 Obs per group: min = 2 avg = 2.0 max = 2 Wald chi2(0) = . Log restricted-likelihood = -10.192733 Prob > chi2 = . ------------------------------------------------------------------------------ phenol | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- _cons | 6.7 .4292435 15.61 0.000 5.858698 7.541302 ------------------------------------------------------------------------------ ------------------------------------------------------------------------------ Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval] -----------------------------+------------------------------------------------ lab: Identity | var(_cons) | .8522501 .6528823 .1898838 3.825129 -----------------------------+------------------------------------------------ var(Residual) | .138 .0872789 .0399514 .4766793 ------------------------------------------------------------------------------ LR test vs. linear regression: chibar2(01) = 6.47 Prob >= chibar2 = 0.0055 . display 2*sqrt(2)*sqrt(.138) /* Repeatability */ 1.050714 . display 2*sqrt(2)*sqrt(.8522501+.138) /* Reproducibility */ 2.8146049 . . * Lab example, full data analysis . anova phenol Lab dilu / Lab#dilu /* note specification */ Number of obs = 30 R-squared = 0.9927 Root MSE = .401663 Adj R-squared = 0.9859 Source | Partial SS df MS F Prob > F -----------+---------------------------------------------------- Model | 330.506671 14 23.6076193 146.33 0.0000 | Lab | 47.6999968 4 11.9249992 8.59 0.0054 dilu | 271.698674 2 135.849337 97.84 0.0000 Lab#dilu | 11.1080001 8 1.38850001 -----------+---------------------------------------------------- | Residual | 2.4200013 15 .16133342 -----------+---------------------------------------------------- Total | 332.926672 29 11.4802301 . generate lab_dil=Lab+10*dilu . mixed phenol i.dilu || lab: || lab_dil:, reml Performing EM optimization: Performing gradient-based optimization: Iteration 0: log restricted-likelihood = -34.353272 Iteration 1: log restricted-likelihood = -34.353272 Computing standard errors: Mixed-effects REML regression Number of obs = 30 ----------------------------------------------------------- | No. of Observations per Group Group Variable | Groups Minimum Average Maximum ----------------+------------------------------------------ lab | 5 6 6.0 6 lab_dil | 15 2 2.0 2 ----------------------------------------------------------- Wald chi2(2) = 195.68 Log restricted-likelihood = -34.353272 Prob > chi2 = 0.0000 ------------------------------------------------------------------------------ phenol | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- dilu | 2 | 4.26 .5269725 8.08 0.000 3.227153 5.292847 3 | 7.34 .5269725 13.93 0.000 6.307153 8.372847 | _cons | 6.7 .7000478 9.57 0.000 5.327931 8.072069 ------------------------------------------------------------------------------ ------------------------------------------------------------------------------ Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval] -----------------------------+------------------------------------------------ lab: Identity | var(_cons) | 1.756085 1.410132 .3639426 8.473409 -----------------------------+------------------------------------------------ lab_dil: Identity | var(_cons) | .6135833 .3483724 .2016458 1.867058 -----------------------------+------------------------------------------------ var(Residual) | .1613334 .0589106 .0788696 .3300191 ------------------------------------------------------------------------------ LR test vs. linear regression: chi2(2) = 36.93 Prob > chi2 = 0.0000 Note: LR test is conservative and provided only for reference. . display 2*sqrt(2)*sqrt(.1613334) 1.1360753 . display 2*sqrt(2)*sqrt(1.756085+.6135833+.1613334) 4.4997793 . predict pred, fitted . predict stdres, rstandard . scatter stdres pred . qnorm stdres . predict pred_ref1 pred_ref2, reffects . bysort lab: gen lab_n=_n . qnorm pred_ref1 if lab_n==1 . swilk pred_ref1 if lab_n==1 Shapiro-Wilk W test for normal data Variable | Obs W V z Prob>z -------------+-------------------------------------------------- pred_ref1 | 5 0.91709 0.979 -0.028 0.51137 . bysort lab dilu: gen labdilu_n=_n . qnorm pred_ref2 if labdilu_n==1 . swilk pred_ref2 if labdilu_n==1 Shapiro-Wilk W test for normal data Variable | Obs W V z Prob>z -------------+-------------------------------------------------- pred_ref2 | 15 0.93024 1.353 0.597 0.27511 . . * Pig breeding example . import delimited add9ex4_1.csv, clear (3 vars, 20 obs) . anova wgain sire / dam|sire /* note specification */ Number of obs = 20 R-squared = 0.6315 Root MSE = .196723 Adj R-squared = 0.2999 Source | Partial SS df MS F Prob > F -----------+---------------------------------------------------- Model | .66327996 9 .073697773 1.90 0.1649 | sire | .099730047 4 .024932512 0.22 0.9155 dam|sire | .563549913 5 .112709983 -----------+---------------------------------------------------- | Residual | .387000025 10 .038700002 -----------+---------------------------------------------------- Total | 1.05027999 19 .055277894 . generate damid=sire*10+dam . mixed wgain i.sire || damid:, reml Performing EM optimization: Performing gradient-based optimization: Iteration 0: log restricted-likelihood = -3.0333619 Iteration 1: log restricted-likelihood = -3.0328927 Iteration 2: log restricted-likelihood = -3.0328926 Computing standard errors: Mixed-effects REML regression Number of obs = 20 Group variable: damid Number of groups = 10 Obs per group: min = 2 avg = 2.0 max = 2 Wald chi2(4) = 0.88 Log restricted-likelihood = -3.0328926 Prob > chi2 = 0.9267 ------------------------------------------------------------------------------ wgain | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- sire | 2 | -.1375 .2373921 -0.58 0.562 -.6027799 .3277798 3 | -.035 .2373921 -0.15 0.883 -.5002799 .4302799 4 | -.1975001 .2373921 -0.83 0.405 -.6627799 .2677798 5 | -.0975 .2373921 -0.41 0.681 -.5627799 .3677799 | _cons | 2.6675 .1678615 15.89 0.000 2.338497 2.996503 ------------------------------------------------------------------------------ ------------------------------------------------------------------------------ Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval] -----------------------------+------------------------------------------------ damid: Identity | var(_cons) | .037005 .0366775 .0053039 .2581836 -----------------------------+------------------------------------------------ var(Residual) | .0387 .0173072 .016108 .0929779 ------------------------------------------------------------------------------ LR test vs. linear regression: chibar2(01) = 2.05 Prob >= chibar2 = 0.0760 . predict pred, fitted . predict stdres, rstandard . scatter stdres pred . qnorm stdres . predict pred_ref, reffects . bysort damid: gen within_n=_n . qnorm pred_ref if within_n==1 . swilk pred_ref if within_n==1 Shapiro-Wilk W test for normal data Variable | Obs W V z Prob>z -------------+-------------------------------------------------- pred_ref | 10 0.96150 0.593 -0.852 0.80287 .