-------------------------------------------------------------------------------- name: log: c:\vhm812-data\l3a_model_build_II.txt log type: text opened on: 26 Jan 2018, 12:55:42 . set more off . . * open the DAISY Red dataset . use daisy2red.dta, clear . gen month=month(calv_dt) . gen aut_clv=(month>=2 & month<=7) if !missing(calv_dt) . gen hs_ct=herd_size-251 . gen hs_sq=herd_size^2 . gen parity1=parity-1 . gen milk120k=milk120/1000 (38 missing values generated) . gen wpc_sqrt=sqrt(wpc) . save daisy2red1.dta, replace file daisy2red1.dta saved . . * specifying maximum model . * no analyses . . * causal model . * no analyses . . * Reducing the Number of Predictors . * descriptive statistics . codebook cf vag_disch -------------------------------------------------------------------------------- cf Calving to first service interval -------------------------------------------------------------------------------- type: numeric (float) range: [21,240] units: 1 unique values: 129 missing .: 14/1,574 mean: 71.3115 std. dev: 21.9039 percentiles: 10% 25% 50% 75% 90% 51 59 67 78 96 -------------------------------------------------------------------------------- vag_disch Vaginal discharge observed -------------------------------------------------------------------------------- type: numeric (float) label: noyes range: [0,1] units: 1 unique values: 2 missing .: 0/1,574 tabulation: Freq. Numeric Label 1,492 0 no 82 1 yes . sum milk120 parity herd_size dyst rp vag_disch, detail Milk volume (l) in first 120 days of lactation ------------------------------------------------------------- Percentiles Smallest 1% 1698.5 1110.2 5% 2077.2 1397.7 10% 2298.2 1461.5 Obs 1,536 25% 2731.95 1467 Sum of Wgt. 1,536 50% 3215.25 Mean 3215.096 Largest Std. Dev. 698.1316 75% 3682.1 5181.8 90% 4080.7 5278 Variance 487387.7 95% 4403.3 5399.9 Skewness .1101838 99% 4904.4 5630.3 Kurtosis 2.845637 Lactation number ------------------------------------------------------------- Percentiles Smallest 1% 1 1 5% 1 1 10% 1 1 Obs 1,574 25% 1 1 Sum of Wgt. 1,574 50% 2 Mean 2.729987 Largest Std. Dev. 1.493841 75% 4 7 90% 5 7 Variance 2.23156 95% 5 7 Skewness .5450922 99% 6 7 Kurtosis 2.315593 Herd size ------------------------------------------------------------- Percentiles Smallest 1% 125 125 5% 125 125 10% 185 125 Obs 1,574 25% 201 125 Sum of Wgt. 1,574 50% 263 Mean 251.0076 Largest Std. Dev. 62.01692 75% 294 333 90% 333 333 Variance 3846.098 95% 333 333 Skewness -.3550929 99% 333 333 Kurtosis 2.256969 Dystocia at calving ------------------------------------------------------------- Percentiles Smallest 1% 0 0 5% 0 0 10% 0 0 Obs 1,574 25% 0 0 Sum of Wgt. 1,574 50% 0 Mean .0597205 Largest Std. Dev. .2370435 75% 0 1 90% 0 1 Variance .0561896 95% 1 1 Skewness 3.715938 99% 1 1 Kurtosis 14.80819 Retained placenta at calving ------------------------------------------------------------- Percentiles Smallest 1% 0 0 5% 0 0 10% 0 0 Obs 1,574 25% 0 0 Sum of Wgt. 1,574 50% 0 Mean .0946633 Largest Std. Dev. .2928423 75% 0 1 90% 0 1 Variance .0857566 95% 1 1 Skewness 2.769173 99% 1 1 Kurtosis 8.66832 Vaginal discharge observed ------------------------------------------------------------- Percentiles Smallest 1% 0 0 5% 0 0 10% 0 0 Obs 1,574 25% 0 0 Sum of Wgt. 1,574 50% 0 Mean .0520966 Largest Std. Dev. .2222924 75% 0 1 90% 0 1 Variance .0494139 95% 1 1 Skewness 4.031139 99% 1 1 Kurtosis 17.25008 . . * correlation . corr milk120 parity herd_size (obs=1,536) | milk120 parity herd_s~e -------------+--------------------------- milk120 | 1.0000 parity | 0.3821 1.0000 herd_size | -0.0433 0.0356 1.0000 . pwcorr milk120 parity herd_size, obs star(0.05) | milk120 parity herd_s~e -------------+--------------------------- milk120 | 1.0000 | 1536 | parity | 0.3821* 1.0000 | 1536 1574 | herd_size | -0.0433 0.0386 1.0000 | 1536 1574 1574 | . . * indices . * no analyses . . * unconditional associations . reg cf parity Source | SS df MS Number of obs = 1,560 -------------+---------------------------------- F(1, 1558) = 1.46 Model | 699.698494 1 699.698494 Prob > F = 0.2273 Residual | 747280.894 1,558 479.641139 R-squared = 0.0009 -------------+---------------------------------- Adj R-squared = 0.0003 Total | 747980.592 1,559 479.782291 Root MSE = 21.901 ------------------------------------------------------------------------------ cf | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- parity | -.4480144 .3709323 -1.21 0.227 -1.175594 .2795649 _cons | 72.53554 1.155186 62.79 0.000 70.26965 74.80142 ------------------------------------------------------------------------------ . reg cf i.vag_disch Source | SS df MS Number of obs = 1,560 -------------+---------------------------------- F(1, 1558) = 0.49 Model | 236.166881 1 236.166881 Prob > F = 0.4831 Residual | 747744.425 1,558 479.938656 R-squared = 0.0003 -------------+---------------------------------- Adj R-squared = -0.0003 Total | 747980.592 1,559 479.782291 Root MSE = 21.908 ------------------------------------------------------------------------------ cf | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- vag_disch | yes | 1.743523 2.485484 0.70 0.483 -3.131724 6.61877 _cons | 71.21989 .5698436 124.98 0.000 70.10215 72.33763 ------------------------------------------------------------------------------ . . * principle components / factor analysis / correspondence analysis . * not covered . . * Functional Form of predictors . * residual plots . reg cf milk120 Source | SS df MS Number of obs = 1,525 -------------+---------------------------------- F(1, 1523) = 0.72 Model | 329.282002 1 329.282002 Prob > F = 0.3977 Residual | 700869.964 1,523 460.19039 R-squared = 0.0005 -------------+---------------------------------- Adj R-squared = -0.0002 Total | 701199.246 1,524 460.104492 Root MSE = 21.452 ------------------------------------------------------------------------------ cf | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- milk120 | .0006667 .0007881 0.85 0.398 -.0008793 .0022126 _cons | 68.95612 2.595207 26.57 0.000 63.86556 74.04667 ------------------------------------------------------------------------------ . predict stdres, rsta (49 missing values generated) . . * lowess smoother . twoway (scatter stdres milk120) (lowess stdres milk120) . . * Detecting and Correcting for non-linearity (transformation of X) . * categorization of predictor . reg cf parity Source | SS df MS Number of obs = 1,560 -------------+---------------------------------- F(1, 1558) = 1.46 Model | 699.698494 1 699.698494 Prob > F = 0.2273 Residual | 747280.894 1,558 479.641139 R-squared = 0.0009 -------------+---------------------------------- Adj R-squared = 0.0003 Total | 747980.592 1,559 479.782291 Root MSE = 21.901 ------------------------------------------------------------------------------ cf | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- parity | -.4480144 .3709323 -1.21 0.227 -1.175594 .2795649 _cons | 72.53554 1.155186 62.79 0.000 70.26965 74.80142 ------------------------------------------------------------------------------ . egen parity_c6=cut(parity), at(0 1 2 3 4 5 6 15) icodes . tab parity parity_c6 Lactation | parity_c6 number | 1 2 3 4 5 | Total -----------+-------------------------------------------------------+---------- 1 | 417 0 0 0 0 | 417 2 | 0 374 0 0 0 | 374 3 | 0 0 319 0 0 | 319 4 | 0 0 0 222 0 | 222 5 | 0 0 0 0 169 | 169 6 | 0 0 0 0 0 | 69 7 | 0 0 0 0 0 | 4 -----------+-------------------------------------------------------+---------- Total | 417 374 319 222 169 | 1,574 Lactation | parity_c6 number | 6 | Total -----------+-----------+---------- 1 | 0 | 417 2 | 0 | 374 3 | 0 | 319 4 | 0 | 222 5 | 0 | 169 6 | 69 | 69 7 | 4 | 4 -----------+-----------+---------- Total | 73 | 1,574 . reg cf i.parity_c6 Source | SS df MS Number of obs = 1,560 -------------+---------------------------------- F(5, 1554) = 0.91 Model | 2174.44893 5 434.889785 Prob > F = 0.4761 Residual | 745806.143 1,554 479.926733 R-squared = 0.0029 -------------+---------------------------------- Adj R-squared = -0.0003 Total | 747980.592 1,559 479.782291 Root MSE = 21.907 ------------------------------------------------------------------------------ cf | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- parity_c6 | 2 | -.6468817 1.568168 -0.41 0.680 -3.722829 2.429066 3 | -3.181369 1.635853 -1.94 0.052 -6.390081 .027343 4 | -1.528531 1.831255 -0.83 0.404 -5.120523 2.063462 5 | -2.322236 2.004684 -1.16 0.247 -6.254406 1.609935 6 | -.9349232 2.781437 -0.34 0.737 -6.390688 4.520841 | _cons | 72.61985 1.077984 67.37 0.000 70.5054 74.73431 ------------------------------------------------------------------------------ . di "F-statistic = " ((747280.9-745806.1)/(1558-1554))/480 F-statistic = .768125 . di "P-value = " Ftail(4, 1554, ((747280.9-745806.1)/(1558-1554))/480) // R mod > el is valid but term is NS P-value = .54594195 . . . *box cox did not work with raw need to use log(cf) . gen ln_cf=ln(cf) (14 missing values generated) . boxcox ln_cf milk120, model(rhs) Fitting full model Iteration 0: log likelihood = -183.07949 (not concave) Iteration 1: log likelihood = -180.48708 (not concave) Iteration 2: log likelihood = -180.09968 Iteration 3: log likelihood = -180.01808 Iteration 4: log likelihood = -180.01807 (not concave) Iteration 5: log likelihood = -180.01807 Iteration 6: log likelihood = -180.01806 Iteration 7: log likelihood = -180.01804 Iteration 8: log likelihood = -180.01803 (38 missing values generated) (38 missing values generated) (38 missing values generated) Number of obs = 1,525 LR chi2(2) = 7.91 Log likelihood = -180.01803 Prob > chi2 = 0.019 ------------------------------------------------------------------------------ ln_cf | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- /lambda | -2.997824 .0059421 -504.50 0.000 -3.009471 -2.986178 ------------------------------------------------------------------------------ Estimates of scale-variant parameters ---------------------------- | Coef. -------------+-------------- Notrans | _cons | -4.74e+08 -------------+-------------- Trans | milk120 | 1.42e+09 -------------+-------------- /sigma | .2722883 ---------------------------- --------------------------------------------------------- Test Restricted LR statistic P-value H0: log likelihood chi2 Prob > chi2 --------------------------------------------------------- lambda = -1 -181.5923 3.15 0.076 lambda = 0 -182.42186 4.81 0.028 lambda = 1 -183.07949 6.12 0.013 --------------------------------------------------------- . . * quadratic function of X . reg ln_cf c.milk120k##c.milk120k Source | SS df MS Number of obs = 1,525 -------------+---------------------------------- F(2, 1522) = 4.78 Model | .709133443 2 .354566722 Prob > F = 0.0085 Residual | 112.944101 1,522 .074207688 R-squared = 0.0062 -------------+---------------------------------- Adj R-squared = 0.0049 Total | 113.653234 1,524 .074575613 Root MSE = .27241 ------------------------------------------------------------------------------ ln_cf | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- milk120k | .2056932 .0697547 2.95 0.003 .0688677 .3425186 | c.milk120k#| c.milk120k | -.0295122 .0105961 -2.79 0.005 -.0502966 -.0087277 | _cons | 3.883482 .1122207 34.61 0.000 3.663358 4.103605 ------------------------------------------------------------------------------ . estat vif Variable | VIF 1/VIF -------------+---------------------- milk120k | 48.58 0.020585 c.milk120k#| c.milk120k | 48.58 0.020585 -------------+---------------------- Mean VIF | 48.58 . vce, corr Correlation matrix of coefficients of regress model | c.m~120k# e(V) | milk120k c.m~120k _cons -------------+------------------------------ milk120k | 1.0000 c.milk120k#| c.milk120k | -0.9897 1.0000 _cons | -0.9872 0.9559 1.0000 . . * redoing the analysis with milk120 centred . summ milk120k Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- milk120k | 1,536 3.215096 .6981316 1.1102 5.6303 . gen m120k_ct=(milk120k-r(mean)) /*r(mean) - memory variable created by summ co > mmand that store the mean of the variable*/ (38 missing values generated) . reg ln_cf c.m120k_ct##c.m120k_ct Source | SS df MS Number of obs = 1,525 -------------+---------------------------------- F(2, 1522) = 4.78 Model | .709133424 2 .354566712 Prob > F = 0.0085 Residual | 112.944101 1,522 .074207688 R-squared = 0.0062 -------------+---------------------------------- Adj R-squared = 0.0049 Total | 113.653234 1,524 .074575613 Root MSE = .27241 ------------------------------------------------------------------------------ ln_cf | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- m120k_ct | .0159242 .0100484 1.58 0.113 -.0037859 .0356344 | c.m120k_ct#| c.m120k_ct | -.0295122 .0105961 -2.79 0.005 -.0502966 -.0087277 | _cons | 4.239742 .0086679 489.13 0.000 4.22274 4.256745 ------------------------------------------------------------------------------ . estat vif Variable | VIF 1/VIF -------------+---------------------- m120k_ct | 1.01 0.992010 c.m120k_ct#| c.m120k_ct | 1.01 0.992010 -------------+---------------------- Mean VIF | 1.01 . vce, corr Correlation matrix of coefficients of regress model | c.m120~t# e(V) | m120k_ct c.m120~t _cons -------------+------------------------------ m120k_ct | 1.0000 c.m120k_ct#| c.m120k_ct | -0.0894 1.0000 _cons | 0.0494 -0.5936 1.0000 . . capture drop stdres . predict stdres, rsta (49 missing values generated) . twoway (scatter stdres m120k_ct) (lowess stdres m120k_ct) . . * fractional polynomials . fp , scale center replace: reg ln_cf (fitting 44 models) (....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%) Fractional polynomial comparisons: ------------------------------------------------------------------------------- milk120k | df Deviance Res. s.d. Dev. dif. P(*) Powers -------------+----------------------------------------------------------------- omitted | 0 367.951 0.273 10.533 0.033 linear | 1 366.159 0.273 8.741 0.033 1 m = 1 | 2 361.396 0.273 3.978 0.138 -2 m = 2 | 4 357.418 0.272 0.000 -- -2 3 ------------------------------------------------------------------------------- (*) P = sig. level of model with m = 2 based on F with 1520 denominator dof. Source | SS df MS Number of obs = 1,525 -------------+---------------------------------- F(2, 1522) = 5.27 Model | .782289941 2 .391144971 Prob > F = 0.0052 Residual | 112.870944 1,522 .074159622 R-squared = 0.0069 -------------+---------------------------------- Adj R-squared = 0.0056 Total | 113.653234 1,524 .074575613 Root MSE = .27232 ------------------------------------------------------------------------------ ln_cf | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- milk120k_1 | -.5301266 .1654544 -3.20 0.001 -.8546694 -.2055839 milk120k_2 | -.0008516 .0004271 -1.99 0.046 -.0016894 -.0000138 _cons | 4.238434 .008295 510.96 0.000 4.222163 4.254704 ------------------------------------------------------------------------------ . fp plot, r(none) ytitle(Predicted Ln(cf)) . fp plot, r(residuals) ytitle(Predicted Ln(cf)) /*plot with residuals*/ . . *more predictors . fp , scale center replace: reg ln_cf parity rp (fitting 44 models) (....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%) Fractional polynomial comparisons: ------------------------------------------------------------------------------- milk120k | df Deviance Res. s.d. Dev. dif. P(*) Powers -------------+----------------------------------------------------------------- omitted | 0 361.366 0.273 12.986 0.012 linear | 1 357.527 0.272 9.147 0.028 1 m = 1 | 2 351.183 0.272 2.803 0.248 -2 m = 2 | 4 348.380 0.272 0.000 -- -2 3 ------------------------------------------------------------------------------- (*) P = sig. level of model with m = 2 based on F with 1518 denominator dof. Source | SS df MS Number of obs = 1,525 -------------+---------------------------------- F(4, 1520) = 4.91 Model | 1.44923978 4 .362309946 Prob > F = 0.0006 Residual | 112.203994 1,520 .073818417 R-squared = 0.0128 -------------+---------------------------------- Adj R-squared = 0.0102 Total | 113.653234 1,524 .074575613 Root MSE = .2717 ------------------------------------------------------------------------------ ln_cf | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- milk120k_1 | -.5748179 .1665152 -3.45 0.001 -.9014417 -.2481941 milk120k_2 | -.0007211 .0004312 -1.67 0.095 -.001567 .0001247 parity | -.0097322 .005001 -1.95 0.052 -.0195417 .0000773 rp | .053428 .0236505 2.26 0.024 .007037 .099819 _cons | 4.260094 .0162315 262.46 0.000 4.228255 4.291932 ------------------------------------------------------------------------------ . fp , scale center replace: reg ln_cf milk120k_1 milk120k_2 r > p (fitting 44 models) (....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%) Fractional polynomial comparisons: ------------------------------------------------------------------------------- parity | df Deviance Res. s.d. Dev. dif. P(*) Powers -------------+----------------------------------------------------------------- omitted | 0 352.175 0.272 11.021 0.027 linear | 1 348.380 0.272 7.227 0.066 1 m = 1 | 2 344.414 0.271 3.261 0.198 -2 m = 2 | 4 341.154 0.271 0.000 -- 2 2 ------------------------------------------------------------------------------- (*) P = sig. level of model with m = 2 based on F with 1517 denominator dof. Source | SS df MS Number of obs = 1,525 -------------+---------------------------------- F(5, 1519) = 5.39 Model | 1.97968732 5 .395937465 Prob > F = 0.0001 Residual | 111.673547 1,519 .073517806 R-squared = 0.0174 -------------+---------------------------------- Adj R-squared = 0.0142 Total | 113.653234 1,524 .074575613 Root MSE = .27114 ------------------------------------------------------------------------------ ln_cf | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- milk120k_1 | -.609147 .1666002 -3.66 0.000 -.9359378 -.2823563 milk120k_2 | -.0005881 .0004331 -1.36 0.175 -.0014377 .0002615 parity_1 | -.020564 .0064416 -3.19 0.001 -.0331993 -.0079287 parity_2 | .010903 .0035577 3.06 0.002 .0039244 .0178816 rp | .0572608 .0236445 2.42 0.016 .0108815 .1036401 _cons | 4.216664 .0107005 394.06 0.000 4.195674 4.237653 ------------------------------------------------------------------------------ . fp plot, r(none) ytitle(Predicted Ln(cf)) . fp plot, r(residuals) ytitle(Predicted Ln(cf)) /*plot with residuals*/ . . *or you can predict fitted values and standardized residuals and plot them . capture drop fit1 . capture drop stdres . predict fit1 (option xb assumed; fitted values) (49 missing values generated) . predict stdres, rstand (49 missing values generated) . twoway scatter ln_cf milk120k || line fit1 milk120k, sort . twoway (scatter stdres milk120k) (lowess stdres milk120k) . . * interactions . * 2-way . reg wpc_sqrt c.hs_ct##c.hs_ct i.aut_clv i.twin i.dyst##i.vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(7, 1566) = 19.94 Model | 1089.06861 7 155.581229 Prob > F = 0.0000 Residual | 12217.1982 1,566 7.80153144 R-squared = 0.0818 -------------+---------------------------------- Adj R-squared = 0.0777 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7931 ------------------------------------------------------------------------------ wpc_sqrt | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- hs_ct | .0124973 .0012109 10.32 0.000 .0101222 .0148724 | c.hs_ct#| c.hs_ct | .0000729 .0000174 4.20 0.000 .0000389 .0001069 | 1.aut_clv | -.5220441 .1417584 -3.68 0.000 -.8001004 -.2439877 | twin | yes | 1.533154 .5476246 2.80 0.005 .4589989 2.607308 | dyst | yes | .7636387 .3243731 2.35 0.019 .1273874 1.39989 | vag_disch | yes | .979344 .3502916 2.80 0.005 .2922541 1.666434 | dyst#| vag_disch | yes#yes | -2.272293 .8833614 -2.57 0.010 -4.004989 -.5395972 | _cons | 7.631652 .1213388 62.90 0.000 7.393648 7.869655 ------------------------------------------------------------------------------ . . * Selection criteria . * non-statistical . * no analyses . . * statistical - nested models . * Wald test . reg wpc_sqrt c.hs_ct##c.hs_ct i.aut_clv i.twin i.dyst##i.vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(7, 1566) = 19.94 Model | 1089.06861 7 155.581229 Prob > F = 0.0000 Residual | 12217.1982 1,566 7.80153144 R-squared = 0.0818 -------------+---------------------------------- Adj R-squared = 0.0777 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7931 ------------------------------------------------------------------------------ wpc_sqrt | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- hs_ct | .0124973 .0012109 10.32 0.000 .0101222 .0148724 | c.hs_ct#| c.hs_ct | .0000729 .0000174 4.20 0.000 .0000389 .0001069 | 1.aut_clv | -.5220441 .1417584 -3.68 0.000 -.8001004 -.2439877 | twin | yes | 1.533154 .5476246 2.80 0.005 .4589989 2.607308 | dyst | yes | .7636387 .3243731 2.35 0.019 .1273874 1.39989 | vag_disch | yes | .979344 .3502916 2.80 0.005 .2922541 1.666434 | dyst#| vag_disch | yes#yes | -2.272293 .8833614 -2.57 0.010 -4.004989 -.5395972 | _cons | 7.631652 .1213388 62.90 0.000 7.393648 7.869655 ------------------------------------------------------------------------------ . testparm i.dyst#i.vag_disch ( 1) 1.dyst#1.vag_disch = 0 F( 1, 1566) = 6.62 Prob > F = 0.0102 . . * AIC and BIC . * statistical non-nested - this in logistic regression . . *Selection strategy - Coleman dataset . * best subset - install command vselect . use coleman.dta, clear . rename x1 x1_staff_sal . rename x2 x2_father_job . rename x3 x3_ses . rename x4 x4_test_teach . rename x5 x5_edu_mother . rename y y_test_scr . . vselect y_test_scr x1_staff_sal x2_father_job x3_ses /// > x4_test_teach x5_edu_mother, best Response : y_test_scr Selected predictors: x3_ses x4_test_teach x1_staff_sal x5_edu_mother x2_father > _job Optimal models: # Preds R2ADJ C AIC AICC BIC 1 .8518292 4.974883 90.89429 92.39429 92.88576 2 .8740953 2.832759 88.49432 91.16099 91.48152 3 .8820943 2.836089 87.96903 92.25475 91.95196 4 .8756399 4.670221 89.74417 96.20571 94.72284 5 .8728444 6 90.80893 100.1423 96.78332 predictors for each model: 1 : x3_ses 2 : x3_ses x4_test_teach 3 : x3_ses x4_test_teach x1_staff_sal 4 : x3_ses x4_test_teach x1_staff_sal x5_edu_mother 5 : x3_ses x4_test_teach x1_staff_sal x5_edu_mother x2_father_job . . * forward selection . stepwise, pe(0.1): reg y_test_scr x1_staff_sal x2_father_job x3_ses x4_test_te > ach x5_edu_mother begin with empty model p = 0.0000 < 0.1000 adding x3_ses p = 0.0566 < 0.1000 adding x4_test_teach Source | SS df MS Number of obs = 20 -------------+---------------------------------- F(2, 17) = 66.95 Model | 570.497872 2 285.248936 Prob > F = 0.0000 Residual | 72.4264222 17 4.26037777 R-squared = 0.8873 -------------+---------------------------------- Adj R-squared = 0.8741 Total | 642.924294 19 33.8381207 Root MSE = 2.0641 ------------------------------------------------------------------------------- y_test_scr | Coef. Std. Err. t P>|t| [95% Conf. Interval] --------------+---------------------------------------------------------------- x3_ses | .5415607 .0500441 10.82 0.000 .4359768 .6471445 x4_test_teach | .7498915 .3666402 2.05 0.057 -.0236517 1.523435 _cons | 14.5827 9.175409 1.59 0.130 -4.775723 33.94112 ------------------------------------------------------------------------------- . * backward elimination . stepwise, pr(0.1): reg y_test_scr x1_staff_sal x2_father_job x3_ses x4_test_te > ach x5_edu_mother begin with full model p = 0.4267 >= 0.1000 removing x2_father_job p = 0.6863 >= 0.1000 removing x5_edu_mother p = 0.1616 >= 0.1000 removing x1_staff_sal Source | SS df MS Number of obs = 20 -------------+---------------------------------- F(2, 17) = 66.95 Model | 570.497872 2 285.248936 Prob > F = 0.0000 Residual | 72.4264222 17 4.26037777 R-squared = 0.8873 -------------+---------------------------------- Adj R-squared = 0.8741 Total | 642.924294 19 33.8381207 Root MSE = 2.0641 ------------------------------------------------------------------------------- y_test_scr | Coef. Std. Err. t P>|t| [95% Conf. Interval] --------------+---------------------------------------------------------------- x3_ses | .5415607 .0500441 10.82 0.000 .4359768 .6471445 x4_test_teach | .7498915 .3666402 2.05 0.057 -.0236517 1.523435 _cons | 14.5827 9.175409 1.59 0.130 -4.775723 33.94112 ------------------------------------------------------------------------------- . * stepwise selection . stepwise, pe(0.1) pr(0.11): reg y_test_scr x1_staff_sal x2_father_job x3_ses x > 4_test_teach x5_edu_mother begin with full model p = 0.4267 >= 0.1100 removing x2_father_job p = 0.6863 >= 0.1100 removing x5_edu_mother p = 0.1616 >= 0.1100 removing x1_staff_sal Source | SS df MS Number of obs = 20 -------------+---------------------------------- F(2, 17) = 66.95 Model | 570.497872 2 285.248936 Prob > F = 0.0000 Residual | 72.4264222 17 4.26037777 R-squared = 0.8873 -------------+---------------------------------- Adj R-squared = 0.8741 Total | 642.924294 19 33.8381207 Root MSE = 2.0641 ------------------------------------------------------------------------------- y_test_scr | Coef. Std. Err. t P>|t| [95% Conf. Interval] --------------+---------------------------------------------------------------- x3_ses | .5415607 .0500441 10.82 0.000 .4359768 .6471445 x4_test_teach | .7498915 .3666402 2.05 0.057 -.0236517 1.523435 _cons | 14.5827 9.175409 1.59 0.130 -4.775723 33.94112 ------------------------------------------------------------------------------- . . *selection strategy - daisy2red . use daisy2red1, clear . *create interaction terms for stepwise command . gen twdy=twin*dyst . gen rpvd=rp*vag_disch . . reg wpc_sqrt parity1 hs_ct hs_sq i.aut_clv i.twin i.dyst i.twdy i.rp i.vag > _disch i.rpvd Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(10, 1563) = 14.84 Model | 1153.69711 10 115.369711 Prob > F = 0.0000 Residual | 12152.5697 1,563 7.77515658 R-squared = 0.0867 -------------+---------------------------------- Adj R-squared = 0.0809 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7884 ------------------------------------------------------------------------------ wpc_sqrt | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- parity1 | .0558659 .0480009 1.16 0.245 -.0382871 .150019 hs_ct | -.0234668 .0084135 -2.79 0.005 -.0399698 -.0069638 hs_sq | .0000713 .0000174 4.10 0.000 .0000372 .0001054 1.aut_clv | -.5128308 .1418525 -3.62 0.000 -.7910719 -.2345896 | twin | yes | 1.698238 .5842331 2.91 0.004 .552275 2.844201 | dyst | yes | .627718 .3100054 2.02 0.043 .0196477 1.235788 1.twdy | -2.772795 1.739103 -1.59 0.111 -6.184016 .6384255 | rp | yes | .3904235 .2689735 1.45 0.147 -.1371634 .9180105 | vag_disch | yes | -.0422026 .4007079 -0.11 0.916 -.8281844 .7437792 1.rpvd | 1.476058 .6996667 2.11 0.035 .1036735 2.848442 _cons | 3.023021 1.168247 2.59 0.010 .7315249 5.314516 ------------------------------------------------------------------------------ . . *stepwise backward . stepwise, pe(0.05) pr(0.051) lockterm1: reg wpc_sqrt parity1 (hs_ct hs_sq) au > t_clv /// > (dyst twin twdy) (rp vag_disch rpvd) begin with full model p < 0.0510 for all terms in model Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(10, 1563) = 14.84 Model | 1153.69711 10 115.369711 Prob > F = 0.0000 Residual | 12152.5697 1,563 7.77515658 R-squared = 0.0867 -------------+---------------------------------- Adj R-squared = 0.0809 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7884 ------------------------------------------------------------------------------ wpc_sqrt | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- parity1 | .0558659 .0480009 1.16 0.245 -.0382871 .150019 hs_ct | -.0234668 .0084135 -2.79 0.005 -.0399698 -.0069638 hs_sq | .0000713 .0000174 4.10 0.000 .0000372 .0001054 aut_clv | -.5128308 .1418525 -3.62 0.000 -.7910719 -.2345896 dyst | .627718 .3100054 2.02 0.043 .0196477 1.235788 twin | 1.698238 .5842331 2.91 0.004 .552275 2.844201 twdy | -2.772795 1.739103 -1.59 0.111 -6.184016 .6384255 rp | .3904235 .2689735 1.45 0.147 -.1371634 .9180105 vag_disch | -.0422026 .4007079 -0.11 0.916 -.8281844 .7437792 rpvd | 1.476058 .6996667 2.11 0.035 .1036735 2.848442 _cons | 3.023021 1.168247 2.59 0.010 .7315249 5.314516 ------------------------------------------------------------------------------ . estimates store sw_1 . . stepwise, pe(0.05) pr(0.051) lockterm1: reg wpc_sqrt parity1 (hs_ct hs_sq) au > t_clv /// > dyst twin twdy rp vag_disch rpvd begin with full model p = 0.9161 >= 0.0510 removing vag_disch p = 0.1434 >= 0.0510 removing rp p = 0.1136 >= 0.0510 removing twdy p = 0.0616 >= 0.0510 removing dyst Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(6, 1567) = 23.31 Model | 1090.25318 6 181.708864 Prob > F = 0.0000 Residual | 12216.0137 1,567 7.79579685 R-squared = 0.0819 -------------+---------------------------------- Adj R-squared = 0.0784 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7921 ------------------------------------------------------------------------------ wpc_sqrt | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- parity1 | .0438808 .0474418 0.92 0.355 -.0491753 .1369369 hs_ct | -.0216107 .0083743 -2.58 0.010 -.0380366 -.0051847 hs_sq | .0000669 .0000173 3.87 0.000 .000033 .0001009 aut_clv | -.5244014 .1418074 -3.70 0.000 -.8025536 -.2462491 rpvd | 1.831729 .5194199 3.53 0.000 .8128976 2.85056 twin | 1.478791 .5486927 2.70 0.007 .4025419 2.55504 _cons | 3.404817 1.155405 2.95 0.003 1.138515 5.67112 ------------------------------------------------------------------------------ . estimates store sw_2 . estimates table sw_1 sw_2, star(0.10 0.05 0.001) ---------------------------------------------- Variable | sw_1 sw_2 -------------+-------------------------------- parity1 | .05586593 .04388077 hs_ct | -.02346682** -.02161068** hs_sq | .0000713*** .00006694*** aut_clv | -.51283075*** -.52440137*** dyst | .62771805** twin | 1.6982381** 1.4787911** twdy | -2.7727951 rp | .39042354 vag_disch | -.04220258 rpvd | 1.4760578** 1.8317287*** _cons | 3.0230207** 3.4048174** ---------------------------------------------- legend: * p<.1; ** p<.05; *** p<.001 . . * reliability . **leave-one-out (advanced commands) . ***computin "n" regression models without obs "i" using residual and leverage > values . ***and then calculating PRESS (predicted residual sum of squares) . . use daisy2red1, clear . reg wpc_sqrt c.hs_ct##c.hs_ct parity1 i.aut_clv i.twin i.dyst##i.vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(8, 1565) = 17.67 Model | 1102.29212 8 137.786515 Prob > F = 0.0000 Residual | 12203.9747 1,565 7.79806691 R-squared = 0.0828 -------------+---------------------------------- Adj R-squared = 0.0782 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7925 ------------------------------------------------------------------------------ wpc_sqrt | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- hs_ct | .0124288 .0012117 10.26 0.000 .010052 .0148056 | c.hs_ct#| c.hs_ct | .0000717 .0000174 4.13 0.000 .0000377 .0001058 | parity1 | .0624408 .04795 1.30 0.193 -.0316122 .1564939 1.aut_clv | -.5313957 .1419088 -3.74 0.000 -.8097471 -.2530443 | twin | yes | 1.484421 .5487805 2.70 0.007 .407999 2.560844 | dyst | yes | .8159669 .3267812 2.50 0.013 .1749918 1.456942 | vag_disch | yes | .9922199 .3503534 2.83 0.005 .3050084 1.679431 | dyst#| vag_disch | yes#yes | -2.257598 .8832373 -2.56 0.011 -3.990051 -.5251447 | _cons | 7.529392 .1445102 52.10 0.000 7.245938 7.812846 ------------------------------------------------------------------------------ . capture drop res lev eq1 . predict res, res . predict lev ,lev . gen eq1=(res/(1-lev))^2 . summ eq1 Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- eq1 | 1,574 7.851331 10.9765 1.85e-07 83.92817 . di "PRESS = " r(sum) //error sum square from predicted points PRESS = 12357.995 . scalar PRESS = r(sum) . . di "RSS full= "e(rss) //residuals sum square from full model RSS full= 12203.975 . scalar TSS = (e(mss) + e(rss)) . . di "R2(pred) = "1-(PRESS/ TSS) R2(pred) = .07126506 . di "R2 full model = " e(r2) R2 full model = .08284007 . . * presenting results . * standardize coefficients . reg wpc_sqrt c.hs_ct##c.hs_ct parity1 i.aut_clv i.twin i.dyst##i.vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(8, 1565) = 17.67 Model | 1102.29212 8 137.786515 Prob > F = 0.0000 Residual | 12203.9747 1,565 7.79806691 R-squared = 0.0828 -------------+---------------------------------- Adj R-squared = 0.0782 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7925 ------------------------------------------------------------------------------ wpc_sqrt | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- hs_ct | .0124288 .0012117 10.26 0.000 .010052 .0148056 | c.hs_ct#| c.hs_ct | .0000717 .0000174 4.13 0.000 .0000377 .0001058 | parity1 | .0624408 .04795 1.30 0.193 -.0316122 .1564939 1.aut_clv | -.5313957 .1419088 -3.74 0.000 -.8097471 -.2530443 | twin | yes | 1.484421 .5487805 2.70 0.007 .407999 2.560844 | dyst | yes | .8159669 .3267812 2.50 0.013 .1749918 1.456942 | vag_disch | yes | .9922199 .3503534 2.83 0.005 .3050084 1.679431 | dyst#| vag_disch | yes#yes | -2.257598 .8832373 -2.56 0.011 -3.990051 -.5251447 | _cons | 7.529392 .1445102 52.10 0.000 7.245938 7.812846 ------------------------------------------------------------------------------ . * rescale coefficient by SD of the predictor . matrix define coeff = e(b)' . capture drop coeff* . svmat coeff . summ hs_ct parity1 aut_clv twin dyst vag_disch Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- hs_ct | 1,574 .0076239 62.01692 -126 82 parity1 | 1,574 1.729987 1.493841 0 6 aut_clv | 1,574 .4720457 .4993766 0 1 twin | 1,574 .0171537 .1298854 0 1 dyst | 1,574 .0597205 .2370435 0 1 -------------+--------------------------------------------------------- vag_disch | 1,574 .0520966 .2222924 0 1 . . * IQR . reg wpc_sqrt c.hs_ct##c.hs_ct parity1 i.aut_clv i.twin i.dyst##i.vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(8, 1565) = 17.67 Model | 1102.29212 8 137.786515 Prob > F = 0.0000 Residual | 12203.9747 1,565 7.79806691 R-squared = 0.0828 -------------+---------------------------------- Adj R-squared = 0.0782 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7925 ------------------------------------------------------------------------------ wpc_sqrt | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- hs_ct | .0124288 .0012117 10.26 0.000 .010052 .0148056 | c.hs_ct#| c.hs_ct | .0000717 .0000174 4.13 0.000 .0000377 .0001058 | parity1 | .0624408 .04795 1.30 0.193 -.0316122 .1564939 1.aut_clv | -.5313957 .1419088 -3.74 0.000 -.8097471 -.2530443 | twin | yes | 1.484421 .5487805 2.70 0.007 .407999 2.560844 | dyst | yes | .8159669 .3267812 2.50 0.013 .1749918 1.456942 | vag_disch | yes | .9922199 .3503534 2.83 0.005 .3050084 1.679431 | dyst#| vag_disch | yes#yes | -2.257598 .8832373 -2.56 0.011 -3.990051 -.5251447 | _cons | 7.529392 .1445102 52.10 0.000 7.245938 7.812846 ------------------------------------------------------------------------------ . summ parity1,d /* note IQR = 3 - 0 = 3 */ parity1 ------------------------------------------------------------- Percentiles Smallest 1% 0 0 5% 0 0 10% 0 0 Obs 1,574 25% 0 0 Sum of Wgt. 1,574 50% 1 Mean 1.729987 Largest Std. Dev. 1.493841 75% 3 6 90% 4 6 Variance 2.23156 95% 4 6 Skewness .5450922 99% 5 6 Kurtosis 2.315593 . /* coefficient for parity = 0.062 */ > . end of do-file