## Transcribed Text

1. Using data from a case-control study with 697 people, we want to model the probability of having lung cancer. The main independent variable is exposure to second-hand smoke. We also want to control for college degree (yes or no), type of work (blue collar or white collar), sex, and age. For the categorical variables, referent levels are no exposure, no college degree, blue collar, and female.
Two logistic regression models have been run. The first (full) model contains all five variables, and the second contains only exposure, sex, and age.
i) Using the first model, interpret the results for exposure, college, sex, and age. For each variable, use the CI for the odds ratio (whether or not it includes 1) and/or the p-value to assess “significance”. For results that are significant, interpret the odds ratio (as an estimated relative risk) to complete your interpretation.
ii) Perform a likelihood ratio test that all non-significant variable(s) can be removed from the full model. Show how you calculated the test statistic, state the rejection region, and interpret the result of the test.
iii) There should be one number that must be the same for both the full and reduced models in order for the likelihood ratio test to be valid. Give that number.
iv) Give a brief plausible explanation why the variable(s) cannot all be removed in the likelihood ratio test in part (ii). What could you look at to see if your explanation was correct?
2. The number of times a person needed to take pain medicine in the three days after surgery is recorded for 58 patients. We also have the type of surgery used (experimental or standard), as well as the sex and age of the patient.
i) Explain why Poisson regression, rather than a linear model, is the better approach here.
ii) For each variable in the model, give the p-value from the partial test, and interpret the result. Use the exponentiated least squares means for any “significant” categorical variable to help complete the interpretation.
iii) Calculate and interpret the rate ratios for surgery type and age.
iv) This model has no interaction terms, but describe what an interaction between sex and surgery type would mean in the context of this problem.
Hint: The dependent variable must be included as part of this description.
Printout for Problem #1:
Model #1:
The LOGISTIC Procedure
Probability modeled is LungCa='yes'.
Model Fit Statistics

| Criterion | Intercept Only | Intercept and Covariates |
|---|---|---|
| AIC | 967.729 | 888.673 |
| SC | 972.276 | 915.954 |
| -2 Log L | 965.729 | 876.673 |

Testing Global Null Hypothesis: BETA=0

| Test | Chi-Square | DF | Pr > ChiSq |
|---|---|---|---|
| Likelihood Ratio | 89.0561 | 5 | <.0001 |
| Score | 84.5596 | 5 | <.0001 |
| Wald | 76.2226 | 5 | <.0001 |

Type 3 Analysis of Effects

| Effect | DF | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|
| exposed | 1 | 38.8496 | <.0001 |
| college | 1 | 1.5540 | 0.2125 |
| work | 1 | 0.0509 | 0.8215 |
| sex | 1 | 35.0073 | <.0001 |
| age | 1 | 9.0510 | 0.0026 |

Analysis of Maximum Likelihood Estimates

| Parameter | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|
| Intercept | 1 | -0.5278 | 0.2768 | 3.6348 | 0.0566 |
| exposed yes | 1 | 1.0797 | 0.1732 | 38.8496 | <.0001 |
| college yes | 1 | -0.3130 | 0.2511 | 1.5540 | 0.2125 |
| work white | 1 | -0.0600 | 0.2662 | 0.0509 | 0.8215 |
| sex male | 1 | -0.9713 | 0.1642 | 35.0073 | <.0001 |
| age | 1 | 0.0130 | 0.00434 | 9.0510 | 0.0026 |

Odds Ratio Estimates

| Effect | Point Estimate | 95% Wald Confidence Limit (Lower) | 95% Wald Confidence Limit (Upper) |
|---|---|---|---|
| exposed yes vs no | 2.944 | 2.096 | 4.134 |
| college yes vs no | 0.731 | 0.447 | 1.196 |
| work white vs blue | 0.942 | 0.559 | 1.587 |
| sex male vs female | 0.379 | 0.274 | 0.522 |
| age | 1.013 | 1.005 | 1.022 |

Model #2:
The LOGISTIC Procedure
Probability modeled is LungCa='yes'.
Model Fit Statistics

| Criterion | Intercept Only | Intercept and Covariates |
|---|---|---|
| AIC | 967.729 | 888.641 |
| SC | 972.276 | 906.828 |
| -2 Log L | 965.729 | 880.641 |

Testing Global Null Hypothesis: BETA=0

| Test | Chi-Square | DF | Pr > ChiSq |
|---|---|---|---|
| Likelihood Ratio | 85.0878 | 3 | <.0001 |
| Score | 81.0451 | 3 | <.0001 |
| Wald | 73.4314 | 3 | <.0001 |

Type 3 Analysis of Effects

| Effect | DF | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|
| exposed | 1 | 39.1151 | <.0001 |
| sex | 1 | 35.3411 | <.0001 |
| age | 1 | 9.0999 | 0.0026 |

Analysis of Maximum Likelihood Estimates

| Parameter | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|
| Intercept | 1 | -0.6315 | 0.2714 | 5.4123 | 0.0200 |
| exposed yes | 1 | 1.0791 | 0.1725 | 39.1151 | <.0001 |
| sex male | 1 | -0.9735 | 0.1637 | 35.3411 | <.0001 |
| age | 1 | 0.0130 | 0.00432 | 9.0999 | 0.0026 |

Odds Ratio Estimates

| Effect | Point Estimate | 95% Wald Confidence Limit (Lower) | 95% Wald Confidence Limit (Upper) |
|---|---|---|---|
| exposed yes vs no | 2.942 | 2.098 | 4.126 |
| sex male vs female | 0.378 | 0.274 | 0.521 |
| age | 1.013 | 1.005 | 1.022 |

Printout for Problem #2:
The GENMOD Procedure
Class Level Information

| Class | Levels | Values |
|---|---|---|
| surgery | 2 | exprmntl standard |
| sex | 2 | female male |

Criteria For Assessing Goodness Of Fit

| Criterion | DF | Value | Value/DF |
|---|---|---|---|
| Deviance | 54 | 48.7827 | 0.9034 |
| Scaled Deviance | 54 | 48.7827 | 0.9034 |
| Pearson Chi-Square | 54 | 45.0095 | 0.8335 |
| Scaled Pearson X2 | 54 | 45.0095 | 0.8335 |
| Log Likelihood | | 307.9718 | |
| Full Log Likelihood | | -127.9937 | |
| AIC (smaller is better) | | 263.9874 | |
| AICC (smaller is better) | | 264.7421 | |
| BIC (smaller is better) | | 272.2292 | |

Analysis Of Maximum Likelihood Parameter Estimates

| Parameter | DF | Estimate | Standard Error | Wald 95% Confidence Limit (Lower) | Wald 95% Confidence Limit (Upper) | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|---|---|
| Intercept | 1 | 2.6092 | 0.1993 | 2.2185 | 2.9999 | 171.31 | <.0001 |
| surgery exprmntl | 1 | -0.3756 | 0.1131 | -0.5973 | -0.1538 | 11.02 | 0.0009 |
| surgery standard | 0 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | . | . |
| sex female | 1 | 0.0483 | 0.1096 | -0.1666 | 0.2632 | 0.19 | 0.6595 |
| sex male | 0 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | . | . |
| age | 1 | -0.0150 | 0.0040 | -0.0230 | -0.0071 | 13.80 | 0.0002 |
| Scale | 0 | 1.0000 | 0.0000 | 1.0000 | 1.0000 | | |

LR Statistics For Type 3 Analysis

| Source | DF | Chi-Square | Pr > ChiSq |
|---|---|---|---|
| surgery | 1 | 11.27 | 0.0008 |
| sex | 1 | 0.19 | 0.6594 |
| age | 1 | 13.94 | 0.0002 |

surgery Least Squares Means

| surgery | Estimate | Standard Error | z Value | Pr > \|z\| | Exponentiated |
|---|---|---|---|---|---|
| exprmntl | 1.5866 | 0.08839 | 17.95 | <.0001 | 4.8872 |
| standard | 1.9622 | 0.06887 | 28.49 | <.0001 | 7.1151 |

sex Least Squares Means

| sex | Estimate | Standard Error | z Value | Pr > \|z\| | Exponentiated |
|---|---|---|---|---|---|
| female | 1.7986 | 0.07807 | 23.04 | <.0001 | 6.0410 |
| male | 1.7503 | 0.07792 | 22.46 | <.0001 | 5.7561 |

1. i)

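In the first model, a variable is judged significant at the 5% level when the 95% CI for its odds ratio excludes 1 (equivalently, when its Wald p-value is below 0.05). A CI that includes 1 leaves open the possibility that the variable has no effect on the odds of lung cancer. By this criterion exposure, sex, and age are significant, while college (CI 0.447 to 1.196, p = 0.2125) and work (CI 0.559 to 1.587, p = 0.8215) are not. The odds ratio for exposure is 2.944 (CI 2.096 to 4.134): holding the other variables fixed, the odds of lung cancer for people exposed to second-hand smoke are estimated to be 2.944 times the odds for those not exposed. Similarly, males have an estimated 0.379 times the odds of females (CI 0.274 to 0.522), and each one-unit increase in age multiplies the estimated odds by 1.013 (CI 1.005 to 1.022).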
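The CI-based significance check can be reproduced from the Model #1 estimates and standard errors. The following is a minimal sketch, assuming the usual Wald construction exp(estimate ± z × standard error); the names `estimates`, `odds_ratio_ci`, and `results` are illustrative and do not come from the printout.

```python
import math
from statistics import NormalDist

# (estimate, standard error) pairs copied from the Model #1 printout
estimates = {
    "exposed yes": (1.0797, 0.1732),
    "college yes": (-0.3130, 0.2511),
    "work white": (-0.0600, 0.2662),
    "sex male": (-0.9713, 0.1642),
    "age": (0.0130, 0.00434),
}


def odds_ratio_ci(beta, se, level=0.95):
    """Return (odds ratio, lower limit, upper limit) for a Wald interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


results = {}
for name, (beta, se) in estimates.items():
    odds_ratio, lower, upper = odds_ratio_ci(beta, se)
    # "Significant" here means the interval excludes an odds ratio of 1
    significant = not (lower <= 1.0 <= upper)
    results[name] = (odds_ratio, lower, upper, significant)
    print(f"{name:12s} OR={odds_ratio:.3f} CI=({lower:.3f}, {upper:.3f}) "
          f"significant={significant}")
```

Because the printed estimates are rounded, the recomputed limits may differ from the SAS odds ratio table in the last digit.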

1. ii)

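The test statistic is the difference between the -2 Log L values (Intercept and Covariates column) of Model 2 and Model 1: 880.641 - 876.673 = 3.968. Under the null hypothesis that the coefficients of college and work are both 0, this statistic is approximately chi-square with 2 degrees of freedom, one for each removed variable. The rejection region at α = 0.05 is χ² > χ²₂,₀.₉₅ = 5.991. Since 3.968 < 5.991, we fail to reject the null hypothesis and conclude that college and work can be removed together, so the reduced model (Model 2) is adequate for modelling the odds of having lung cancer....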
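As a check on the arithmetic of the likelihood ratio test, here is a small sketch using only the two -2 Log L values from the printouts. It relies on the fact that a chi-square distribution with 2 degrees of freedom has the closed-form upper tail exp(-x/2); that shortcut holds only for 2 degrees of freedom, and the variable names are illustrative.

```python
import math

# -2 Log L (Intercept and Covariates) from the two printouts
neg2logl_full = 876.673     # Model #1: exposed, college, work, sex, age
neg2logl_reduced = 880.641  # Model #2: exposed, sex, age

df = 2        # two variables (college, work) removed
alpha = 0.05  # assumed significance level

# Likelihood ratio statistic: reduced minus full
lr_stat = neg2logl_reduced - neg2logl_full

# For 2 degrees of freedom only: P(X > x) = exp(-x / 2),
# so the upper-alpha critical value is -2 * ln(alpha)
critical_value = -2.0 * math.log(alpha)
p_value = math.exp(-lr_stat / 2.0)

reject = lr_stat > critical_value
print(f"LR statistic   = {lr_stat:.3f}")
print(f"critical value = {critical_value:.3f}")
print(f"p-value        = {p_value:.4f}")
print("reject H0" if reject else "fail to reject H0")
```

For other degrees of freedom, a chi-square quantile function from a statistics library would be needed in place of the closed form.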