Below are descriptions for the first six problems of the exam. These problems will make up approximately 60 – 80% of the exam. Use α = .05 for all inference on these problems.
The exam is open book, open notes, etc. Make sure you bring this information sheet with you, along with any work you have done ahead of time on these six problems. Also make sure you bring the statistical tables from the notes.
You may use a smart phone to access the calculator, and you may use a computer to access course notes and tables. However, no internet access (including Moodle) is allowed during the exam (so, download the tables, notes, etc. that you might want to use prior to the exam).
The remainder of the exam will be typical in-class questions where you may see printout, have to do some short calculations, interpret results, select the best analysis approach, answer multiple choice questions, etc.
Please work alone on these problems! Any questions should be directed to the class instructor or teaching assistant – no one else!
Problem #1:
We want to model grip strength in people age 50 and older. Possible independent variables are sex, self-reported general health (good, fair, or poor), age, and systolic blood pressure.
A plot of grip strength vs. age is given, where the plotting symbol represents the sex of the person. This is followed by two analyses. The first is a simple linear regression that models grip strength from age; the second is a general linear model (ANACOVA) that uses all four independent variables.
Be prepared to report and interpret results for each analysis, and to explain why the results of the two may differ.
Problem #2:
We want to test if median grip strength differs from 15 for men 60 and older. We measure this in a sample of n=15 such men; the results are below.
17.65 18.20 8.55 17.95 16.40 16.00 13.25 15.85 14.80 16.75 16.45 17.35 17.05 16.80 18.05
Be prepared to provide results of two relevant nonparametric tests that the median is 15. Also be prepared to demonstrate how the results for the simpler of the two tests are obtained.
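The sign test is the simpler of the two tests, and its result can be reproduced by hand. The sketch below is one way to do that in Python; the function name `sign_test` is illustrative, and the only assumption is the usual convention that observations equal to the hypothesized median are dropped (none occur here).

```python
from math import comb

# Grip strengths for the n = 15 men, copied from the problem statement.
grip = [17.65, 18.20, 8.55, 17.95, 16.40, 16.00, 13.25, 15.85,
        14.80, 16.75, 16.45, 17.35, 17.05, 16.80, 18.05]
median_0 = 15  # hypothesized median


def sign_test(values, m0):
    """Return (M, two-sided exact p-value) for H0: median = m0."""
    n_pos = sum(v > m0 for v in values)
    n_neg = sum(v < m0 for v in values)
    n = n_pos + n_neg              # values equal to m0 are dropped
    m_stat = (n_pos - n_neg) / 2   # the statistic labeled M in the printout
    k = min(n_pos, n_neg)
    # Under H0 the number of positive signs is Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return m_stat, min(1.0, 2 * tail)


m_stat, p_value = sign_test(grip, median_0)
print(f"M = {m_stat}, p = {p_value:.4f}")  # M = 4.5, p = 0.0352
```

With 12 values above 15 and 3 below, M = (12 - 3)/2 = 4.5 and the two-sided binomial p-value is 0.0352, which agrees with the Sign row of the printout.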
Problem #3:
We want to model the probability of full recovery after a woman has shoulder surgery. The main independent variable is whether or not she attended physical therapy (PT). We also record the age of the woman, and whether or not she has gone through menopause. For the categorical variables, referent levels are no PT and not having gone through menopause.
Three logistic regression models have been run. The first contains all three variables, the second contains PT and age, and the third contains only PT. Be prepared to compare any two models (including a formal test), and to interpret the results of any model.
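A formal comparison of two nested logistic models is usually a likelihood ratio test: the difference in -2 Log L is referred to a chi-square distribution with degrees of freedom equal to the number of parameters dropped. The sketch below assumes the -2 Log L values (intercept and covariates) read from the three printouts; the helper names and the dictionary layout are illustrative only.

```python
from math import erfc, exp, sqrt

# -2 Log L (intercept and covariates) as read from the three printouts,
# paired with the number of covariate parameters in each model.
neg2_log_l = {
    "PT + age + menopause": (564.459, 3),
    "PT + age": (567.315, 2),
    "PT": (576.187, 1),
}


def chi2_sf(x, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and 2 only."""
    if df == 1:
        return erfc(sqrt(x / 2))
    if df == 2:
        return exp(-x / 2)
    raise ValueError("this sketch only handles df = 1 or 2")


def lr_test(reduced, full):
    """Likelihood ratio test of a reduced model nested in a full model."""
    (dev_r, k_r), (dev_f, k_f) = neg2_log_l[reduced], neg2_log_l[full]
    stat = dev_r - dev_f   # drop in -2 Log L when the extra terms are added
    df = k_f - k_r         # number of parameters being tested
    return stat, df, chi2_sf(stat, df)


for reduced, full in [("PT + age", "PT + age + menopause"),
                      ("PT", "PT + age"),
                      ("PT", "PT + age + menopause")]:
    stat, df, p = lr_test(reduced, full)
    print(f"{reduced} vs {full}: G = {stat:.3f}, df = {df}, p = {p:.4f}")
```

A p-value above .05 favors the smaller model in that comparison; a p-value below .05 says the dropped term(s) improve the fit.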
Problem #4:
We want to compare two intervention programs (one based on diet, the other based on exercise) and a control group, in how well they reduce cholesterol. Twelve people are randomly assigned to each of the three groups. Each person has cholesterol measured before and after the program.
In addition to plots, cholesterol is modeled as a function of group, time, and their interaction, where time is treated as a categorical variable. The analysis used a compound symmetric covariance structure. Solutions for fixed effects, least square means (“sliced” by time), and other estimates have been requested. Be prepared to interpret all results of this model.
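Under a compound symmetric structure with a balanced design, the standard errors in the printout follow directly from the two covariance parameter estimates. The sketch below assumes the CS and Residual estimates from the printout and 12 people per group; the variable names are illustrative.

```python
from math import sqrt

# Covariance parameter estimates from the printout (compound symmetry).
sigma2_subject = 129.36    # CS: variance shared by a person's two measurements
sigma2_residual = 37.5177  # Residual: within-person variance
n_per_group = 12

# Standard error of one time-by-group least squares mean.
se_lsmean = sqrt((sigma2_subject + sigma2_residual) / n_per_group)

# Two groups at the same time point involve different people, so both
# variance components contribute to the standard error of the difference.
se_between_groups = sqrt(2 * (sigma2_subject + sigma2_residual) / n_per_group)

# Before vs. after within one group is measured on the same people, so the
# shared subject component cancels and only residual variance remains.
se_within_group = sqrt(2 * sigma2_residual / n_per_group)

print(round(se_lsmean, 4), round(se_between_groups, 4), round(se_within_group, 4))
```

These reproduce, up to rounding, the standard errors reported for the least squares means (3.7291), for the between-group estimates after the program (5.2738), and for the time effect (2.5006). The before/after comparison is more precise because each person serves as their own control.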
Problem #5:
We want to compare two screening tests (labeled as Test A and Test B) for diabetes, given to the same 180 people. You will be asked to calculate sensitivity, specificity, and PPV for both tests (assume 18% of the population of interest has diabetes).
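Sensitivity and specificity come straight from the columns of each table, but PPV has to be computed with the stated 18% prevalence, because 80 of the 180 people sampled (44%) have diabetes. A minimal sketch, using the counts from the Problem #5 tables; the function name `screening_summary` is illustrative.

```python
def screening_summary(tp, fp, fn, tn, prevalence):
    """Return (sensitivity, specificity, PPV) for one screening test."""
    sensitivity = tp / (tp + fn)   # P(test + | diabetes)
    specificity = tn / (tn + fp)   # P(test - | no diabetes)
    # Bayes' rule with the population prevalence, not the sample proportion.
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    return sensitivity, specificity, ppv


test_a = screening_summary(tp=76, fp=22, fn=4, tn=78, prevalence=0.18)
test_b = screening_summary(tp=68, fp=7, fn=12, tn=93, prevalence=0.18)

for name, (sens, spec, ppv) in [("Test A", test_a), ("Test B", test_b)]:
    print(f"{name}: sensitivity = {sens:.3f}, specificity = {spec:.3f}, PPV = {ppv:.3f}")
```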
We also want to compare the two tests, based on their performance when looking at the 80 people with diabetes (see the last table). Be prepared to present a measure of agreement for the tests, and to test if there is a difference in the proportion who have a positive screen.
For this problem, you would be wise to do some work before coming to the exam.
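For the paired comparison among the 80 people with diabetes, one reasonable choice (an assumption here, since the sheet does not name the methods) is Cohen's kappa for agreement and McNemar's test for the difference in the proportion screening positive. The sketch uses the Test A by Test B counts from the last table.

```python
from math import comb

# Test A (rows) by Test B (columns) among the 80 people with diabetes.
both_pos, a_only = 66, 10   # Test A positive: B positive, B negative
b_only, both_neg = 2, 2     # Test A negative: B positive, B negative
n = both_pos + a_only + b_only + both_neg

# Cohen's kappa: observed agreement corrected for chance agreement.
observed = (both_pos + both_neg) / n
a_pos, b_pos = both_pos + a_only, both_pos + b_only
expected = (a_pos * b_pos + (n - a_pos) * (n - b_pos)) / n ** 2
kappa = (observed - expected) / (1 - expected)

# McNemar's test uses only the discordant pairs.
discordant = a_only + b_only
mcnemar_chi2 = (a_only - b_only) ** 2 / discordant   # 1 df, no continuity correction
k = min(a_only, b_only)
# Exact version: discordant pairs split Binomial(discordant, 0.5) under H0.
exact_p = min(1.0, 2 * sum(comb(discordant, i) for i in range(k + 1)) / 2 ** discordant)

print(f"kappa = {kappa:.3f}, McNemar chi-square = {mcnemar_chi2:.3f}, exact p = {exact_p:.4f}")
```

The chi-square value is compared with the 1 df critical value of 3.84 at α = .05; the exact p-value is an alternative when the number of discordant pairs is small.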
Problem #6:
We want to model survival (months) after a diagnosis of bladder cancer. The main variable of interest is treatment type (Standard or Standard+Drug). We also want to control for sex, race (Black, Hispanic, White), and age at diagnosis.
Prior to doing the survival analysis, we perform a two-sample t-test to see if the mean age is the same for the two treatment types. We also used contingency tables to look at the association between treatment and sex, as well as treatment and race. For the survival analysis, Kaplan-Meier curves and associated tests are presented for treatment type. Next, a Cox model containing all three independent variables is used. Be prepared to interpret results of all tests.
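The two preliminary tests can be checked from the summary numbers alone. The sketch below recomputes the pooled t statistic for age and the Pearson chi-square for treatment by sex, using the group summaries and cell counts from the Problem #6 printout; the variable names are illustrative.

```python
from math import sqrt

# Pooled two-sample t statistic for age (Standard vs. Std+Drug).
n1, mean1, sd1 = 129, 56.6977, 14.5777
n2, mean2, sd2 = 85, 55.2941, 15.0924
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = (mean1 - mean2) / se_diff

# Pearson chi-square for the 2x2 table of treatment by sex.
observed = [[76, 53],   # Standard: Female, Male
            [23, 62]]   # Std+Drug: Female, Male
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)
chi_square = 0.0
for i in range(2):
    for j in range(2):
        expected = row_totals[i] * col_totals[j] / total
        chi_square += (observed[i][j] - expected) ** 2 / expected

print(f"t = {t_stat:.2f} on {df} df, chi-square = {chi_square:.4f} on 1 df")
```

Age is balanced across the treatment types, but sex is not, which is one reason to adjust for sex in the Cox model rather than rely on the unadjusted Kaplan-Meier comparison.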
Edited Printout for Problem #1:
The REG Procedure
Dependent Variable: grip

Analysis of Variance
                            Sum of        Mean
Source             DF      Squares      Square   F Value   Pr > F
Model               1     11.40676    11.40676      2.87   0.0927
Error             125    496.64268     3.97314
Corrected Total   126    508.04944

Root MSE            1.99327    R-Square   0.0225
Dependent Mean     13.62260    Adj R-Sq   0.0146
Coeff Var          14.63211

Parameter Estimates
                  Parameter    Standard
Variable    DF     Estimate       Error   t Value   Pr > |t|
Intercept    1     15.52869     1.13876     13.64     <.0001
age          1      0.02751     0.01623      1.69     0.0927
The GLM Procedure
Dependent Variable: grip

                             Sum of
Source             DF       Squares    Mean Square   F Value   Pr > F
Model               5   249.6260067     49.9252013     23.38   <.0001
Error             121   258.4234359      2.1357309
Corrected Total   126   508.0494425

R-Square   Coeff Var   Root MSE   grip Mean
0.491342    10.72787   1.461414    13.62260

Source      DF     Type I SS   Mean Square   F Value   Pr > F
sex          1   139.7672702   139.7672702     65.44   <.0001
health       2    97.9862673    48.9931336     22.94   <.0001
age          1    10.2674896    10.2674896      4.81   0.0302
systolic     1     1.6049797     1.6049797      0.75   0.3877

Source      DF   Type III SS   Mean Square   F Value   Pr > F
sex          1   115.1452037   115.1452037     53.91   <.0001
health       2    93.5687313    46.7843657     21.91   <.0001
age          1    10.1352510    10.1352510      4.75   0.0313
systolic     1     1.6049797     1.6049797      0.75   0.3877

                                     Standard
Parameter            Estimate           Error   t Value   Pr > |t|
Intercept        16.99140284 B     1.74779751      9.72     <.0001
sex    Female     2.03763735 B     0.27750910      7.34     <.0001
sex    Male       0.00000000 B              .         .          .
health fair       0.13023072 B     0.46714795      0.28     0.7809
health good       1.83954225 B     0.44924159      4.09     <.0001
health poor       0.00000000 B              .         .          .
age               0.02630992       0.01207746      2.18     0.0313
systolic          0.00991040       0.01143220      0.87     0.3877

Least Squares Means
sex       grip LSMEAN
Female     12.5597440
Male       14.5973814

                           LSMEAN
health    grip LSMEAN      Number
fair       13.0522024           1
good       14.7615139           2
poor       12.9219717           3

Least Squares Means for effect health
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: grip
i/j          1         2         3
1                 <.0001    0.9581
2       <.0001              0.0002
3       0.9581    0.0002
Edited Printout for Problem #2:
The UNIVARIATE Procedure
Variable: grip

Moments
N                         15    Sum Weights               15
Mean              16.0733333    Sum Observations       241.1
Std Deviation     2.45683207    Variance          6.03602381
Skewness           2.3272315    Kurtosis          6.26915894
Uncorrected SS      3959.785    Corrected SS      84.5043333
Coeff Variation   15.2851435    Std Error Mean    0.63435131

Basic Statistical Measures
Location               Variability
Mean     16.07333      Std Deviation          2.45683
Median   16.75000      Variance               6.03602
Mode     .             Range                  9.65000
                       Interquartile Range    1.80000

Tests for Location: Mu0=15
Test            Statistic         p Value
Student's t     t   1.692017      Pr > |t|     0.1128
Sign            M        4.5      Pr >= |M|    0.0352
Signed Rank     S       37.5      Pr >= |S|    0.0313

Quantiles (Definition 5)
Level         Quantile
100% Max         18.20
99%              18.20
95%              18.20
90%              18.05
75% Q3           17.65
50% Median       16.75
25% Q1           15.85
10%              13.25
5%                8.55
1%                8.55
0% Min            8.55

Extreme Observations
Lowest            Highest
Value    Obs      Value    Obs
 8.55      3      17.35     12
13.25      7      17.65      1
14.80      9      17.95      4
15.85      8      18.05     15
16.00      6      18.20      2

Stem   Leaf         #   Boxplot
  18   002          3
  16   04488046     8
  14   88           2
  12   2            1
  10
   8   6            1   *

Edited Printout for Problem #3:
Model with PT, age, and menopause:
The LOGISTIC Procedure
Model Information
Data Set                     WORK.THREE
Response Variable            recovery
Number of Response Levels    2
Model                        binary logit
Optimization Technique       Fisher's scoring

Response Profile
Ordered                  Total
Value      recovery      Frequency
1          Full          369
2          NotFull       142
Probability modeled is recovery='Full'.

Model Fit Statistics
              Intercept    Intercept and
Criterion          Only       Covariates
AIC             605.947          572.459
SC              610.183          589.405
-2 Log L        603.947          564.459

Type 3 Analysis of Effects
                       Wald
Effect       DF   Chi-Square   Pr > ChiSq
PT            1      26.6563       <.0001
age           1       2.1754       0.1402
menopause     1       2.8356       0.0922

Analysis of Maximum Likelihood Estimates
                                   Standard         Wald
Parameter          DF   Estimate      Error   Chi-Square   Pr > ChiSq
Intercept           1     1.8836     0.6908       7.4345       0.0064
PT        Yes       1     1.1751     0.2276      26.6563       <.0001
age                 1     0.0216     0.0146       2.1754       0.1402
menopause Yes       1     0.4255     0.2527       2.8356       0.0922

Odds Ratio Estimates
                          Point       95% Wald
Effect                 Estimate   Confidence Limits
PT        Yes vs No       3.238     2.073     5.059
age                       0.979     0.951     1.007
menopause Yes vs No       0.653     0.398     1.072

Model with PT and age:
The LOGISTIC Procedure
Model Information
Data Set                     WORK.THREE
Response Variable            recovery
Number of Response Levels    2
Model                        binary logit
Optimization Technique       Fisher's scoring

Response Profile
Ordered                  Total
Value      recovery      Frequency
1          Full          369
2          NotFull       142
Probability modeled is recovery='Full'.

Model Fit Statistics
              Intercept    Intercept and
Criterion          Only       Covariates
AIC             605.947          573.315
SC              610.183          586.024
-2 Log L        603.947          567.315

Type 3 Analysis of Effects
                    Wald
Effect    DF   Chi-Square   Pr > ChiSq
PT         1      26.0415       <.0001
age        1       8.6476       0.0033

Analysis of Maximum Likelihood Estimates
                             Standard         Wald
Parameter    DF   Estimate      Error   Chi-Square   Pr > ChiSq
Intercept     1     2.3669     0.6301      14.1083       0.0002
PT   Yes      1     1.1569     0.2267      26.0415       <.0001
age           1     0.0356     0.0121       8.6476       0.0033

Odds Ratio Estimates
                    Point       95% Wald
Effect           Estimate   Confidence Limits
PT  Yes vs No       3.180     2.039     4.959
age                 0.965     0.942     0.988
Model with PT:
The LOGISTIC Procedure
Model Information
Data Set                     WORK.THREE
Response Variable            recovery
Number of Response Levels    2
Model                        binary logit
Optimization Technique       Fisher's scoring

Response Profile
Ordered                  Total
Value      recovery      Frequency
1          Full          369
2          NotFull       142
Probability modeled is recovery='Full'.

Model Fit Statistics
              Intercept    Intercept and
Criterion          Only       Covariates
AIC             605.947          580.187
SC              610.183          588.660
-2 Log L        603.947          576.187

Type 3 Analysis of Effects
                    Wald
Effect    DF   Chi-Square   Pr > ChiSq
PT         1      25.1168       <.0001

Analysis of Maximum Likelihood Estimates
                             Standard         Wald
Parameter    DF   Estimate      Error   Chi-Square   Pr > ChiSq
Intercept     1     0.5609     0.1200      21.8348       <.0001
PT   Yes      1     1.1243     0.2243      25.1168       <.0001

Odds Ratio Estimates
                    Point       95% Wald
Effect           Estimate   Confidence Limits
PT  Yes vs No       3.078     1.983     4.778
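The odds ratio rows are the exponentiated parameter estimates, and the Wald limits are the exponentiated estimate plus or minus 1.96 standard errors. A minimal sketch for the PT-only model, using the PT estimate (1.1243) and standard error (0.2243) from the printout; the function name `odds_ratio_ci` is illustrative.

```python
from math import exp


def odds_ratio_ci(estimate, std_error, z=1.96):
    """Return (odds ratio, lower limit, upper limit) from a logit coefficient."""
    return (exp(estimate),
            exp(estimate - z * std_error),
            exp(estimate + z * std_error))


or_pt, lower, upper = odds_ratio_ci(1.1243, 0.2243)
print(f"PT Yes vs No: OR = {or_pt:.3f}, 95% CI ({lower:.3f}, {upper:.3f})")
```

Because the interval excludes 1, the odds of full recovery differ significantly between women who attended PT and those who did not, which matches the Wald test for PT.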
Edited Printout for Problem #4:
time*group time*group time*group time*group time*group time*group
Before
Before
Before
zAfter Control 0 . . . . zAfter Diet 0.... zAfter Exercise 0....
Type 3 Tests of Fixed Effects
Effect time*group time*group time*group time*group time*group time*group
t Value 41.2
Pr > t
67.93 <.0001
The Mixed Procedure Covariance Parameter Estimates
Cov Parm CS id Residual
Effect
Intercept
group
group
group Exercise0.... time Before 22.2500 2.5006 33 time zAfter 0 ....
time group
Standard
Estimate Error DF
t Value 61.39
4.58 1.66
8.90
Pr > t <.0001
<.0001 0.1047
<.0001
Control Diet
228.92 3.7291 41.2 24.1667 5.2738 41.2
Subject Estimate 129.36
37.5177 Solution for Fixed Effects
8.7500 5.2738 41.2
Control Diet Exercise
22.0000 3.5364 7.4167 3.5364
0 . .
33 33
6.22 2.10
.
<.0001 0.0437
Num Den
DF DF F Value
2 33 3.58
1
Least Squares Means
Standard
time group Estimate Error DF
Before Control 253.33 3.7291
Before Diet 252.50 3.7291 41.2 67.71 <.0001 Before Exercise 251.17 3.7291 41.2 67.35 <.0001 zAfter Control 253.08 3.7291 41.2 67.87 <.0001 zAfter Diet 237.67 3.7291 41.2 63.73 <.0001 zAfter Exercise 228.92 3.7291 41.2 61.39 <.0001
Effect group
time time*group
Pr > F 0.0393 <.0001
Label
Control vs. Diet After Program Control vs. Exercise After Program Diet vs. Exercise After Program
15.4167 24.1667
8.7500
2.92
4.58 <.0001
33 74.30
2 33 20.04 <.0001
Tests of Effect Slices
Num Den
Effect time DF DF F Value Pr > F
time*group Before 2 41.2 0.09 0.9179 time*group zAfter 2 41.2 10.77 0.0002
DF t Value 5.2738 41.2
Estimates
Estimate Std Error
Pr > t
5.2738 41.2 5.2738 41.2
0.0056 1.66 0.1047
.
Tables for Problem #5:
Test A
          Diabetes   No Diabetes
+               76            22
-                4            78
Total           80           100

Test B
          Diabetes   No Diabetes
+               68             7
-               12            93
Total           80           100

Test A by Test B (the 80 people with diabetes)
                 Test B
Test A        +       -   Total
+            66      10      76
-             2       2       4
Total        68      12      80
Edited Printout for Problem #6:
The TTEST Procedure
Variable: age

treatment       N      Mean   Std Dev   Std Err   Minimum   Maximum
Standard      129   56.6977   14.5777    1.2835   31.0000   80.0000
Std+Drug       85   55.2941   15.0924    1.6370   30.0000   79.0000
Diff (1-2)           1.4036   14.7838    2.0653

treatment    Method             Mean     95% CL Mean        Std Dev    95% CL Std Dev
Standard                     56.6977   54.1581  59.2373     14.5777   12.9897  16.6115
Std+Drug                     55.2941   52.0388  58.5495     15.0924   13.1149  17.7778
Diff (1-2)   Pooled           1.4036    2.6676   5.4748     14.7838   13.5005  16.3387
Diff (1-2)   Satterthwaite    1.4036    2.7018   5.5089

Method          Variances      DF   t Value   Pr > |t|
Pooled          Equal         212      0.68     0.4975
Satterthwaite   Unequal     175.5      0.67     0.5007

Equality of Variances
Method      Num DF   Den DF   F Value   Pr > F
Folded F        84      128      1.07   0.7165
The FREQ Procedure
Table of treatment by sex
Cell contents: Frequency / Percent / Row Pct / Col Pct

treatment    Female      Male     Total
Standard         76        53       129
              35.51     24.77     60.28
              58.91     41.09
              76.77     46.09
Std+Drug         23        62        85
              10.75     28.97     39.72
              27.06     72.94
              23.23     53.91
Total            99       115       214
              46.26     53.74    100.00

Statistics for Table of treatment by sex
Statistic                      DF     Value     Prob
Chi-Square                      1   20.9155   <.0001
Likelihood Ratio Chi-Square     1   21.5071   <.0001
Continuity Adj. Chi-Square      1   19.6538   <.0001
Mantel-Haenszel Chi-Square      1   20.8178   <.0001
Phi Coefficient                      0.3126
Contingency Coefficient              0.2984
Cramer's V                           0.3126

Fisher's Exact Test
Cell (1,1) Frequency (F)        76
Left-sided Pr <= F          1.0000
Right-sided Pr >= F         <.0001
Table Probability (P)       <.0001
Two-sided Pr <= P           <.0001
Sample Size = 214
The FREQ Procedure
Table of treatment by race
Cell contents: Frequency / Percent / Row Pct / Col Pct

treatment     Black   Hispanic     White     Total
Standard         38         43        48       129
              17.76      20.09     22.43     60.28
              29.46      33.33     37.21
              64.41      63.24     55.17
Std+Drug         21         25        39        85
               9.81      11.68     18.22     39.72
              24.71      29.41     45.88
              35.59      36.76     44.83
Total            59         68        87       214
              27.57      31.78     40.65    100.00

Statistics for Table of treatment by race
Statistic                      DF     Value     Prob
Chi-Square                      2    1.6156   0.4458
Likelihood Ratio Chi-Square     2    1.6115   0.4467
Mantel-Haenszel Chi-Square      1    1.3818   0.2398
Phi Coefficient                      0.0869
Contingency Coefficient              0.0866
Cramer's V                           0.0869
Sample Size = 214
The LIFETEST Procedure

Stratum 1: treatment = Standard
Quartile Estimates
             Point                 95% Confidence Interval
Percent   Estimate   Transform    [Lower      Upper)
75         27.0000   LOGLOG      26.0000     31.0000
50         22.0000   LOGLOG      19.0000     25.0000
25         17.0000   LOGLOG      14.0000     18.0000

Stratum 2: treatment = Std+Drug
Quartile Estimates
             Point                 95% Confidence Interval
Percent   Estimate   Transform    [Lower      Upper)
75         29.0000   LOGLOG      26.0000     34.0000
50         23.0000   LOGLOG      20.0000     26.0000
25         18.0000   LOGLOG      15.0000     20.0000

Test of Equality over Strata
                                   Pr >
Test         Chi-Square   DF   Chi-Square
Log-Rank         1.0272    1       0.3108
Wilcoxon         0.5933    1       0.4411
-2Log(LR)        0.5451    1       0.4603
The PHREG Procedure

Summary of the Number of Event and Censored Values
                               Percent
Total   Event   Censored      Censored
  214     148         66         30.84

Model Fit Statistics
               Without         With
Criterion   Covariates   Covariates
-2 LOG L       941.371      811.077
AIC            941.371      821.077
SBC            941.371      836.063

Testing Global Null Hypothesis: BETA=0
Test                 Chi-Square   DF   Pr > ChiSq
Likelihood Ratio       130.2936    5       <.0001
Score                  132.1085    5       <.0001
Wald                   117.2857    5       <.0001

Type 3 Tests
                         Wald
Effect       DF    Chi-Square   Pr > ChiSq
treatment     1       10.7456       0.0010
sex           1       43.8598       <.0001
race          2        0.2958       0.8625
age           1      105.3042       <.0001

Analysis of Maximum Likelihood Estimates
                          Parameter   Standard                              Hazard   95% Hazard Ratio
Parameter            DF    Estimate      Error   Chi-Square   Pr > ChiSq     Ratio   Confidence Limits
treatment Standard    1     0.63270    0.19301      10.7456       0.0010     1.883     1.290     2.748
sex       Female      1     1.40373    0.21196      43.8598       <.0001     0.246     0.162     0.372
race      Black       1     0.00433    0.21872       0.0004       0.9842     1.004     0.654     1.542
race      Hispanic    1     0.10083    0.20063       0.2526       0.6153     1.106     0.746     1.639
age                   1     0.07546    0.00735     105.3042       <.0001     1.078     1.063     1.094
Problem #1:
In the simple linear regression of grip strength on age, the R-squared is very low at 0.0225: age alone explains only 2.25% of the variability in grip strength. The p-value for the age slope is 0.0927, which is above α = .05, so with age as the only predictor we cannot conclude that grip strength is linearly related to age.
For the GLM procedure, the R-squared for the full model (sex, health, age, and systolic blood pressure) improves to 0.4913, and the overall F-test is significant (F = 23.38, p < .0001). In the Type III tests, sex (p < .0001), health (p < .0001), and age (p = 0.0313) are significant at α = .05, while systolic blood pressure is not (p = 0.3877). Age reaches significance here even though it did not in the simple regression because sex and health account for much of the unexplained variability: the error mean square drops from 3.97 to 2.14. This suggests that a model containing sex, health, and age is more appropriate than the age-only regression....
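The R-squared and F values quoted above can be recovered from the sums of squares in the two analysis of variance tables. A minimal sketch, assuming the Model and Corrected Total sums of squares and degrees of freedom from the Problem #1 printout; the function name `anova_summary` is illustrative.

```python
def anova_summary(ss_model, ss_total, df_model, df_error):
    """Return (R-squared, error mean square, F) from ANOVA sums of squares."""
    ss_error = ss_total - ss_model
    r_squared = ss_model / ss_total      # share of variability explained
    ms_error = ss_error / df_error
    f_value = (ss_model / df_model) / ms_error
    return r_squared, ms_error, f_value


# Simple linear regression of grip on age (REG printout).
slr = anova_summary(11.40676, 508.04944, df_model=1, df_error=125)
# Full model with sex, health, age, and systolic (GLM printout).
glm = anova_summary(249.6260067, 508.0494425, df_model=5, df_error=121)

for name, (r2, mse, f) in [("age only", slr), ("full model", glm)]:
    print(f"{name}: R-squared = {r2:.4f}, MSE = {mse:.4f}, F = {f:.2f}")
```

The total sum of squares is the same in both analyses, so the gain in R-squared comes entirely from moving variability out of the error term and into the model.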