## Transcribed Text

Use α = .05 for all inference on this assignment.
1. We want to model total cholesterol from the average number of steps per day (given in thousands). Two models have been run – a simple linear regression with just steps, and a second-order (quadratic) polynomial regression with steps and (steps)². Use the plot of the data and the results of the two models to answer the following questions.
i) For the simple linear regression model, give the observation number of the point that looks most like an outlier, as well as the value of the statistic you used to make this decision.
ii) For the quadratic model, give the observation number of the point that looks most like an outlier, as well as the value of the statistic you used to make this decision.
iii) Explain why the point selected in part (ii) is not the same as in part (i). It may be helpful to refer to the plot below.
iv) Give the p-value from an appropriate hypothesis test (you’ll have to decide which test) that indicates that the quadratic model is better than the simple linear regression model.
v) Give another reason – one not based on a hypothesis test – that the quadratic model is better than the simple linear regression model. Do not use the R² as a reason.
vi) Using the quadratic model, predict mean cholesterol for all people who average 9000 steps.
2. We want to model HDL cholesterol in women, using four independent variables: amount of daily exercise (categorized as low or high), race (Black, Hispanic, or White), age, and systolic blood pressure. We have data from 436 women.
Give the appropriate p-values, and interpret the results of the tests for exercise, race, age, and systolic blood pressure. For exercise, also use the least squares means to obtain more information. For race, also interpret the multiple comparison procedure based on the least squares means. For age and systolic blood pressure, also give the estimated slope, and interpret what additional information you obtain from these.
3. We want to model the time (minutes) for complete knee surgery, based on the sex and age of the patient. We have data from 48 women and 43 men, with ages between 40 and 80. We obtain a plot (below), then run a general linear model with sex, age, and their interaction.
i) Explain what an interaction between sex and age means in this case. Your explanation should include the word ‘slope’.
ii) Give the p-value for the interaction term, and briefly interpret the result.
iii) Using the printout, write prediction equations relating minutes to age, separately for females and males.
iv) Least squares means were obtained to compare the sexes at age 45, and again at age 55. Interpret the tests for these two ages.
v) Suppose age had been dichotomized into ‘young’ and ‘old’. Explain what an interaction between sex and age means with age defined this way. Your explanation should now not include the word ‘slope’.
Printout for problem #1:
SLR Model:
The REG Procedure
Dependent Variable: chol

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 1 | 1782.26726 | 1782.26726 | 29.10 | <.0001 |
| Error | 20 | 1224.82365 | 61.24118 | | |
| Corrected Total | 21 | 3007.09091 | | | |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Root MSE | 7.82567 | R-Square | 0.5927 |
| Dependent Mean | 201.63636 | Adj R-Sq | 0.5723 |
| Coeff Var | 3.88108 | | |

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
| Intercept | 1 | 224.17308 | 4.49844 | 49.83 | <.0001 |
| steps | 1 | -2.71974 | 0.50415 | -5.39 | <.0001 |

Output Statistics

| Obs | Dependent Variable | Predicted Value | Std Error Mean Predict | Residual | Std Error Residual | Student Residual | -2 -1 0 1 2 | Cook's D |
|---|---|---|---|---|---|---|---|---|
| 1 | 219 | 214.9260 | 2.9753 | 4.0740 | 7.238 | 0.563 | \| \|* \| | 0.027 |
| 2 | 226 | 213.5661 | 2.7702 | 12.4339 | 7.319 | 1.699 | \| \|*** \| | 0.207 |
| 3 | 218 | 212.7502 | 2.6510 | 5.2498 | 7.363 | 0.713 | \| \|* \| | 0.033 |
| 4 | 212 | 212.4782 | 2.6120 | -0.4782 | 7.377 | -0.065 | \| \| \| | 0.000 |
| 5 | 218 | 211.9343 | 2.5353 | 6.0657 | 7.404 | 0.819 | \| \|* \| | 0.039 |
| 6 | 202 | 210.0305 | 2.2814 | -8.0305 | 7.486 | -1.073 | \| **\| \| | 0.053 |
| 7 | 207 | 208.6706 | 2.1175 | -1.6706 | 7.534 | -0.222 | \| \| \| | 0.002 |
| 8 | 187 | 206.2228 | 1.8726 | -19.2228 | 7.598 | -2.530 | \| *****\| \| | 0.194 |
| 9 | 206 | 205.6789 | 1.8290 | 0.3211 | 7.609 | 0.042 | \| \| \| | 0.000 |
| 10 | 198 | 204.5910 | 1.7560 | -6.5910 | 7.626 | -0.864 | \| *\| \| | 0.020 |
| 11 | 203 | 202.9591 | 1.6864 | 0.0409 | 7.642 | 0.005 | \| \| \| | 0.000 |
| 12 | 194 | 201.3273 | 1.6694 | -7.3273 | 7.646 | -0.958 | \| *\| \| | 0.022 |
| 13 | 215 | 200.7834 | 1.6759 | 14.2166 | 7.644 | 1.860 | \| \|*** \| | 0.083 |
| 14 | 190 | 198.0636 | 1.7951 | -8.0636 | 7.617 | -1.059 | \| **\| \| | 0.031 |
| 15 | 201 | 197.2477 | 1.8562 | 3.7523 | 7.602 | 0.494 | \| \| \| | 0.007 |
| 16 | 193 | 195.8878 | 1.9797 | -2.8878 | 7.571 | -0.381 | \| \| \| | 0.005 |
| 17 | 193 | 193.1681 | 2.2908 | -0.1681 | 7.483 | -0.022 | \| \| \| | 0.000 |
| 18 | 188 | 192.6241 | 2.3610 | -4.6241 | 7.461 | -0.620 | \| *\| \| | 0.019 |
| 19 | 194 | 190.7203 | 2.6226 | 3.2797 | 7.373 | 0.445 | \| \| \| | 0.013 |
| 20 | 189 | 189.3605 | 2.8217 | -0.3605 | 7.299 | -0.049 | \| \| \| | 0.000 |
| 21 | 186 | 187.4567 | 3.1133 | -1.4567 | 7.180 | -0.203 | \| \| \| | 0.004 |
| 22 | 197 | 185.5528 | 3.4165 | 11.4472 | 7.041 | 1.626 | \| \|*** \| | 0.311 |

| Statistic | Value |
|---|---|
| Sum of Residuals | 0 |
| Sum of Squared Residuals | 1224.82365 |
| Predicted Residual SS (PRESS) | 1471.92339 |
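
The Student Residual and Cook's D columns of the SLR printout can be cross-checked from the other columns. The Python sketch below assumes the usual definitions: the studentized residual is the residual divided by its standard error, and Cook's D is (r²/p) × (Std Error Mean Predict)² / (Std Error Residual)², where p is the number of model parameters (2 for the SLR model). The function names are illustrative and not part of the SAS output.

```python
def student_residual(residual, se_residual):
    """Residual divided by its own standard error."""
    return residual / se_residual


def cooks_d(residual, se_mean_predict, se_residual, n_params):
    """Cook's D rebuilt from the printed columns (assumed standard formula)."""
    r = student_residual(residual, se_residual)
    return (r ** 2 / n_params) * (se_mean_predict ** 2 / se_residual ** 2)


# Observation 8 of the SLR output: largest studentized residual in absolute value.
r8 = student_residual(-19.2228, 7.598)
d8 = cooks_d(-19.2228, 1.8726, 7.598, n_params=2)

# Observation 22 of the SLR output: largest Cook's D.
d22 = cooks_d(11.4472, 3.4165, 7.041, n_params=2)

print(f"obs 8: student residual {r8:.3f}, Cook's D {d8:.3f}")
print(f"obs 22: Cook's D {d22:.3f}")
```

Up to rounding in the printed inputs, these agree with the tabulated values for observations 8 and 22. For a three-parameter model, `n_params` would be 3.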
Quadratic Model:
The REG Procedure
Dependent Variable: chol

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 2 | 2023.99208 | 1011.99604 | 19.56 | <.0001 |
| Error | 19 | 983.09883 | 51.74204 | | |
| Corrected Total | 21 | 3007.09091 | | | |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Root MSE | 7.19319 | R-Square | 0.6731 |
| Dependent Mean | 201.63636 | Adj R-Sq | 0.6387 |
| Coeff Var | 3.56741 | | |

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| | Variance Inflation |
|---|---|---|---|---|---|---|
| Intercept | 1 | 246.06259 | 10.93897 | 22.49 | <.0001 | 0 |
| steps | 1 | -8.71661 | 2.81294 | -3.10 | 0.0059 | 36.84663 |
| steps2 | 1 | 0.34921 | 0.16157 | 2.16 | 0.0436 | 36.84663 |

Output Statistics

| Obs | Dependent Variable | Predicted Value | Std Error Mean Predict | Residual | Std Error Residual | Student Residual | -2 -1 0 1 2 | Cook's D |
|---|---|---|---|---|---|---|---|---|
| 1 | 219 | 220.4630 | 3.7472 | -1.4630 | 6.140 | -0.238 | \| \| \| | 0.007 |
| 2 | 226 | 217.3793 | 3.0978 | 8.6207 | 6.492 | 1.328 | \| \|** \| | 0.134 |
| 3 | 218 | 215.6129 | 2.7734 | 2.3871 | 6.637 | 0.360 | \| \| \| | 0.008 |
| 4 | 212 | 215.0381 | 2.6771 | -3.0381 | 6.676 | -0.455 | \| \| \| | 0.011 |
| 5 | 218 | 213.9094 | 2.5031 | 4.0906 | 6.744 | 0.607 | \| \|* \| | 0.017 |
| 6 | 202 | 210.1789 | 2.0981 | -8.1789 | 6.880 | -1.189 | \| **\| \| | 0.044 |
| 7 | 207 | 207.7238 | 1.9951 | -0.7238 | 6.911 | -0.105 | \| \| \| | 0.000 |
| 8 | 187 | 203.7446 | 2.0681 | -16.7446 | 6.889 | -2.430 | \| ****\| \| | 0.177 |
| 9 | 206 | 202.9372 | 2.1060 | 3.0628 | 6.878 | 0.445 | \| \| \| | 0.006 |
| 10 | 198 | 201.4061 | 2.1855 | -3.4061 | 6.853 | -0.497 | \| \| \| | 0.008 |
| 11 | 203 | 199.3190 | 2.2889 | 3.6810 | 6.819 | 0.540 | \| \|* \| | 0.011 |
| 12 | 194 | 197.4834 | 2.3489 | -3.4834 | 6.799 | -0.512 | \| *\| \| | 0.010 |
| 13 | 215 | 196.9274 | 2.3570 | 18.0726 | 6.796 | 2.659 | \| \|***** \| | 0.284 |
| 14 | 190 | 194.5664 | 2.3109 | -4.5664 | 6.812 | -0.670 | \| *\| \| | 0.017 |
| 15 | 201 | 193.9944 | 2.2752 | 7.0056 | 6.824 | 1.027 | \| \|** \| | 0.039 |
| 16 | 193 | 193.1805 | 2.2091 | -0.1805 | 6.846 | -0.026 | \| \| \| | 0.000 |
| 17 | 193 | 192.0767 | 2.1654 | 0.9233 | 6.860 | 0.135 | \| \| \| | 0.001 |
| 18 | 188 | 191.9398 | 2.1932 | -3.9398 | 6.851 | -0.575 | \| *\| \| | 0.011 |
| 19 | 194 | 191.6805 | 2.4513 | 2.3195 | 6.763 | 0.343 | \| \| \| | 0.005 |
| 20 | 189 | 191.7048 | 2.8113 | -2.7048 | 6.621 | -0.409 | \| \| \| | 0.010 |
| 21 | 186 | 192.0321 | 3.5595 | -6.0321 | 6.251 | -0.965 | \| *\| \| | 0.101 |
| 22 | 197 | 192.7017 | 4.5608 | 4.2983 | 5.562 | 0.773 | \| \|* \| | 0.134 |

| Statistic | Value |
|---|---|
| Sum of Residuals | 0 |
| Sum of Squared Residuals | 983.09883 |
| Predicted Residual SS (PRESS) | 1277.94678 |
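
Because the two models differ by a single term, the t test on steps2 in the quadratic printout (t = 2.16, p = 0.0436) is equivalent to the partial F test comparing the models, with F equal to t² up to rounding. The Python sketch below recomputes that F statistic from the two error sums of squares, compares PRESS, and evaluates the fitted quadratic at 9 (thousand) steps. All numbers are copied from the printouts; the variable and function names are illustrative assumptions.

```python
# Values copied from the SLR and quadratic printouts.
sse_linear = 1224.82365       # Error SS, SLR model
sse_quadratic = 983.09883     # Error SS, quadratic model
mse_quadratic = 51.74204      # Error mean square, quadratic model (19 df)
t_steps2 = 2.16               # t value for steps2
press_linear = 1471.92339
press_quadratic = 1277.94678

# Partial F for the one added term: drop in Error SS over 1 df, divided by full-model MSE.
partial_f = (sse_linear - sse_quadratic) / 1 / mse_quadratic


def predict_chol(steps_thousands):
    """Fitted quadratic: chol = b0 + b1*steps + b2*steps^2, steps in thousands."""
    b0, b1, b2 = 246.06259, -8.71661, 0.34921
    return b0 + b1 * steps_thousands + b2 * steps_thousands ** 2


print(f"partial F = {partial_f:.2f}, t^2 = {t_steps2 ** 2:.2f}")
print(f"PRESS drops by {press_linear - press_quadratic:.2f} with the quadratic term")
print(f"predicted mean chol at 9 thousand steps = {predict_chol(9):.2f}")
```

Steps are recorded in thousands, so 9000 steps enters the equation as 9, not 9000.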
Printout for problem #2:
The GLM Procedure

Class Level Information

| Class | Levels | Values |
|---|---|---|
| exercise | 2 | high low |
| race | 3 | Black Hispanic White |

Number of Observations Read 436
Number of Observations Used 436

Dependent Variable: HDL

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 5 | 1840.80619 | 368.16124 | 7.66 | <.0001 |
| Error | 430 | 20677.38417 | 48.08694 | | |
| Corrected Total | 435 | 22518.19037 | | | |

| R-Square | Coeff Var | Root MSE | HDL Mean |
|---|---|---|---|
| 0.081748 | 13.98636 | 6.934475 | 49.58028 |

| Source | DF | Type I SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| exercise | 1 | 243.555840 | 243.555840 | 5.06 | 0.0249 |
| race | 2 | 1276.816928 | 638.408464 | 13.28 | <.0001 |
| age | 1 | 316.348242 | 316.348242 | 6.58 | 0.0107 |
| sysbp | 1 | 4.085182 | 4.085182 | 0.08 | 0.7708 |

| Source | DF | Type III SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| exercise | 1 | 267.667128 | 267.667128 | 5.57 | 0.0188 |
| race | 2 | 1337.650338 | 668.825169 | 13.91 | <.0001 |
| age | 1 | 318.962151 | 318.962151 | 6.63 | 0.0103 |
| sysbp | 1 | 4.085182 | 4.085182 | 0.08 | 0.7708 |

| Parameter | Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|
| Intercept | 54.24977715 B | 4.15099031 | 13.07 | <.0001 |
| exercise high | 1.71371113 B | 0.72636293 | 2.36 | 0.0188 |
| exercise low | 0.00000000 B | . | . | . |
| race Black | -3.98306885 B | 0.75837081 | -5.25 | <.0001 |
| race Hispanic | -1.14441447 B | 0.89768458 | -1.27 | 0.2031 |
| race White | 0.00000000 B | . | . | . |
| age | -0.04594566 | 0.01783974 | -2.58 | 0.0103 |
| sysbp | -0.00952932 | 0.03269414 | -0.29 | 0.7708 |

Least Squares Means

| exercise | HDL LSMEAN |
|---|---|
| high | 50.5418940 |
| low | 48.8281829 |

Least Squares Means
Adjustment for Multiple Comparisons: Tukey-Kramer

| race | HDL LSMEAN | LSMEAN Number |
|---|---|---|
| Black | 47.4111307 | 1 |
| Hispanic | 50.2497851 | 2 |
| White | 51.3941995 | 3 |

Least Squares Means for effect race
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: HDL

| i/j | 1 | 2 | 3 |
|---|---|---|---|
| 1 | | 0.0097 | <.0001 |
| 2 | 0.0097 | | 0.4102 |
| 3 | <.0001 | 0.4102 | |
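
In a main-effects model like this one (no interaction terms), the difference between two least squares means equals the difference between the corresponding parameter estimates, which ties the LSMEAN tables back to the Estimate column. The Python sketch below checks this with values copied from the printout and rescales the age slope to a 10-year difference. The dictionary and variable names are illustrative assumptions.

```python
# Least squares means copied from the problem #2 printout.
lsmeans_exercise = {"high": 50.5418940, "low": 48.8281829}
lsmeans_race = {"Black": 47.4111307, "Hispanic": 50.2497851, "White": 51.3941995}

# Parameter estimates; "low" and "White" are the reference levels (estimate 0).
est_exercise_high = 1.71371113
est_race_black = -3.98306885
est_race_hispanic = -1.14441447
slope_age = -0.04594566  # change in adjusted mean HDL per additional year of age

diff_exercise = lsmeans_exercise["high"] - lsmeans_exercise["low"]
diff_black_white = lsmeans_race["Black"] - lsmeans_race["White"]
diff_hispanic_white = lsmeans_race["Hispanic"] - lsmeans_race["White"]
diff_black_hispanic = lsmeans_race["Black"] - lsmeans_race["Hispanic"]

print(f"high - low exercise: {diff_exercise:.4f} (estimate {est_exercise_high:.4f})")
print(f"Black - White:       {diff_black_white:.4f} (estimate {est_race_black:.4f})")
print(f"Hispanic - White:    {diff_hispanic_white:.4f} (estimate {est_race_hispanic:.4f})")
print(f"Black - Hispanic:    {diff_black_hispanic:.4f}")
print(f"HDL change per 10 years of age: {10 * slope_age:.3f}")
```

The least squares means add the direction and size of each difference, which the F tests alone do not report.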
Printout for problem #3:
The GLM Procedure

Class Level Information

| Class | Levels | Values |
|---|---|---|
| sex | 2 | F M |

Dependent Variable: minutes

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 3 | 25677.72489 | 8559.24163 | 99.69 | <.0001 |
| Error | 87 | 7469.81357 | 85.85993 | | |
| Corrected Total | 90 | 33147.53846 | | | |

| R-Square | Coeff Var | Root MSE | minutes Mean |
|---|---|---|---|
| 0.774650 | 8.155641 | 9.266063 | 113.6154 |

| Source | DF | Type I SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| sex | 1 | 7002.80736 | 7002.80736 | 81.56 | <.0001 |
| age | 1 | 16623.26350 | 16623.26350 | 193.61 | <.0001 |
| age*sex | 1 | 2051.65403 | 2051.65403 | 23.90 | <.0001 |

| Source | DF | Type III SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| sex | 1 | 1055.90546 | 1055.90546 | 12.30 | 0.0007 |
| age | 1 | 15984.50009 | 15984.50009 | 186.17 | <.0001 |
| age*sex | 1 | 2051.65403 | 2051.65403 | 23.90 | <.0001 |

| Parameter | Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|
| Intercept | 25.74560633 B | 7.35629950 | 3.50 | 0.0007 |
| sex F | 36.31251270 B | 10.35473618 | 3.51 | 0.0007 |
| sex M | 0.00000000 B | . | . | . |
| age | 1.56439660 B | 0.11626600 | 13.46 | <.0001 |
| age*sex F | -0.82526870 B | 0.16882570 | -4.89 | <.0001 |
| age*sex M | 0.00000000 B | . | . | . |

Least Squares Means at age=45

| sex | minutes LSMEAN | H0:LSMean1=LSMean2 Pr > \|t\| |
|---|---|---|
| F | 95.3188748 | 0.7995 |
| M | 96.1434534 | |

Least Squares Means at age=55

| sex | minutes LSMEAN | H0:LSMean1=LSMean2 Pr > \|t\| |
|---|---|---|
| F | 102.710154 | <.0001 |
| M | 111.787419 | |
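
One way to read the problem #3 parameter estimates: M is the reference level (estimate 0), so the male line uses the Intercept and age estimates alone, while the female line adds the sex F estimate to the intercept and the age*sex F estimate to the slope. The Python sketch below, with illustrative names that are not part of the SAS output, builds both lines from the printed estimates and evaluates them at ages 45 and 55 for comparison with the least squares means.

```python
# Parameter estimates copied from the problem #3 printout.
INTERCEPT = 25.74560633
SEX_F = 36.31251270          # intercept shift for females (M is the reference)
AGE = 1.56439660             # slope for males
AGE_X_SEX_F = -0.82526870    # slope shift for females


def predicted_minutes(age, sex):
    """Predicted surgery time in minutes for a patient of the given age and sex."""
    if sex == "F":
        return (INTERCEPT + SEX_F) + (AGE + AGE_X_SEX_F) * age
    if sex == "M":
        return INTERCEPT + AGE * age
    raise ValueError("sex must be 'F' or 'M'")


for age in (45, 55):
    f, m = predicted_minutes(age, "F"), predicted_minutes(age, "M")
    print(f"age {age}: F = {f:.4f}, M = {m:.4f}, M - F = {m - f:.4f}")

# Age at which the two fitted lines cross (equal predicted minutes).
crossover_age = SEX_F / -AGE_X_SEX_F
print(f"fitted lines cross near age {crossover_age:.1f}")
```

The fitted lines cross at about age 44, which is consistent with the sexes being nearly indistinguishable at age 45 and clearly different at age 55.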

i) For the simple linear regression model, give the observation number of the point that looks most like an outlier, as well as the value of the statistic you used to make this decision.

Observation 8. Its studentized residual is -2.530, the largest in absolute value in the SLR output and the only one beyond ±2.

ii) For the quadratic model, give the observation number of the point that looks most like an outlier, as well as the value of the statistic you used to make this decision.

Observation 13. Its studentized residual is 2.659, the largest in absolute value in the quadratic output (observation 8 is next at -2.430).

iii) Explain why the point selected in part (ii) is not the same as in part (i). It may be helpful to refer to the plot below.

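Observation 8 lies closer to the quadratic curve than to the straight line, so its residual shrinks (from -19.22 to -16.74) when the squared term is added, while observation 13 lies closer to the straight line than to the curve, so its residual grows (from 14.22 to 18.07)....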