Question

Transcribed Text

Use α = .05 for all inference on this assignment.

1. We want to model total cholesterol from the average number of steps per day (given in thousands). Two models have been run – a simple linear regression with just steps, and a second-order (quadratic) polynomial regression with steps and (steps)². Use the plot of the data and the results of the two models to answer the following questions.

   i) For the simple linear regression model, give the observation number of the point that looks most like an outlier, as well as the value of the statistic you used to make this decision.
   ii) For the quadratic model, give the observation number of the point that looks most like an outlier, as well as the value of the statistic you used to make this decision.
   iii) Explain why the point selected in part (ii) is not the same as in part (i). It may be helpful to refer to the plot below.
   iv) Give the p-value from an appropriate hypothesis test (you’ll have to decide which test) that indicates that the quadratic model is better than the simple linear regression model.
   v) Give another reason – one not based on a hypothesis test – that the quadratic model is better than the simple linear regression model. Do not use the R² as a reason.
   vi) Using the quadratic model, predict mean cholesterol for all people who average 9000 steps.

2. We want to model HDL cholesterol in women, using four independent variables: amount of daily exercise (categorized as low or high), race (Black, Hispanic, or White), age, and systolic blood pressure. We have data from 436 women. Give the appropriate p-values, and interpret the results of the tests for exercise, race, age, and systolic blood pressure. For exercise, also use the least squares means to obtain more information. For race, also interpret the multiple comparison procedure based on the least squares means. For age and systolic blood pressure, also give the estimated slope, and interpret what additional information you obtain from these.

3. We want to model the time (minutes) for complete knee surgery, based on the sex and age of the patient. We have data from 48 women and 43 men, with ages between 40 and 80. We obtain a plot (below), then run a general linear model with sex, age, and their interaction.

   i) Explain what an interaction between sex and age means in this case. Your explanation should include the word ‘slope’.
   ii) Give the p-value for the interaction term, and briefly interpret the result.
   iii) Using the printout, write prediction equations relating minutes to age, separately for females and males.
   iv) Least squares means were obtained to compare the sexes at age 45, and again at age 55. Interpret the tests for these two ages.
   v) Suppose age had been dichotomized into ‘young’ and ‘old’. Explain what an interaction between sex and age means with age defined this way. Your explanation should now not include the word ‘slope’.

Printout for problem #1:

SLR Model:

The REG Procedure
Dependent Variable: chol

Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1       1782.26726    1782.26726     29.10   <.0001
Error             20       1224.82365      61.24118
Corrected Total   21       3007.09091

Root MSE           7.82567    R-Square   0.5927
Dependent Mean   201.63636    Adj R-Sq   0.5723
Coeff Var          3.88108

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1            224.17308          4.49844     49.83     <.0001
steps        1             -2.71974          0.50415     -5.39     <.0001

Output Statistics

      Dependent   Predicted   Std Error                Std Error   Student
Obs   Variable    Value       Mean Predict   Residual  Residual    Residual    -2-1 0 1 2      Cook's D
  1   219         214.9260    2.9753           4.0740  7.238         0.563     |      |*     |   0.027
  2   226         213.5661    2.7702          12.4339  7.319         1.699     |      |***   |   0.207
  3   218         212.7502    2.6510           5.2498  7.363         0.713     |      |*     |   0.033
  4   212         212.4782    2.6120          -0.4782  7.377        -0.065     |      |      |   0.000
  5   218         211.9343    2.5353           6.0657  7.404         0.819     |      |*     |   0.039
  6   202         210.0305    2.2814          -8.0305  7.486        -1.073     |    **|      |   0.053
  7   207         208.6706    2.1175          -1.6706  7.534        -0.222     |      |      |   0.002
  8   187         206.2228    1.8726         -19.2228  7.598        -2.530     | *****|      |   0.194
  9   206         205.6789    1.8290           0.3211  7.609         0.042     |      |      |   0.000
 10   198         204.5910    1.7560          -6.5910  7.626        -0.864     |     *|      |   0.020
 11   203         202.9591    1.6864           0.0409  7.642         0.005     |      |      |   0.000
 12   194         201.3273    1.6694          -7.3273  7.646        -0.958     |     *|      |   0.022
 13   215         200.7834    1.6759          14.2166  7.644         1.860     |      |***   |   0.083
 14   190         198.0636    1.7951          -8.0636  7.617        -1.059     |    **|      |   0.031
 15   201         197.2477    1.8562           3.7523  7.602         0.494     |      |      |   0.007
 16   193         195.8878    1.9797          -2.8878  7.571        -0.381     |      |      |   0.005
 17   193         193.1681    2.2908          -0.1681  7.483        -0.022     |      |      |   0.000
 18   188         192.6241    2.3610          -4.6241  7.461        -0.620     |     *|      |   0.019
 19   194         190.7203    2.6226           3.2797  7.373         0.445     |      |      |   0.013
 20   189         189.3605    2.8217          -0.3605  7.299        -0.049     |      |      |   0.000
 21   186         187.4567    3.1133          -1.4567  7.180        -0.203     |      |      |   0.004
 22   197         185.5528    3.4165          11.4472  7.041         1.626     |      |***   |   0.311

Sum of Residuals                        0
Sum of Squared Residuals       1224.82365
Predicted Residual SS (PRESS)  1471.92339

Quadratic Model:

The REG Procedure
Dependent Variable: chol

Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              2       2023.99208    1011.99604     19.56   <.0001
Error             19        983.09883      51.74204
Corrected Total   21       3007.09091

Root MSE           7.19319    R-Square   0.6731
Dependent Mean   201.63636    Adj R-Sq   0.6387
Coeff Var          3.56741

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variance Inflation
Intercept    1            246.06259         10.93897     22.49     <.0001                    0
steps        1             -8.71661          2.81294     -3.10     0.0059             36.84663
steps2       1              0.34921          0.16157      2.16     0.0436             36.84663

Output Statistics

      Dependent   Predicted   Std Error                Std Error   Student
Obs   Variable    Value       Mean Predict   Residual  Residual    Residual    -2-1 0 1 2      Cook's D
  1   219         220.4630    3.7472          -1.4630  6.140        -0.238     |      |      |   0.007
  2   226         217.3793    3.0978           8.6207  6.492         1.328     |      |**    |   0.134
  3   218         215.6129    2.7734           2.3871  6.637         0.360     |      |      |   0.008
  4   212         215.0381    2.6771          -3.0381  6.676        -0.455     |      |      |   0.011
  5   218         213.9094    2.5031           4.0906  6.744         0.607     |      |*     |   0.017
  6   202         210.1789    2.0981          -8.1789  6.880        -1.189     |    **|      |   0.044
  7   207         207.7238    1.9951          -0.7238  6.911        -0.105     |      |      |   0.000
  8   187         203.7446    2.0681         -16.7446  6.889        -2.430     |  ****|      |   0.177
  9   206         202.9372    2.1060           3.0628  6.878         0.445     |      |      |   0.006
 10   198         201.4061    2.1855          -3.4061  6.853        -0.497     |      |      |   0.008
 11   203         199.3190    2.2889           3.6810  6.819         0.540     |      |*     |   0.011
 12   194         197.4834    2.3489          -3.4834  6.799        -0.512     |     *|      |   0.010
 13   215         196.9274    2.3570          18.0726  6.796         2.659     |      |***** |   0.284
 14   190         194.5664    2.3109          -4.5664  6.812        -0.670     |     *|      |   0.017
 15   201         193.9944    2.2752           7.0056  6.824         1.027     |      |**    |   0.039
 16   193         193.1805    2.2091          -0.1805  6.846        -0.026     |      |      |   0.000
 17   193         192.0767    2.1654           0.9233  6.860         0.135     |      |      |   0.001
 18   188         191.9398    2.1932          -3.9398  6.851        -0.575     |     *|      |   0.011
 19   194         191.6805    2.4513           2.3195  6.763         0.343     |      |      |   0.005
 20   189         191.7048    2.8113          -2.7048  6.621        -0.409     |      |      |   0.010
 21   186         192.0321    3.5595          -6.0321  6.251        -0.965     |     *|      |   0.101
 22   197         192.7017    4.5608           4.2983  5.562         0.773     |      |*     |   0.134

Sum of Residuals                        0
Sum of Squared Residuals        983.09883
Predicted Residual SS (PRESS)  1277.94678

Printout for problem #2:

The GLM Procedure

Class Level Information

Class      Levels   Values
exercise   2        high low
race       3        Black Hispanic White

Number of Observations Read   436
Number of Observations Used   436

Dependent Variable: HDL

Source            DF    Sum of Squares   Mean Square   F Value   Pr > F
Model               5       1840.80619     368.16124      7.66   <.0001
Error             430      20677.38417      48.08694
Corrected Total   435      22518.19037

R-Square   Coeff Var   Root MSE   HDL Mean
0.081748   13.98636    6.934475   49.58028

Source     DF   Type I SS      Mean Square   F Value   Pr > F
exercise    1    243.555840    243.555840       5.06   0.0249
race        2   1276.816928    638.408464      13.28   <.0001
age         1    316.348242    316.348242       6.58   0.0107
sysbp       1      4.085182      4.085182       0.08   0.7708

Source     DF   Type III SS    Mean Square   F Value   Pr > F
exercise    1    267.667128    267.667128       5.57   0.0188
race        2   1337.650338    668.825169      13.91   <.0001
age         1    318.962151    318.962151       6.63   0.0103
sysbp       1      4.085182      4.085182       0.08   0.7708

Parameter         Estimate          Standard Error   t Value   Pr > |t|
Intercept         54.24977715 B     4.15099031         13.07     <.0001
exercise high      1.71371113 B     0.72636293          2.36     0.0188
exercise low       0.00000000 B     .                    .        .
race Black        -3.98306885 B     0.75837081         -5.25     <.0001
race Hispanic     -1.14441447 B     0.89768458         -1.27     0.2031
race White         0.00000000 B     .                    .        .
age               -0.04594566       0.01783974         -2.58     0.0103
sysbp             -0.00952932       0.03269414         -0.29     0.7708

Least Squares Means

exercise   HDL LSMEAN
high       50.5418940
low        48.8281829

Least Squares Means
Adjustment for Multiple Comparisons: Tukey-Kramer

race       HDL LSMEAN   LSMEAN Number
Black      47.4111307   1
Hispanic   50.2497851   2
White      51.3941995   3

Least Squares Means for effect race
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: HDL

i/j   1        2        3
1              0.0097   <.0001
2     0.0097            0.4102
3     <.0001   0.4102

Printout for problem #3:

The GLM Procedure

Class Level Information

Class   Levels   Values
sex     2        F M

Dependent Variable: minutes

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3      25677.72489    8559.24163     99.69   <.0001
Error             87       7469.81357      85.85993
Corrected Total   90      33147.53846

R-Square   Coeff Var   Root MSE   minutes Mean
0.774650   8.155641    9.266063   113.6154

Source    DF   Type I SS      Mean Square    F Value   Pr > F
sex        1    7002.80736     7002.80736      81.56   <.0001
age        1   16623.26350    16623.26350     193.61   <.0001
age*sex    1    2051.65403     2051.65403      23.90   <.0001

Source    DF   Type III SS    Mean Square    F Value   Pr > F
sex        1    1055.90546     1055.90546      12.30   0.0007
age        1   15984.50009    15984.50009     186.17   <.0001
age*sex    1    2051.65403     2051.65403      23.90   <.0001

Parameter     Estimate          Standard Error   t Value   Pr > |t|
Intercept     25.74560633 B      7.35629950         3.50     0.0007
sex F         36.31251270 B     10.35473618         3.51     0.0007
sex M          0.00000000 B      .                   .        .
age            1.56439660 B      0.11626600        13.46     <.0001
age*sex F     -0.82526870 B      0.16882570        -4.89     <.0001
age*sex M      0.00000000 B      .                   .        .

Least Squares Means at age=45

sex   minutes LSMEAN   H0:LSMean1=LSMean2 Pr > |t|
F     95.3188748       0.7995
M     96.1434534

Least Squares Means at age=55

sex   minutes LSMEAN   H0:LSMean1=LSMean2 Pr > |t|
F     102.710154       <.0001
M     111.787419
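As a rough check on the problem #3 printout, the parameter estimates can be combined into one prediction line per sex. The sketch below is only an illustration: the function name `predicted_minutes` and the constant names are made up here, and the coefficients are copied from the parameter estimates (M is the reference level, so the F line adds the `sex F` and `age*sex F` terms).

```python
# Parameter estimates copied from the problem #3 GLM printout.
# M is the reference level, so its sex and interaction terms are zero.
INTERCEPT = 25.74560633
SEX_F = 36.31251270
AGE = 1.56439660
AGE_BY_SEX_F = -0.82526870


def predicted_minutes(age, sex):
    """Predicted surgery time in minutes (hypothetical helper, not from the printout)."""
    if sex not in ("F", "M"):
        raise ValueError("sex must be 'F' or 'M'")
    is_female = 1.0 if sex == "F" else 0.0
    intercept = INTERCEPT + SEX_F * is_female
    slope = AGE + AGE_BY_SEX_F * is_female
    return intercept + slope * age


# Evaluate both lines at the two ages used for the least squares means.
for age in (45, 55):
    print(age, predicted_minutes(age, "F"), predicted_minutes(age, "M"))
```

Evaluated at ages 45 and 55, these lines should agree with the least squares means in the printout up to rounding, which is a quick way to confirm the coefficients were read off correctly.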

Solution Preview


i) For the simple linear regression model, give the observation number of the point that looks most like an outlier, as well as the value of the statistic you used to make this decision.

Observation 8. Its studentized residual is -2.530, the largest in absolute value of the 22 observations in the SLR output.

ii) For the quadratic model, give the observation number of the point that looks most like an outlier, as well as the value of the statistic you used to make this decision.

Observation 13. Its studentized residual is 2.659, the largest in absolute value of the 22 observations in the quadratic output.
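The choices in parts (i) and (ii) can be reproduced by scanning the Student Residual column of each printout for the largest absolute value. This is a minimal sketch; the list names and the helper `most_extreme` are illustrative, and the values are copied from the two Output Statistics tables in observation order.

```python
# Studentized residuals copied from the printout, observations 1..22.
slr_student = [0.563, 1.699, 0.713, -0.065, 0.819, -1.073, -0.222, -2.530,
               0.042, -0.864, 0.005, -0.958, 1.860, -1.059, 0.494, -0.381,
               -0.022, -0.620, 0.445, -0.049, -0.203, 1.626]
quad_student = [-0.238, 1.328, 0.360, -0.455, 0.607, -1.189, -0.105, -2.430,
                0.445, -0.497, 0.540, -0.512, 2.659, -0.670, 1.027, -0.026,
                0.135, -0.575, 0.343, -0.409, -0.965, 0.773]


def most_extreme(residuals):
    """Return (observation number, residual) for the largest absolute residual."""
    idx = max(range(len(residuals)), key=lambda i: abs(residuals[i]))
    return idx + 1, residuals[idx]  # observation numbers start at 1


print(most_extreme(slr_student))   # (8, -2.53)
print(most_extreme(quad_student))  # (13, 2.659)
```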

iii) Explain why the point selected in part (ii) is not the same as in part (i). It may be helpful to refer to the plot below.

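Observation 8 lies closer to the quadratic curve than to the straight line, so its residual shrinks under the quadratic model, while observation 13 lies closer to the straight line than to the quadratic curve, so its residual grows....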
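The shift between the two models can be seen directly in the raw residuals for observations 8 and 13. The sketch below copies the observed and predicted values from the two Output Statistics tables; the names `FITS` and `residuals` are illustrative. As a related check, it also computes the extra-sum-of-squares F statistic for adding steps2, using the error sums of squares from the two ANOVA tables.

```python
# observation -> (observed chol, SLR predicted value, quadratic predicted value),
# copied from the problem #1 printout.
FITS = {
    8: (187, 206.2228, 203.7446),
    13: (215, 200.7834, 196.9274),
}


def residuals(obs):
    """Return (SLR residual, quadratic residual) for one observation."""
    observed, slr_pred, quad_pred = FITS[obs]
    return observed - slr_pred, observed - quad_pred


for obs in FITS:
    slr_res, quad_res = residuals(obs)
    print(f"obs {obs}: SLR residual {slr_res:.4f}, quadratic residual {quad_res:.4f}")

# Extra-sum-of-squares F statistic for the one added term (steps2).
SSE_SLR = 1224.82365   # Error SS, simple linear regression
SSE_QUAD = 983.09883   # Error SS, quadratic model
MSE_QUAD = 51.74204    # Error mean square, quadratic model
partial_f = (SSE_SLR - SSE_QUAD) / 1 / MSE_QUAD
print(f"partial F = {partial_f:.2f}")  # partial F = 4.67
```

With a single added term, this F statistic is the square of the t value for steps2 (2.16 in the printout), so it carries the same p-value, 0.0436.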
