# Applied Regression Analysis Theory/numerical part 1 Consider mult...

## Transcribed Text

### Applied Regression Analysis: Theory/numerical part

1. Consider the multiple linear regression model $y = X\beta + \varepsilon$ with $\varepsilon \sim N(0, \sigma^2 I)$, and the hat matrix $H = X(X^TX)^{-1}X^T$. Prove that the variance-covariance matrix of the vector of residuals $\hat{\varepsilon}$ is $\mathrm{Var}(\hat{\varepsilon}) = \sigma^2(I - H)$. Show the computation in its entirety. (Hint: remember that you can write $\hat{\varepsilon} = y - \hat{y} = (I - H)y$, and use the fact that $\mathrm{Var}(Ay) = A\,\mathrm{Var}(y)\,A^T$ for every matrix of constants $A$.)

2. You want to fit a multiple linear regression model $y = X\beta + \varepsilon$ to predict the response $y$ using two predictors $x_1$ and $x_2$. You consider a sample of [...] points, shown in the following R output: `y = c(0,-1,-1,0)`. The columns of the design matrix are linearly dependent, hence there are infinitely many solutions to the normal equations $X^TX\beta = X^Ty$. Compute these infinitely many solutions (show the computation in its entirety).

### Applied part

Consider again the dataset contained in "BODY_FAT.TXT". Please read carefully the description of this dataset and the guidelines for this part of the homework before performing the analyses in R and writing up your results (file "Homework dataset guide.pdf"). Do not exceed 5 pages.

In the following analyses, use the body fat percentage you re-computed in Homework 1 (there are erroneous values in the variable SiriBFperc provided in the dataset). If in Homework 1 you found some units (men) for which some of the measurements contain obvious mistakes, remove them. If you find other very extreme values (outliers), you can present results with and without them (provide an appropriate justification for removing units if you do).

Consider again the multiple linear regression model for body fat percentage, with predictors AbdomenC, Weight, Height, NeckC, ChestC, HipC, ThighC, KneeC, AnkleC, BicepsC, ForearmC, WristC.

A.

- Compute leverage, studentized residuals and Cook's distance, and produce plots of each of these three quantities versus the observation number. Identify unusual observations in the x space, outliers with respect to the response y, and influential points that strongly influence the model (specifying which of the three plots you employed to identify the points, and providing details about the observations you identified).
- Evaluate the observations you identified as influential, looking at all the measurements you have in the dataset (try to explain why they are influential and decide whether you want to remove them).
- If you wish to remove some observations from the data, justify your choice and rerun the regression and the diagnostics (also residual Q-Q plot, Shapiro test, ...).
- Conclude with a general evaluation of the linear regression model you obtained, based on all available information.

[Hint: use function `model.matrix` to create the design matrix of the model, `t` to compute the transpose of a matrix, `%*%` to multiply two matrices and `solve` to compute the inverse; identify unusual observations using function `which`.]

B. Consider now the wrist circumference.

- Fit a simple linear regression model with response the body fat percentage and predictor the wrist circumference, in order to study the relationship between the two variables, and produce the regression plot (i.e. scatter plot of y vs. x with the fitted regression line superimposed). Is the effect of wrist circumference on body fat percentage positive or negative? Is it significant?
- Look at the coefficient beta corresponding to the wrist circumference in the multiple linear model you fitted in part A. Is the effect of wrist circumference on body fat percentage positive or negative? Is it significant?
- Do the coefficients of wrist circumference in the two regressions indicate relationships of a different nature/sign? What do you think is going on? Explain.

C. Consider the binary variable Over45 as well as the abdomen circumference. Fit a linear regression model that takes into consideration both predictors, as well as their interaction. Provide:

- Model employed;
- The equation of the estimated regression line for each of the two age groups;
- A plot of body fat on abdomen circumference, with the two fitted equation lines superimposed [Hint: function `abline` can add each of the fitted lines];
- Coefficient of determination of the model.

D. Is there any evidence that the relationship between abdomen circumference and body fat percentage differs between under45 and over45? Answer this question by performing an appropriate hypothesis test. Provide:

- Null and alternative hypotheses;
- Value of the test statistic employed;
- Distribution of the test statistic under the null hypothesis;
- p-value of the test;
- Conclusion at significance level 5%.

### Applied Regression Analysis: Dataset guide

The file "BODY_FAT.TXT" contains data adapted from a dataset posted by Roger W. Johnson (Dept. of Mathematics & Computer Science, South Dakota School of Mines & Technology). Johnson also provided the following references, which might be useful to you:

- Bailey, Covert (1994). *Smart Exercise: Burning Fat, Getting Fit*. Houghton Mifflin Co., Boston, pp. 179-186.
- Behnke, A.R. and Wilmore, J.H. (1974). *Evaluation and Regulation of Body Build and Composition*. Prentice-Hall, Englewood Cliffs, N.J.
- Siri, W.E. (1956). "Gross composition of the body", in *Advances in Biological and Medical Physics*, vol. IV, edited by J.H. Lawrence and C.A. Tobias, Academic Press, Inc., New York.
- Katch, Frank and McArdle, William (1977). *Nutrition, Weight Control, and Exercise*. Houghton Mifflin Co., Boston.
- Wilmore, Jack (1976). *Athletic Training and Physical Fitness: Physiological Principles of the Conditioning Process*. Allyn and Bacon, Inc., Boston.

The dataset is about a sample of 252 men and contains the following variables:

- Density: density of the body, determined from underwater weighing
- SiriBFperc: percentage of body fat, calculated as a function of Density according to Siri's equation: (495/Density) - 450
- Over45: indicator for age group (0: up to 45 years, 1: over 45)
- Weight: weight
- Height: height (inches)
- NeckC: neck circumference (cm)
- ChestC: chest circumference (cm)
- AbdomenC: abdomen circumference (cm)
- HipC: hip circumference (cm)
- ThighC: thigh circumference (cm)
- KneeC: knee circumference (cm)
- AnkleC: ankle circumference (cm)
- BicepsC: biceps circumference (cm)
- ForearmC: forearm circumference (cm)
- WristC: wrist circumference (cm)

The most accurate way of calculating the percentage of body fat is the one provided by Siri's equation, (495/Density) - 450, which requires the measurement of Density via weighing under water. This measurement is expensive and impractical. On the other hand, age and all the body measurements listed above are easy to obtain. Thus, we want to understand whether we can reliably describe and predict the percentage of body fat using these other variables, through a regression model.

In the dataset, for age we only have a binary indicator separating men below and above 45 years. On the other hand, all the body measurements are continuous variables. Notice also that body fat percentage is a quantity bound to vary between 0 and 100. This could cause problems when using linear regression models, because we could actually estimate mean levels or predict values smaller than 0 or larger than 100 on certain predictor ranges. However, we can safely use linear regression as long as we move in ranges of the predictors where the fitted values are well above 0 and well below 100.

Throughout the semester, the "Application" component in each of the homework sets will consist of employing various modeling, inference and diagnostics techniques learned in class on these data, with the final aim of producing a satisfactory regression model for the percentage of body fat based on all or on a subsample of the available predictor variables.

NOTE: DENSITY (THE VARIABLE ON WHICH SIRI'S EQUATION IS BASED) WILL NOT BE USED AS A PREDICTOR.

When preparing the "Application" part of each homework, make sure that:

- It does not exceed [...] pages, including figures and tables.
- The answer to each question is divided into two parts, one devoted to technical details and outputs, and one devoted to the interpretation of the results. The former can contain R output (only the relevant part of it!) and technical answers to the question. The latter should resemble a short report you would write for a client, i.e. be concise and informative but not contain technical terms, and not assume the reader has statistical knowledge.

Keep in mind that some erroneous values were detected in this data set. Density is given to you so that you can verify whether there were mistakes in the calculation of percentage body fat through Siri's formula. There may be some obvious measurement errors in Height and other predictor variables. When performing the analyses required for each homework set, you can remove, or present results with and without, a few units (men) that appear to carry erroneous measurements. Never remove more than [...] units, and always provide an appropriate justification for removing units if you do (report which units you removed).

Remember that a regression model should be satisfactory in three aspects: variability explanation; diagnostics of the model assumptions; parsimony and interpretability.
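
The guide says that Density is provided so that the Siri calculation can be verified. A minimal sketch of such a check in R follows. The helper `siri_bf`, the tolerance `tol`, and the three-row `toy` data frame are all hypothetical names and made-up values for illustration; they are not taken from BODY_FAT.TXT.

```r
# Siri's equation as stated in the dataset guide: body fat % = 495 / Density - 450
siri_bf <- function(density) 495 / density - 450

# Made-up illustrative values (NOT taken from BODY_FAT.TXT)
toy <- data.frame(
  Density    = c(1.05, 1.07, 1.09),
  SiriBFperc = c(21.4, 12.6, 14.1)  # third value is deliberately inconsistent
)

# Recompute body fat percentage from Density
toy$BF_recomputed <- siri_bf(toy$Density)

# Flag rows where the reported and recomputed values disagree by more than
# a tolerance (the tolerance of 0.1 is an assumption, chosen to allow rounding)
tol <- 0.1
suspect <- which(abs(toy$SiriBFperc - toy$BF_recomputed) > tol)
suspect  # row 3 in this toy example
```

On the real data the same pattern would apply to the columns `Density` and `SiriBFperc` of the data frame read from the file.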

## Solution Preview
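
For Theory question 1, one way to write out the computation is sketched below. It assumes $X$ has full column rank (so that $H$ exists) and relies on two standard properties of the hat matrix, symmetry and idempotence, which follow directly from its definition: $H^T = H$ and $H^2 = X(X^TX)^{-1}X^TX(X^TX)^{-1}X^T = H$. Since $X\beta$ is a constant, $\mathrm{Var}(y) = \mathrm{Var}(\varepsilon) = \sigma^2 I$.

$$
\begin{aligned}
\mathrm{Var}(\hat{\varepsilon}) &= \mathrm{Var}\big((I - H)y\big) \\
&= (I - H)\,\mathrm{Var}(y)\,(I - H)^T \\
&= (I - H)\,\sigma^2 I\,(I - H) \\
&= \sigma^2\,(I - 2H + H^2) \\
&= \sigma^2\,(I - H)
\end{aligned}
$$

The second line is the hint $\mathrm{Var}(Ay) = A\,\mathrm{Var}(y)\,A^T$ with $A = I - H$; the third uses the symmetry of $I - H$; the last uses $H^2 = H$.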

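
For Theory question 2, the family of solutions to the normal equations can be written down and checked numerically. The sketch below assumes the design matrix has only the two columns `x1` and `x2` used in the preview (no intercept column). The names `beta_particular`, `null_dir` and `beta_t` are introduced here for illustration. Because `x2 = 2 * x1`, the normal equations reduce to the single condition $6\beta_1 + 12\beta_2 = 1$, so every solution has the form $(1/6 - 2t,\ t)$ for a real number $t$.

```r
# Data from the preview (design matrix assumed to have no intercept column)
x1 <- c(2, 0, -1, 1)
x2 <- c(4, 0, -2, 2)
X  <- cbind(x1, x2)
y  <- c(0, -1, -1, 0)

XtX <- t(X) %*% X   # [[6, 12], [12, 24]]
Xty <- t(X) %*% y   # (1, 2)

det(XtX)            # zero up to floating-point error, so solve(XtX) is unavailable

# One particular solution: set beta2 = 0 and solve 6 * beta1 = 1
beta_particular <- c(Xty[1] / XtX[1, 1], 0)

# Direction that X'X maps to zero: x2 = 2 * x1 implies X %*% c(-2, 1) = 0
null_dir <- c(-2, 1)

# Every solution is the particular solution plus a multiple of null_dir
beta_t <- function(t) beta_particular + t * null_dir

# Check the normal equations for a few values of t
for (t in c(-1, 0, 0.5, 3)) {
  stopifnot(max(abs(XtX %*% beta_t(t) - Xty)) < 1e-10)
}

# The fitted values are identical for every t
X %*% beta_t(0)
X %*% beta_t(3)
```

All members of the family give the same fitted values, which is why the data cannot distinguish between them.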

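{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)

## Theory 2

$X = \begin{bmatrix}
2 & 4 \\
0 & 0 \\
-1 & -2 \\
1 & 2
\end{bmatrix}$

$X^T = \begin{bmatrix}
2 & 0 & -1 & 1 \\
4 & 0 & -2 & 2
\end{bmatrix}$

$X^TX = \begin{bmatrix}
6 & 12 \\
12 & 24
\end{bmatrix}$

$X^TX$ is singular (its determinant is $6 \cdot 24 - 12 \cdot 12 = 0$), so it has no inverse and the normal equations cannot be solved by inverting it.

{r}
# The second predictor is exactly twice the first
x1 <- c(2, 0, -1, 1)
x2 <- c(4, 0, -2, 2)
X <- cbind(x1, x2)
y <- c(0, -1, -1, 0)
df <- data.frame(x1, x2, y)
# lm() reports NA for the x2 coefficient because x2 is collinear with x1
fit <- lm(y ~ ., df)

## A

{r, fig.height=2.5, fig.width=5}
dat <- read.table("BODY_FAT.TXT", header = TRUE)
df <- dat
# Drop Density: it must not be used as a predictor
df$Density <- NULL
# "." uses every remaining column (including Over45) as a predictor
fit <- lm(SiriBFperc ~ ., data = df)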


{r, fig.height=2.5, fig.width=5}
plot(hat(model.matrix(fit)), type = "h")
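
The hint in part A suggests building the hat matrix by hand with `model.matrix`, `t`, `%*%` and `solve`, then using `which` to pick out unusual observations. A minimal sketch of that route follows. Since BODY_FAT.TXT is not available here, the built-in `mtcars` data and the model `fit_demo` stand in for the body fat model (an assumption for illustration only). The cutoff `2 * p / n` is a common rule of thumb, not something prescribed by the assignment.

```r
# Stand-in model on built-in data (assumption: replaces the body fat fit)
fit_demo <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Hat matrix H = X (X'X)^{-1} X' built step by step
X <- model.matrix(fit_demo)
H <- X %*% solve(t(X) %*% X) %*% t(X)

# Leverages are the diagonal entries of H
lev <- diag(H)

# They should agree with R's built-in hatvalues()
stopifnot(isTRUE(all.equal(unname(lev), unname(hatvalues(fit_demo)))))

# Flag high-leverage observations (rule-of-thumb cutoff, an assumption)
n <- nrow(X)
p <- ncol(X)
high_lev <- which(lev > 2 * p / n)
high_lev
```

With the real data, `fit_demo` would be replaced by `fit`, and the indices in `high_lev` would be the candidates for unusual observations in the x space.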


{r, fig.height=2.5, fig.width=6}
par(mfrow=c(1,2))
plot(rstudent(fit), type = "h")
plot(cooks.distance(fit), type = "h")
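
To turn the two plots into a list of observation numbers, `which` can be applied to the studentized residuals and Cook's distances. The sketch below again uses `mtcars` and `fit_demo` as stand-ins for the body fat data and model (an assumption). The cutoffs `|r| > 2` and `4 / n` are common rules of thumb rather than values taken from the assignment, so they should be treated as starting points.

```r
# Stand-in model on built-in data (assumption: replaces the body fat fit)
fit_demo <- lm(mpg ~ wt + hp + disp, data = mtcars)
n <- nobs(fit_demo)

r_stud <- rstudent(fit_demo)        # externally studentized residuals
cook   <- cooks.distance(fit_demo)  # Cook's distance for each observation

# Rule-of-thumb cutoffs (assumptions, not prescribed by the assignment)
outliers    <- which(abs(r_stud) > 2)
influential <- which(cook > 4 / n)

# Same plots as above, with the cutoffs drawn as dashed lines
par(mfrow = c(1, 2))
plot(r_stud, type = "h", ylab = "Studentized residual")
abline(h = c(-2, 2), lty = 2)
plot(cook, type = "h", ylab = "Cook's distance")
abline(h = 4 / n, lty = 2)

# Inspect all measurements of the flagged units before deciding on removal
mtcars[union(outliers, influential), ]
```

Looking at the full rows of the flagged units, as in the last line, is what the assignment asks for when it says to evaluate influential observations using all available measurements.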


{r, fig.height=4}
par(mfrow=c(2,2))
plot(fit)
...

By purchasing this solution you'll be able to access the following files:
Solution.pdf and Solution.Rmd.
