## Transcribed Text

Applied Regression Analysis
Theory/numerical part
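1. Consider the multiple linear regression model $y = X\beta + \varepsilon$ with $\varepsilon \sim N(0, \sigma^2 I)$ and the hat
matrix $H = X(X^TX)^{-1}X^T$. Prove that the variance-covariance matrix of the vector of residuals
$\hat{\varepsilon}$ is $Var(\hat{\varepsilon}) = \sigma^2(I - H)$. Show the computation in its entirety (Hint: remember that you can
write $\hat{\varepsilon} = y - \hat{y} = (I - H)y$, and use the fact that $Var(Ay) = A\,Var(y)\,A^T$ for every
matrix of constants $A$).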
2. You want to fit a multiple linear regression model $y = X\beta + \varepsilon$ to predict the response
$y$ using two predictors $x_1$ and $x_2$. You consider a sample of points, shown in the following
R output:

`> y=c(0,-1,-1,0)`

The columns of the design matrix are linearly dependent, hence there are infinitely many
solutions to the normal equations $X^TX\beta = X^Ty$. Compute these infinitely many solutions
(show the computation in its entirety).
Applied part
Consider again the dataset contained in "BODY_FAT.TXT". Please read carefully the
description of this dataset and the guidelines for this part of the homework before performing
the analyses in R and writing your results (file Homework dataset guide pdf). Do not exceed 5
pages.
In the following analyses, use the body fat percentage you re-computed in Homework 1 (there
are erroneous values in the variable SiriBFperc provided in the dataset). If in Homework 1 you
found some units (men) for which some of the variables contain obvious mistakes, remove
them. If you find other very extreme values (outliers), you can present results with and without
them (provide an appropriate justification for removing units if you do).
Consider again the multiple linear regression model for body fat percentage, with predictors
AbdomenC, Weight, Height, NeckC, ChestC, HipC, ThighC, KneeC, AnkleC, BicepsC,
ForearmC, WristC.
A. Compute leverage, studentized residuals and Cook's distance, and produce plots of each of
these three quantities versus the observation number.

- Identify unusual observations in the x space, outliers with respect to the response y,
  and influential points that strongly influence the model (indicating which of the
  three plots you employed to identify the points, and providing details about the
  observations you identified).
- Evaluate the observations you identified as influential, looking at all the measurements
  you have in the dataset (try to explain why they are influential and decide if you want
  to remove them). If you wish to remove some observations from the data, justify your
  choice and rerun the regression and the diagnostics (also residual Q-Q plot, Shapiro
  test...).
- Conclude with a general evaluation of the linear regression model you obtained based
  on all available information.
[Hint: use function model.matrix to create the design matrix of the model, t to compute
the transpose of a matrix, %*% to multiply two matrices and solve to compute the inverse;
identify unusual observations using function which.]

B. Consider now the wrist circumference.

- Fit a simple linear regression model with response the body fat percentage and
  predictor the wrist circumference in order to study the relationship between the two
  variables, and produce the regression plot (i.e. scatter plot of y vs x with the fitted
  regression line superimposed).
- Is the effect of wrist circumference on body fat percentage positive or negative? Is it
  significant?
- Look at the coefficient beta corresponding to the wrist circumference in the multiple
  linear model you fitted in part A. Is the effect of wrist circumference on body fat
  percentage positive or negative? Is it significant?
- Do the coefficients of wrist circumference in the two regressions indicate relationships
  of a different nature/sign? What do you think is going on? Explain.
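
A sign change between a simple and a multiple regression often arises when predictors are strongly correlated. The sketch below illustrates this with simulated data only; the names `size_sim`, `wrist_sim` and `resp_sim` are hypothetical and are not taken from the body fat dataset. By construction, the marginal association between `wrist_sim` and the response is positive, while its coefficient after adjusting for `size_sim` is negative.

```{r}
# Simulated illustration (hypothetical data, not BODY_FAT.TXT)
set.seed(3)
n <- 200
size_sim <- rnorm(n)                          # overall "body size"
wrist_sim <- size_sim + rnorm(n, sd = 0.3)    # strongly correlated with size
resp_sim <- 2 * size_sim - 1 * wrist_sim + rnorm(n, sd = 0.5)

# Simple regression: wrist_sim also picks up the effect of size_sim
simple_fit <- lm(resp_sim ~ wrist_sim)

# Multiple regression: coefficient of wrist_sim holding size_sim fixed
multi_fit <- lm(resp_sim ~ wrist_sim + size_sim)

coef(simple_fit)["wrist_sim"]
coef(multi_fit)["wrist_sim"]
```

Whether the same mechanism explains the wrist circumference coefficients in the real data has to be checked on the data themselves, for example by inspecting the correlations among the predictors.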
C. Consider the binary variable Over45 as well as the abdomen circumference. Fit a linear
regression model that takes into consideration both predictors, as well as their interaction.
Provide:

- Model employed;
- The equation of the estimated regression line for each of the two age groups;
- A plot of body fat on abdomen circumference, with the two fitted equation lines
  superimposed [Hint: function abline can add each of the fitted lines];
- Coefficient of determination of the model.
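
As a rough sketch of the steps in part C, the code below fits an interaction model on simulated data and derives the two group-specific lines from its coefficients. The data values and the response name `BF` are hypothetical; only the predictor names `AbdomenC` and `Over45` mirror the dataset guide.

```{r}
# Simulated stand-in data (hypothetical values, not BODY_FAT.TXT)
set.seed(4)
n <- 100
sim <- data.frame(AbdomenC = runif(n, 70, 120), Over45 = rbinom(n, 1, 0.4))
sim$BF <- -35 + 0.6 * sim$AbdomenC + 5 * sim$Over45 -
  0.03 * sim$AbdomenC * sim$Over45 + rnorm(n, sd = 4)

# Model with both predictors and their interaction
int_fit <- lm(BF ~ AbdomenC * Over45, data = sim)
b <- coef(int_fit)

# Intercept and slope of the fitted line in each age group
line_under <- c(b["(Intercept)"], b["AbdomenC"])
line_over <- c(b["(Intercept)"] + b["Over45"],
               b["AbdomenC"] + b["AbdomenC:Over45"])

# Scatter plot with the two fitted lines superimposed
plot(BF ~ AbdomenC, data = sim, col = sim$Over45 + 1)
abline(a = line_under[1], b = line_under[2], col = 1)
abline(a = line_over[1], b = line_over[2], col = 2)

# Coefficient of determination of the interaction model
summary(int_fit)$r.squared
```

Because the model contains the full interaction, each group line coincides with the line obtained by fitting a separate simple regression within that group.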
D. Is there any evidence that the relationship between abdomen circumference and body fat
percentage differs between under45 and over45? Answer this question by performing an
appropriate hypothesis test. Provide:

- Null and alternative hypotheses;
- Value of the test statistic employed;
- Distribution of the test statistic under the null hypothesis;
- p-value of the test;
- Conclusion at significance level 5%.
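
One possible reading of part D is a partial F-test comparing a single common line against group-specific lines, so that the null hypothesis sets the coefficients of `Over45` and of the interaction to zero. The sketch below shows the mechanics on simulated data; the data and the response name `BF` are hypothetical, and testing only the interaction coefficient with a t-test would be another defensible choice.

```{r}
# Simulated stand-in data with the same line in both groups (hypothetical)
set.seed(5)
n <- 100
sim <- data.frame(AbdomenC = runif(n, 70, 120), Over45 = rbinom(n, 1, 0.4))
sim$BF <- -35 + 0.6 * sim$AbdomenC + rnorm(n, sd = 4)

reduced <- lm(BF ~ AbdomenC, data = sim)          # one common line
full <- lm(BF ~ AbdomenC * Over45, data = sim)    # group-specific lines

# Partial F-test of H0: coefficients of Over45 and AbdomenC:Over45 are both 0
f_test <- anova(reduced, full)
f_test

f_stat <- f_test$F[2]           # value of the test statistic
df_num <- f_test$Df[2]          # numerator degrees of freedom
df_den <- f_test$Res.Df[2]      # denominator degrees of freedom
p_val <- f_test$`Pr(>F)`[2]     # p-value, compared with 0.05
```

Under the null hypothesis the statistic follows an F distribution with `df_num` and `df_den` degrees of freedom, here 2 and n - 4.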
Applied Regression Analysis
Dataset guide
The file "BODY_FAT.TXT" contains data adapted from a dataset posted by Roger W. Johnson
(Dept. of Mathematics & Computer Science, South Dakota School of Mines & Technology).
Johnson also provided the following references, that might be useful to you:

- Bailey, Covert (1994). Smart Exercise: Burning Fat, Getting Fit. Houghton Mifflin Co.,
  Boston, pp. 179-186.
- Behnke, A.R. and Wilmore, J.H. (1974). Evaluation and Regulation of Body Build and
  Composition. Prentice-Hall, Englewood Cliffs, N.J.
- Siri, W.E. (1956). "Gross composition of the body", in Advances in Biological and Medical
  Physics, vol. IV, edited by J.H. Lawrence and C.A. Tobias, Academic Press, Inc., New
  York.
- Katch, Frank and McArdle, William (1977). Nutrition, Weight Control, and Exercise.
  Houghton Mifflin Co., Boston.
- Wilmore, Jack (1976). Athletic Training and Physical Fitness: Physiological Principles of
  the Conditioning Process. Allyn and Bacon, Inc., Boston.

The dataset is about a sample of 252 men and contains the following variables:

- Density: density of the body, determined from underwater weighing
- SiriBFperc: percentage of body fat, calculated as a function of the Density according to
  Siri's equation: (495/Density) - 450
- Over45: indicator for age group (0: up to 45 years, 1: over 45)
- Weight: weight
- Height: height (inches)
- NeckC: neck circumference (cm)
- ChestC: chest circumference (cm)
- AbdomenC: abdomen circumference (cm)
- HipC: hip circumference (cm)
- ThighC: thigh circumference (cm)
- KneeC: knee circumference (cm)
- AnkleC: ankle circumference (cm)
- BicepsC: biceps circumference (cm)
- ForearmC: forearm circumference (cm)
- WristC: wrist circumference (cm)

The most accurate way of calculating the percentage of body fat is the one provided by Siri's
equation, (495/Density) - 450, which requires the measurement of Density via weighing under
water. This measurement is expensive and impractical. On the other hand, age and all the body
measurements listed above are easy to obtain. Thus, we want to understand if we can reliably
describe and predict the percentage of body fat using these other variables, through a
regression model.
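
As a small illustration of Siri's equation, the sketch below recomputes body fat percentage from density. The function name `siri_bf` and the example densities are hypothetical and serve only to show the formula.

```{r}
# Siri's equation: body fat percentage as a function of body density
siri_bf <- function(density) {
  495 / density - 450
}

# Hypothetical density values, used only to illustrate the formula
example_density <- c(1.10, 1.05, 1.00)
siri_bf(example_density)
```

Applying the same function to the Density column and comparing the result with SiriBFperc is one way to look for miscalculated body fat values.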
In the dataset, for age we only have a binary indicator separating men below and above 45
years. On the other hand, all the body measurements are continuous variables.
Notice also that body fat percentage is a quantity bound to vary between 0 and 100. This could
cause problems when using linear regression models, because we could actually estimate
mean levels or predict values smaller than 0 or larger than 100 on certain predictor ranges.
However, we can safely use linear regression as long as we move on ranges of the predictors
where the fitted values are well above 0 and well below 100.
Throughout the semester, the "Application" component in each of the homework sets will consist
of employing various modeling, inference and diagnostics techniques learned in class on these
data, with the final aim of producing a satisfactory regression model for the percentage of body
fat based on all or on a subsample of the available predictor variables.
NOTE: DENSITY (THE VARIABLE ON WHICH SIRI'S EQUATION IS BASED) WILL NOT BE
USED AS A PREDICTOR.
When preparing the "Application" part of each homework, make sure that:

- It does not exceed the page limit, including figures and tables.
- The answer to each question is divided in two parts: one devoted to technical details and
  outputs, and one devoted to the interpretation of the results. The former can contain output
  (only the relevant part of it!) and technical answers to the question. The latter should
  resemble a short report you would write for a client, i.e. be concise and informative but not
  contain technical terms, and not assume the reader has statistical knowledge.

Keep in mind that some erroneous values were detected in this data set:

- Density is given to you so that you can verify whether there were mistakes in the calculation
  of Percentage body fat through Siri's formula.
- There may be some obvious measurement errors in Height and other predictor variables.

When performing the analyses required for each homework set, you can remove, or present
results with and without, a few units (men) that appear to carry erroneous measurements. Never
remove more than units, and always provide an appropriate justification for removing units
if you do (report which units you removed).
Remember that a regression model should be satisfactory in three aspects:

- variability explanation;
- diagnostics of the model assumptions;
- parsimony and interpretability.


```{r setup, include=FALSE}

knitr::opts_chunk$set(echo = TRUE)

```
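
For Theory question 1, one way to organize the derivation is sketched below. It assumes $Var(y) = \sigma^2 I$, which follows from $Var(\varepsilon) = \sigma^2 I$ with $X\beta$ fixed, and it uses only two properties of $H = X(X^TX)^{-1}X^T$: symmetry and idempotence.

$$
\begin{aligned}
H^T &= X\left[(X^TX)^{-1}\right]^T X^T = X(X^TX)^{-1}X^T = H, \\
HH &= X(X^TX)^{-1}X^TX(X^TX)^{-1}X^T = X(X^TX)^{-1}X^T = H, \\
Var(\hat{\varepsilon}) &= Var\big((I - H)y\big) = (I - H)\,Var(y)\,(I - H)^T \\
&= \sigma^2 (I - H)(I - H) = \sigma^2 (I - 2H + HH) = \sigma^2 (I - H).
\end{aligned}
$$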

## Theory 2

$$
X = \begin{bmatrix} 2 & 4 \\ 0 & 0 \\ -1 & -2 \\ 1 & 2 \end{bmatrix}
$$

$$
X^T = \begin{bmatrix} 2 & 0 & -1 & 1 \\ 4 & 0 & -2 & 2 \end{bmatrix}
$$

$$
X^TX = \begin{bmatrix} 6 & 12 \\ 12 & 24 \end{bmatrix}
$$

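$X^TX$ is singular (its determinant is $6 \cdot 24 - 12 \cdot 12 = 0$), so its inverse does not exist and the normal equations do not have a unique solution.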
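
Without an inverse, the normal equations $X^TX\beta = X^Ty$ can still be solved directly. A sketch of the computation, using $y = (0, -1, -1, 0)^T$ from the problem statement and a free parameter $t$ (notation introduced here for illustration):

$$
X^Ty = \begin{bmatrix} 2 & 0 & -1 & 1 \\ 4 & 0 & -2 & 2 \end{bmatrix}
\begin{bmatrix} 0 \\ -1 \\ -1 \\ 0 \end{bmatrix}
= \begin{bmatrix} 1 \\ 2 \end{bmatrix},
\qquad
\begin{bmatrix} 6 & 12 \\ 12 & 24 \end{bmatrix}
\begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}
= \begin{bmatrix} 1 \\ 2 \end{bmatrix}
$$

The second equation is twice the first, so the system reduces to the single equation $6\beta_1 + 12\beta_2 = 1$, and the solutions are

$$
\begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}
= \begin{bmatrix} \tfrac{1}{6} - 2t \\ t \end{bmatrix},
\qquad t \in \mathbb{R}.
$$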

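```{r}
# Two predictors with x2 = 2 * x1, so the columns of X are linearly dependent
x1 <- c(2, 0, -1, 1)
x2 <- c(4, 0, -2, 2)
X <- cbind(x1, x2)
y <- c(0, -1, -1, 0)

# lm() adds an intercept by default and, because x2 = 2 * x1,
# reports NA for the aliased coefficient of x2
df <- data.frame(x1, x2, y)
fit <- lm(y ~ ., data = df)
```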
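
As a numerical check, the sketch below verifies that every member of a one-parameter family satisfies the normal equations. The family $\beta(t) = (1/6 - 2t,\ t)$ is obtained by hand from $6\beta_1 + 12\beta_2 = 1$; the helper name `beta_t` and the tested values of the free parameter are arbitrary choices made for illustration.

```{r}
# Data from the problem statement
x1 <- c(2, 0, -1, 1)
x2 <- c(4, 0, -2, 2)
X <- cbind(x1, x2)
y <- c(0, -1, -1, 0)

XtX <- t(X) %*% X   # 2 x 2 and singular
Xty <- t(X) %*% y   # right-hand side of the normal equations

# Candidate family of solutions, indexed by a free parameter
beta_t <- function(t_free) c(1 / 6 - 2 * t_free, t_free)

# Check X^T X beta = X^T y for a few arbitrary parameter values
for (t_val in c(-1, 0, 0.5, 3)) {
  lhs <- XtX %*% beta_t(t_val)
  print(all.equal(as.numeric(lhs), as.numeric(Xty)))
}
```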

## A

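```{r, fig.height=2.5, fig.width=5}
# Read the data and drop Density, which must not be used as a predictor
dat <- read.table("BODY_FAT.TXT", header = TRUE)
df <- dat
df$Density <- NULL

# Regress body fat percentage on every remaining column of df
# (this includes Over45 if it is present in the file)
fit <- lm(SiriBFperc ~ ., data = df)
```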

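```{r, fig.height=2.5, fig.width=5}
# Leverage (diagonal of the hat matrix) versus observation number
plot(hat(model.matrix(fit)), type = "h",
     xlab = "Observation number", ylab = "Leverage")
```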
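
The hint in the assignment suggests building the hat matrix by hand. A minimal sketch of that computation follows, run on simulated stand-in data so that it does not depend on BODY_FAT.TXT; the object names and the 2p/n cutoff are assumptions (the cutoff is a common rule of thumb, not something the assignment prescribes).

```{r}
# Simulated stand-in data (hypothetical; not the body fat dataset)
set.seed(1)
n <- 50
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 - sim$x2 + rnorm(n)
sim_fit <- lm(y ~ x1 + x2, data = sim)

# Hat matrix H = X (X^T X)^{-1} X^T built from the design matrix
X <- model.matrix(sim_fit)
H <- X %*% solve(t(X) %*% X) %*% t(X)
lev <- diag(H)

# The manual leverages should agree with hatvalues()
all.equal(unname(lev), unname(hatvalues(sim_fit)))

# Flag observations with leverage above 2p/n (assumed rule of thumb)
p <- ncol(X)
which(lev > 2 * p / n)
```

The same lines applied to `fit` give the leverages plotted above; the leverages always sum to the number of columns of the design matrix, which is a quick sanity check.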

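```{r, fig.height=2.5, fig.width=6}
# Studentized residuals and Cook's distance versus observation number
par(mfrow = c(1, 2))
plot(rstudent(fit), type = "h",
     xlab = "Observation number", ylab = "Studentized residual")
plot(cooks.distance(fit), type = "h",
     xlab = "Observation number", ylab = "Cook's distance")
```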
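
To move from the plots to a list of observations, `which` can be applied to the same quantities. The sketch below does this on simulated data with one planted outlier; the data, the object names and both cutoffs (absolute studentized residual above 3, Cook's distance above 4/n) are assumptions chosen for illustration.

```{r}
# Simulated stand-in data with one planted outlier (hypothetical)
set.seed(2)
n <- 60
sim <- data.frame(x = rnorm(n))
sim$y <- 3 + 1.5 * sim$x + rnorm(n)
sim$y[10] <- sim$y[10] + 15   # shift observation 10 far from the line
sim_fit <- lm(y ~ x, data = sim)

r_stud <- rstudent(sim_fit)
cook <- cooks.distance(sim_fit)

# Cutoffs are common rules of thumb, assumed here for illustration
out_idx <- which(abs(r_stud) > 3)
infl_idx <- which(cook > 4 / n)

out_idx
infl_idx

# Inspect all measurements of the flagged observations
sim[union(out_idx, infl_idx), ]
```

On the real data the flagged rows would then be examined across all measurements before deciding whether removing them is justified.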

```{r, fig.height=4}

par(mfrow=c(2,2))

plot(fit)

```...