## Transcribed Text

1. The property taxes in a neighborhood are recorded along with the price, size, and number of bedrooms of each house in the table below.
The regression model (reg) is: TAX = B0 + B1×PRICE + B2×SIZE + Error
i. Fill in the missing values in the following ANOVA table of the multiple regression model.
ii. Would you keep both Price and Size in the final regression model?
iii. Compute R-squared and adjusted R-squared.

| HOUSE | SIZE | BR | PRICE | TAX |
|---|---|---|---|---|
| 1 | 18 | 3 | 80 | 3 |
| 2 | 20 | 3 | 95 | 4 |
| 3 | 25 | 4 | 104 | 7 |
| 4 | 22 | 4 | 110 | 6 |
| 5 | 33 | 5 | 175 | 10 |
| 6 | 19 | 4 | 85 | 4 |
| 7 | 17 | 3 | 89 | 2 |

    > reg=lm(TAX~PRICE+SIZE,data=house)
    > anova(reg)
    Analysis of Variance Table
    Response: TAX
              Df Sum Sq Mean Sq F value Pr(>F)
    PRICE      ? 36.625  36.625       ?    ***
    SIZE       1  6.880       ?       ?    ***
    Residuals  ?  1.352       ?
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

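
The missing entries in Question 1 follow from the usual ANOVA identities (Mean Sq = Sum Sq / Df, F = Mean Sq / residual Mean Sq, R-squared = 1 - SSE/SST). The sketch below applies them to the numbers printed above. It assumes n = 7 houses, as in the table, and the variable names (`ss`, `f_vals`, and so on) are illustrative rather than part of the original script.

```r
# Values copied from the ANOVA output in Question 1.
ss <- c(PRICE = 36.625, SIZE = 6.880, Residuals = 1.352)
n  <- 7   # number of houses in the table (assumed from the seven rows)
p  <- 2   # predictors in the model: PRICE and SIZE

# Each numeric predictor uses 1 df; the residual df is n - p - 1.
df <- c(PRICE = 1, SIZE = 1, Residuals = n - p - 1)

ms     <- ss / df                                   # mean squares
f_vals <- ms[c("PRICE", "SIZE")] / ms[["Residuals"]] # sequential F values

sst    <- sum(ss)                                   # total sum of squares
r2     <- 1 - ss[["Residuals"]] / sst
adj_r2 <- 1 - (1 - r2) * (n - 1) / df[["Residuals"]]

print(round(cbind(Df = df, SumSq = ss, MeanSq = ms), 3))
print(round(f_vals, 2))
print(round(c(R2 = r2, adjR2 = adj_r2), 4))
```

Because `anova()` reports sequential sums of squares, the F value for SIZE here measures its contribution after PRICE is already in the model.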
2. The Income (in $1,000) of employees in a city is recorded along with Years of experience, Tax (in $1,000), and level of Education, which is coded as a numerical variable (1 = high school diploma, 2 = BS, 3 = MS, and 4 = PhD), in the following table.

    > salary
          Income Years Tax Education
     [1,]     80     5   3         2
     [2,]     95    10   4         3
     [3,]    104     8   7         4
     [4,]    110    12   6         4
     [5,]    175    20  10         3
     [6,]     45     7   4         1
     [7,]     89     9   2         3
     [8,]     52     6   1         1
     [9,]     63     4   3         2
    [10,]     78    11   5         3

a) Fill in the missing values in the following ANOVA table of the multiple regression model for Income (response).
b) Propose a regression model for Income (response), given the following summary and ANOVA table of the multiple regression model. Which factor(s) would you choose for your proposed model?
c) If we assume Education is a categorical variable rather than numerical, write a regression model with Income as the response and Years and Education as factors. (Write the model in detailed statistical notation.)

    > regIncome=lm(Income~Years+Tax+Education)
    > summary(regIncome)
    Call:
    lm(formula = Income ~ Years + Tax + Education)
    Residuals:
        Min      1Q  Median      3Q     Max
    -24.184  -5.909   1.584   7.266  19.987
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   11.047     15.985
    Years          4.552      2.108
    Tax            3.509      3.957
    Education      7.839      6.554
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    Residual standard error: 17.08 on 6 degrees of freedom
    Multiple R-squared: 0.8575, Adjusted R-squared: 0.7862
    F-statistic: 12.03 on 3 and 6 DF, p-value: 0.005987

    > anova(regIncome)
    Analysis of Variance Table
    Response: Income
              Df Sum Sq Mean Sq F value Pr(>F)
    Years      ? 9510.1       ?
    Tax        ?  602.8       ?
    Education  1  417.4       ?
    Residuals  6 1750.6       ?
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

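
One way to check the printed output for Question 2, and to see what part c) means in practice, is to re-enter the ten rows and refit the model. This is a sketch only: the data frame below is my reading of the transcribed `salary` table, and `regFactor` is a hypothetical name for the version that treats Education as categorical via `factor()`.

```r
# Rows re-entered from the salary table (assumed reading of the transcription).
salary <- data.frame(
  Income    = c(80, 95, 104, 110, 175, 45, 89, 52, 63, 78),
  Years     = c(5, 10, 8, 12, 20, 7, 9, 6, 4, 11),
  Tax       = c(3, 4, 7, 6, 10, 4, 2, 1, 3, 5),
  Education = c(2, 3, 4, 4, 3, 1, 3, 1, 2, 3)
)

# Same model as in the question; anova() gives sequential (Type I) sums of squares.
regIncome <- lm(Income ~ Years + Tax + Education, data = salary)
tab <- anova(regIncome)
print(tab)
print(summary(regIncome)$coefficients)

# Part c): Education as a categorical factor with four levels.
# R's default treatment coding uses level 1 as the baseline and adds
# one indicator column for each of levels 2, 3 and 4.
regFactor <- lm(Income ~ Years + factor(Education), data = salary)
print(colnames(model.matrix(regFactor)))
```

Under that default coding the categorical model can be written as Income = β0 + β1·Years + γ2·D2 + γ3·D3 + γ4·D4 + ε, where Dk equals 1 when Education is level k and 0 otherwise. Other codings (for example sum-to-zero contrasts) are equally valid and change only the interpretation of the coefficients.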
3. The Income (in $1,000) of employees in a city is recorded along with Years of experience, Tax (in $1,000), and level of Education, which is coded as a numerical variable (1 = high school diploma, 2 = BS, 3 = MS, and 4 = PhD), in the following table.

    > salary
          Income Years Tax Education
     [1,]     80     5   3         2
     [2,]     95    10   4         3
     [3,]    104     8   7         4
     [4,]    110    12   6         4
     [5,]    175    20  10         3
     [6,]     45     7   4         1
     [7,]     89     9   2         3
     [8,]     52     6   1         1
     [9,]     63     4   3         2
    [10,]     78    11   5         3

The correlations between the variables are given as follows:

    > cor(Years,Income)
    [1] 0.8799886
    > cor(Years,Tax)
    [1] 0.8083948
    > cor(Years,Education)
    [1] 0.4909312

Sum of squares for Years, Tax, and Education are given as:
SS(Years) = 21.06667, SS(Tax) = 62.5, SS(Education) = 10.4
a. Find the covariance matrix of the independent variables (Years, Tax, Education).
b. Write the equations that you use to fill in the values in the covariance matrix.
4. Different treatments are given to the patients and the following one-way ANOVA table is generated based on the results.

| Source | df | SS | MS | F | P-Value |
|---|---|---|---|---|---|
| Treatments | df (treat) = ? | SST = ? | MST = ? | 5 | 0.045 |
| Error | df (error) = ? | SSE = 1000 | MSE = 20 | | |
| Total | df (total) = ? | SSY = 1500 | | | |

a. Fill in the missing values in the table.
b. Would you reject the null hypothesis at the 4% level? If so, what do you conclude?
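
The blanks in the one-way ANOVA table of Question 4 can be recovered from the entries that are given, using SSY = SST + SSE, MS = SS / df, and F = MST / MSE. A minimal sketch with the table's own numbers (variable names are illustrative):

```r
# Given entries from the one-way ANOVA table.
sse <- 1000    # error sum of squares
ssy <- 1500    # total sum of squares
mse <- 20      # error mean square
f   <- 5       # reported F statistic
p   <- 0.045   # reported p-value

sst      <- ssy - sse            # treatment sum of squares
df_error <- sse / mse            # since MSE = SSE / df(error)
mst      <- f * mse              # since F = MST / MSE
df_treat <- sst / mst            # since MST = SST / df(treat)
df_total <- df_treat + df_error

print(c(SST = sst, MST = mst, df_treat = df_treat,
        df_error = df_error, df_total = df_total))

# Part b): reject H0 at the 4% level only if the p-value is below 0.04.
alpha  <- 0.04
reject <- p < alpha
print(reject)
```

The decision in part b) uses the p-value as printed in the table; it is compared directly with the chosen significance level.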
5. Model the response (mpg) in the “mtcars” data set (one of the built-in R data sets) with regard to the factors cyl, hp, and wt using multiple regression, based on the following procedure.
(Visualize the results in each part and include them in your Word file. Attach your R script.)
a) Identify the most relevant single factor among cyl, hp, and wt using simple linear regression.
b) Order the factors by how much of the variation in the response (mpg) each explains, and perform a multiple regression with the factors entered in that order.
c) Compare the partial F values in part b with the F values obtained in part a.
d) Choose a model based on collective goodness-of-fit criteria, including the F value of the model, adjusted R², and Mallows' C(p). The selected model will have one or more factors among cyl, hp, and wt.
e) Visualize the correlation matrix for mpg, cyl, hp, and wt.
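
The procedure in Question 5 could be sketched in base R as follows. This is one possible reading, not a prescribed solution: it ranks factors by the R-squared of their simple regressions, evaluates only the nested models that follow that order (not every subset), and computes Mallows' C(p) by hand as SSE_p / MSE_full - (n - 2p), with p counting the intercept. All object names are illustrative.

```r
data(mtcars)
factors <- c("cyl", "hp", "wt")

# a) One simple regression per factor: keep R-squared and the model F value.
simple <- sapply(factors, function(v) {
  s <- summary(lm(reformulate(v, "mpg"), data = mtcars))
  c(R2 = s$r.squared, F = unname(s$fstatistic["value"]))
})
print(round(simple, 3))

# b) Order factors by R-squared (largest first) and fit them in that order.
ord     <- names(sort(simple["R2", ], decreasing = TRUE))
seq_fit <- lm(reformulate(ord, "mpg"), data = mtcars)
seq_tab <- anova(seq_fit)   # c) sequential F values to compare with part a
print(seq_tab)

# d) F, adjusted R-squared and Mallows' Cp for the nested models in that order.
n        <- nrow(mtcars)
mse_full <- summary(seq_fit)$sigma^2
models   <- lapply(seq_along(ord), function(k)
  lm(reformulate(ord[1:k], "mpg"), data = mtcars))
crit <- t(sapply(models, function(m) {
  p <- length(coef(m))   # number of coefficients, including the intercept
  c(F     = unname(summary(m)$fstatistic["value"]),
    adjR2 = summary(m)$adj.r.squared,
    Cp    = sum(resid(m)^2) / mse_full - (n - 2 * p))
}))
rownames(crit) <- sapply(seq_along(ord), function(k)
  paste(ord[1:k], collapse = " + "))
print(round(crit, 3))

# e) Correlation matrix; pairs() or a heatmap of cor_mat can be used to plot it.
cor_mat <- cor(mtcars[, c("mpg", factors)])
print(round(cor_mat, 3))
```

By construction the full model has C(p) equal to its number of coefficients, so the criterion is informative mainly for the smaller models.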
6. Use the “Income.csv” dataset (posted in the Data folder) and perform analysis of variance (ANOVA) based on the following procedure. (Visualize the results in each part and put them in your Word file. Attach your R script.)
a) One-way ANOVA to compare mean salary based on:
   i. Race
   ii. Gender
   iii. Write the null and alternative hypotheses for parts i and ii, state whether you reject H0 or not, and report the test statistic, its value, and its degrees of freedom.
b) ANOVA using two factors:
   i. Race and Gender
   ii. Write the null and alternative hypotheses, and state whether you reject H0 or not.
c) Identify the groups with different means in a and b using pairwise t-tests.
d) Which of race or gender better explains the differences in salary?
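
Since the layout of “Income.csv” is not shown, the following is only a sketch of how the ANOVA steps in Question 6 might be organized. `anova_report` is a hypothetical helper, the column names `Salary`, `Race`, and `Gender` are assumptions, and the built-in `warpbreaks` data set stands in for the course file so that the example runs on its own. The Bonferroni adjustment is one choice among several.

```r
# Hypothetical helper: fits an additive ANOVA and runs pairwise t-tests
# for each grouping column. Column names are passed in as strings.
anova_report <- function(df, response, factors) {
  df[factors] <- lapply(df[factors], factor)   # treat grouping columns as factors
  fit <- aov(reformulate(factors, response), data = df)
  list(
    table    = summary(fit),
    pairwise = lapply(factors, function(f)
      pairwise.t.test(df[[response]], df[[f]], p.adjust.method = "bonferroni"))
  )
}

# Stand-in demonstration on built-in data (not the course data).
one_way <- anova_report(warpbreaks, "breaks", "tension")
two_way <- anova_report(warpbreaks, "breaks", c("wool", "tension"))
print(one_way$table)
print(two_way$table)
print(one_way$pairwise[[1]])

# With the course file, assuming columns named Salary, Race and Gender:
# income <- read.csv("Income.csv")
# anova_report(income, "Salary", "Race")
# anova_report(income, "Salary", "Gender")
# anova_report(income, "Salary", c("Race", "Gender"))
```

In each table the F value and its two degrees of freedom (factor and residual) are the quantities asked for in part a) iii. Comparing the Sum Sq of the two factors in the two-way table is one way to approach part d).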
7. Use “house.txt” dataset (posted in Data folder) and perform k-means clustering based on the following procedure: (Visualize the results in each part and put them in your word file. Attach your R script.)
a) Use the price to cluster the data into two groups.
b) Use the size (SqFeet) to cluster the data into two groups.
c) Use the price and size together to cluster the data into two groups.
d) Are the results in a and b similar?
e) Find Principal Components (PCs) for this dataset.
f) Select the PCs that can explain 80% of the variance.
g) Cluster the data using the PCs identified in part f.
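
The steps of Question 7 map onto `kmeans()` and `prcomp()` in base R. Because “house.txt” is not reproduced here, the sketch below uses the seven PRICE and SIZE values from the small table in Question 1 as a stand-in (my reading of that table), so the column names differ from the `SqFeet` column mentioned above. Standardizing the variables before clustering on both is an assumption, made because price and size are on different scales.

```r
# Stand-in data: PRICE and SIZE from the Question 1 table, not house.txt.
house <- data.frame(
  PRICE = c(80, 95, 104, 110, 175, 85, 89),
  SIZE  = c(18, 20, 25, 22, 33, 19, 17)
)

set.seed(1)   # k-means starts from random centers; fix the seed for repeatability

# a), b), c) two clusters on price, on size, and on both (standardized).
km_price <- kmeans(house["PRICE"], centers = 2, nstart = 25)
km_size  <- kmeans(house["SIZE"],  centers = 2, nstart = 25)
km_both  <- kmeans(scale(house),   centers = 2, nstart = 25)

# d) Cluster labels are arbitrary, so compare the partitions with a cross-table.
print(table(price = km_price$cluster, size = km_size$cluster))

# e) Principal components on centered, scaled variables.
pca     <- prcomp(house, center = TRUE, scale. = TRUE)
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
print(round(cum_var, 3))

# f) Smallest number of PCs whose cumulative variance reaches 80%.
k <- which(cum_var >= 0.80)[1]

# g) Cluster on the scores of the selected PCs.
km_pc <- kmeans(pca$x[, 1:k, drop = FALSE], centers = 2, nstart = 25)
print(km_pc$cluster)
```

For the plots requested in each part, a scatter plot of price against size colored by the cluster vector, for example `plot(house, col = km_both$cluster)`, is usually sufficient.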
