Question

Transcribed Text

1. The property taxes in a neighborhood are recorded along with the price, size, and number of bedrooms in the following table:

      HOUSE SIZE BR PRICE TAX
    1     1   18  3    80   3
    2     2   20  3    95   4
    3     3   25  4   104   7
    4     4   22  4   110   6
    5     5   33  5   175  10
    6     6   19  4    85   4
    7     7   17  3    89   2

The regression model (reg) is: TAX = B0 + B1 x PRICE + B2 x SIZE + Error

i. Fill in the missing values in the following ANOVA table of the multiple regression model.
ii. Would you consider Price and Size in the final regression model?
iii. Compute R-squared and adjusted R-squared.

    > reg=lm(TAX~PRICE+SIZE,data=house)
    > anova(reg)
    Analysis of Variance Table

    Response: TAX
              Df Sum Sq Mean Sq F value Pr(>F)
    PRICE      ? 36.625  36.625       ?        ***
    SIZE       1  6.880       ?       ?        ***
    Residuals  ?  1.352       ?
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

2. The Income (in $1,000) of employees in a city is recorded along with the Years of experience, Tax (in $1,000), and level of Education, which is a numerical variable (1 = High school diploma, 2 = BS, 3 = MS, and 4 = PhD), in the following table.

    > salary
          Income Years Tax Education
     [1,]     80     5   3         2
     [2,]     95    10   4         3
     [3,]    104     8   7         4
     [4,]    110    12   6         4
     [5,]    175    20  10         3
     [6,]     45     7   4         1
     [7,]     89     9   2         3
     [8,]     52     6   1         1
     [9,]     63     4   3         2
    [10,]     78    11   5         3

a) Fill in the missing values in the following ANOVA table of the multiple regression model for Income (response).
b) Propose a regression model for Income (response), given the following summary and ANOVA table of the multiple regression model. Which factor(s) would you choose for your proposed model?
c) If we assume Education is a categorical variable rather than numerical, write a regression model with Income as response, and Years and Education as factors. (Write the model in detailed statistical notation.)

    > regIncome=lm(Income~Years+Tax+Education)
    > summary(regIncome)

    Call:
    lm(formula = Income ~ Years + Tax + Education)

    Residuals:
        Min      1Q  Median      3Q     Max
    -24.184  -5.909   1.584   7.266  19.987

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   11.047     15.985
    Years          4.552      2.108
    Tax            3.509      3.957
    Education      7.839      6.554
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 17.08 on 6 degrees of freedom
    Multiple R-squared: 0.8575, Adjusted R-squared: 0.7862
    F-statistic: 12.03 on 3 and 6 DF, p-value: 0.005987

    > anova(regIncome)
    Analysis of Variance Table

    Response: Income
              Df Sum Sq Mean Sq F value Pr(>F)
    Years      ? 9510.1       ?
    Tax        ?  602.8       ?
    Education  1  417.4       ?
    Residuals  6 1750.6       ?
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

3. The Income (in $1,000) of employees in a city is recorded along with the Years of experience, Tax (in $1,000), and level of Education, which is a numerical variable (1 = High school diploma, 2 = BS, 3 = MS, and 4 = PhD), in the following table.

    > salary
          Income Years Tax Education
     [1,]     80     5   3         2
     [2,]     95    10   4         3
     [3,]    104     8   7         4
     [4,]    110    12   6         4
     [5,]    175    20  10         3
     [6,]     45     7   4         1
     [7,]     89     9   2         3
     [8,]     52     6   1         1
     [9,]     63     4   3         2
    [10,]     78    11   5         3

The correlations between the independent variables are given as follows:

    > cor(Years,Income)
    [1] 0.8799886
    > cor(Years,Tax)
    [1] 0.8083948
    > cor(Years,Education)
    [1] 0.4909312

Sums of squares for Years, Tax, and Education are given as:

    SS(Years) = 21.06667
    SS(Tax) = 62.5
    SS(Education) = 10.4

a. Find the covariance matrix of the independent variables (Years, Tax, Education).
b. Write the equations that you use to fill in the values in the covariance matrix.

4. Different treatments are given to patients, and the following one-way ANOVA table is generated based on the results.

    Source      df              SS          MS       F  P-Value
    Treatments  df (treat) = ?  SST = ?     MST = ?  5  0.045
    Error       df (error) = ?  SSE = 1000  MSE = 20
    Total       df (total) = ?  SSY = 1500

a. Fill in the missing values in the table.
b. Would you reject the null hypothesis at the 4% level? If so, what do you conclude?

5. Model the response (mpg) in the “mtcars” data (among the R datasets) with regard to the factors cyl, hp, and wt using multiple regression, based on the following procedure. (Visualize the results in each part and include them in your Word file. Attach your R script.)

a) Identify the most relevant factor (a single factor) among cyl, hp, and wt using simple linear regression.
b) Order the factors by their importance in explaining the variation of the response (mpg) and perform a multiple regression with the factors in the determined order.
c) Compare the partial F values in part b with the F values obtained in part a.
d) Choose a model based on collective goodness-of-fit criteria, including the F-value of the model, Adj R2, and C(p). The selected model will have one or multiple factors among cyl, hp, and wt.
e) Visualize the correlation matrix for mpg, cyl, hp, and wt.

6. Use the “Income.csv” dataset (posted in the Data folder) and perform analysis of variance (ANOVA) based on the following procedure. (Visualize the results in each part and put them in your Word file. Attach your R script.)

a) One-way ANOVA to compare mean salary based on:
   i. Race
   ii. Gender
   iii. Write the null and alternative hypotheses for parts i and ii, and state whether you reject H0 or not. What is your test statistic, what is its value, and what are its degrees of freedom?
b) ANOVA using two factors:
   i. Race and Gender
   ii. Write the null and alternative hypotheses, and state whether you reject H0 or not.
c) Identify the groups with different means in a and b using a pairwise t-test.
d) Which of race or gender better explains the differences in salary?

7. Use the “house.txt” dataset (posted in the Data folder) and perform k-means clustering based on the following procedure. (Visualize the results in each part and put them in your Word file. Attach your R script.)

a) Use the price and cluster the data into two groups.
b) Use the size (SqFeet) and cluster the data into two groups.
c) Use the price and size and cluster the data into two groups.
d) Are the results in a and b similar?
e) Find the Principal Components (PCs) for this dataset.
f) Select the PCs that can explain 80% of the variance.
g) Cluster the data using the PCs identified in part f.
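
For Question 1, the missing ANOVA entries can be recovered from the printed sums of squares alone. The sketch below is one possible way to do that in R, assuming the model has an intercept and two predictors fitted to the 7 houses in the table. The variable names (ss, df, ms, preds, and so on) are illustrative and not part of the original assignment.

```r
# Sums of squares as printed in the Question 1 ANOVA table
ss = c(PRICE = 36.625, SIZE = 6.880, Residuals = 1.352)

# Assumption: 7 houses, intercept plus two slopes, so n - 3 residual df
n  = 7
df = c(PRICE = 1, SIZE = 1, Residuals = n - 3)

# Mean square = sum of squares / degrees of freedom
ms = ss / df

# Sequential F statistics: each term's mean square over the residual mean square
preds = c("PRICE", "SIZE")
f_val = ms[preds] / ms[["Residuals"]]
p_val = pf(f_val, df[preds], df[["Residuals"]], lower.tail = FALSE)

# R-squared and adjusted R-squared from the same table
ss_total = sum(ss)
r2       = 1 - ss[["Residuals"]] / ss_total
adj_r2   = 1 - (ss[["Residuals"]] / df[["Residuals"]]) / (ss_total / (n - 1))

print(round(cbind(Df = df, SumSq = ss, MeanSq = ms), 3))
print(round(cbind(F = f_val, p = p_val), 4))
print(round(c(R2 = r2, AdjR2 = adj_r2), 4))
```

With the original data loaded, anova(reg) and summary(reg) should report the same quantities directly; the arithmetic here only makes the relationships between the table columns explicit.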
