Question

Transcribed Text

1. Decision Trees

This question relates to the following figure. [Figure: the left-hand panel shows a partition of the (X1, X2) predictor space into regions, with the mean of Y printed in each region (visible values include 15, 0, 3, and 10); the right-hand panel shows a tree with internal splits on X1 and X2 (X1 < 1, X2 < 2, X1 < 0) and terminal-node means 2.49, -1.00, 0.63, -1.06, and 0.21.]

(a) Sketch the tree corresponding to the partition of the predictor space illustrated in the left-hand panel of the figure above. The numbers inside the boxes indicate the mean of Y within each region.
(b) Create a diagram similar to the left-hand panel, using the tree illustrated in the right-hand panel. You should divide the predictor space into the correct regions, and indicate the mean for each region.

2. Regression Trees

In the lab, a classification tree was applied to the Carseats data set after converting Sales into a qualitative response variable. Now we seek to predict Sales using regression trees and related approaches, treating the response as quantitative.

(a) Split the data set into a training set and a test set.
(b) Fit a regression tree to the training set. Plot the tree and interpret the results. What test MSE do you obtain?
(c) Use cross-validation to determine the optimal level of tree complexity. Does pruning the tree improve the test MSE?
(d) Use the bagging approach in order to analyze this data. What test MSE do you obtain? Use the importance() function to determine which variables are most important.
(e) Use random forests to analyze this data. What test MSE do you obtain? Use the importance() function to determine which variables are most important. Describe the effect of the number of variables considered at each split on the error rate obtained.

3. Classification Trees (13% / 20%)

This problem involves the OJ data set, which is part of the ISLR package.

(a) Create a training set containing a random sample of the observations, and a test set containing the remaining observations.
(b) Fit a tree to the training data, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics about the tree, and describe the results obtained. What is the training error rate? How many terminal nodes does the tree have?
(c) Type in the name of the tree object in order to get a detailed text output. Pick one of the terminal nodes and interpret the information displayed.
(d) Create a plot of the tree, and interpret the results.
(e) Predict the response on the test data, and produce a confusion matrix comparing the test labels to the predicted test labels. What is the test error rate?
(f) Apply the cv.tree() function to the training set in order to determine the optimal tree size.
(g) Produce a plot with tree size on the x-axis and error rate on the y-axis.
(h) Which tree size corresponds to the lowest error rate?
(i) Produce a pruned tree corresponding to the optimal tree size obtained using cross-validation. If cross-validation does not lead to selection of a pruned tree, then create a pruned tree with five terminal nodes.
(j) Compare the training error rates between the pruned and unpruned trees. Which is higher?
(k) Compare the test error rates between the pruned and unpruned trees. Which is higher?

4. SVM (13% / 20%)

In this problem, you will use support vector approaches in order to predict whether a given car gets high or low gas mileage, based on the Auto data set.

(a) Create a binary variable that takes the value 1 for cars with gas mileage above the median, and 0 for cars with gas mileage below the median.
(b) Fit a support vector classifier to the data with various values of cost, in order to predict whether a car gets high or low gas mileage. Report the cross-validation errors associated with different values of this parameter. Comment on your results.
(c) Now repeat (b), this time using SVMs with radial and polynomial basis kernels, with different values of gamma, degree, and cost. Comment on your results.
(d) Make some plots to back up your assertions in (b) and (c).
Hint: In the lab, we used the plot() function for svm objects only in cases with p = 2. When p > 2, you can use the plot() function to create plots displaying pairs of variables at a time. Essentially, instead of typing plot(svmfit, dat), where svmfit contains your fitted model and dat is a data frame containing your data, you can type plot(svmfit, dat, x1 ~ x4) in order to plot just the first and fourth variables. However, you must replace x1 and x4 with the correct variable names.

5. SVM (13% / 0%)

Here we explore the maximal margin classifier on a toy data set of seven observations in two dimensions:

| Obs. | X1 | X2 | Y    |
|------|----|----|------|
| 1    | 3  | 4  | Red  |
| 2    | 2  | 2  | Red  |
| 3    | 4  | 4  | Red  |
| 4    | 1  | 4  | Red  |
| 5    | 2  | 1  | Blue |
| 6    | 4  | 3  | Blue |
| 7    | 4  | 1  | Blue |

(a) Sketch the observations.
(b) Sketch the optimal separating hyperplane and provide the equation for this hyperplane in the following form: $\beta_0 + \beta_1 X_1 + \beta_2 X_2 = 0$.
(c) Describe the classification rule for the maximal margin classifier. (It should be something along the lines of "Classify to Red if $\beta_0 + \beta_1 X_1 + \beta_2 X_2 > 0$, and classify to Blue otherwise.") Provide the values for $\beta_0$, $\beta_1$, and $\beta_2$.
(d) On your sketch, indicate the margin for the maximal margin hyperplane.
(e) Indicate the support vectors for the maximal margin classifier.
(f) Argue that a slight movement of the seventh observation would not affect the maximal margin hyperplane.
(g) Sketch a hyperplane that is not the optimal separating hyperplane, and provide the equation for this hyperplane.
(h) Draw an additional observation on the plot so that the two classes are no longer separable by a hyperplane.

6. Hierarchical Clustering (10% / 20%)

Consider hierarchical clustering of the states.

(a) Using hierarchical clustering with complete linkage and Euclidean distance, cluster the states.
(b) Cut the dendrogram at a height that results in three distinct clusters. Which states belong to which clusters?
(c) Hierarchically cluster the states using complete linkage and Euclidean distance, after scaling the variables to have standard deviation one.
(d) What effect does scaling the variables have on the hierarchical clustering obtained? In your opinion, should the variables be scaled before the inter-observation dissimilarities are computed? Provide a justification for your answer.

7. PCA and K-Means Clustering (20% / 0%)

In this problem, you will generate simulated data, and then perform PCA and K-means clustering on the data.

(a) Generate a simulated data set with 20 observations in each of three classes (i.e., 60 observations in total) and 50 variables. Hint: There are a number of functions in R that you can use to generate data. One example is the rnorm() function; runif() is another option. Be sure to add a mean shift to the observations in each class so that there are three distinct classes.
(b) Perform PCA on the observations and plot the first two principal component score vectors. Use a different color to indicate the observations in each of the three classes. If the three classes appear separated in this plot, then continue on to part (c). If not, then return to part (a) and modify the simulation so that there is greater separation between the three classes. Do not continue to part (c) until the three classes show at least some separation in the first two principal components.
(c) Perform K-means clustering of the observations with K = 3. How well do the clusters that you obtained in K-means clustering compare to the true class labels? Hint: You can use the table() function in R to compare the true class labels to the class labels obtained by clustering. Be careful how you interpret the results: K-means clustering will arbitrarily number the clusters, so you cannot simply check whether the true class labels and the clustering labels are the same.
(d) Perform K-means clustering with K = 2. Describe your results.
(e) Now perform K-means clustering with K = 4, and describe your results.
(f) Now perform K-means clustering with K = 3 on the first two principal components rather than on the raw data. That is, perform K-means clustering on the matrix of which the first column is the first principal component's corresponding eigenvector and the second column is the second principal component's corresponding eigenvector. Comment on the results.
(g) Using the scale() function, perform K-means clustering with K = 3 on the data after scaling each variable to have standard deviation one. How do these results compare to those obtained in (b)?

Solution Preview

These solutions may offer step-by-step problem-solving explanations or good writing examples that include modern styles of formatting and construction of bibliographies, in-text citations, and references. Students may use these solutions for personal skill-building and practice. Unethical use is strictly forbidden.

---
output:
  pdf_document: default
  html_document: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
# Loading essential libraries needed
# for this assignment
library(ISLR)
library(rpart)
library(randomForest)
library(tree)
library(caret)
library(e1071)
```

## 1. Decision Trees

### (a)

![](1_a.png)


### (b)

![](1_b.png)

## 2. Regression Trees

### (a)

```{r}
# Setting seed
set.seed(1234)
# Splitting the data into a 75% training set and the remaining 25% as a test set.
sample_size <- floor(0.75 * nrow(Carseats))
# Randomly taking 75% samples from the overall dataset.
train_index <- sample(seq_len(nrow(Carseats)), size = sample_size)
# Picking training rows from the sample indexes
train <- Carseats[train_index, ]
# Picking remaining rows as test data
test <- Carseats[-train_index, ]
```
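
Since caret is already loaded in the setup chunk, a stratified split is another option. The sketch below is illustrative only, not the approach used in this solution; createDataPartition() balances the split on the response:

```{r, eval=FALSE}
# Hypothetical alternative: stratify the 75/25 split on Sales via caret.
# createDataPartition() returns the row indexes of the training portion.
part_index <- caret::createDataPartition(Carseats$Sales, p = 0.75, list = FALSE)
train_alt <- Carseats[part_index, ]
test_alt <- Carseats[-part_index, ]
```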


### (b)

```{r}
# Fitting decision tree on the training dataset
fit <- rpart(Sales ~ ., data=train)

# Printing the fitted model
fit

# Plotting regression tree
plot(fit, main = "Regression Tree")
text(fit, use.n=TRUE, all=TRUE, cex=0.3)

# Predicting values on test dataset
fitted <- predict(fit, test)

# Calculating Mean Squared Error (MSE)
MSE <- mean((test$Sales - fitted)^2)

# Printing MSE
cat("Test MSE is:", MSE)
```

The first decision node splits on ShelveLoc: if its value is Bad or Medium, the left subtree is followed; otherwise the right subtree is checked for the next rule. Within the left subtree, the next factor considered is Price (the split here is at about 105.5): if Price is above that threshold, the tree takes the left path, otherwise the right. Whenever a leaf node is reached, the value stored there is the prediction for the observation being scored.
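
To make the traversal concrete, the short sketch below (not part of the original solution; the row index is arbitrary) routes a single test observation down the fitted tree:

```{r, eval=FALSE}
# Illustrative only: trace one observation through the tree.
one_row <- test[1, ]
one_row[, c("ShelveLoc", "Price")]  # the variables used near the root
predict(fit, one_row)               # returns the mean of the leaf it falls in
```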

### (c)

```{r}

# Plotting cross validation to get optimal tree size
plotcp(fit)
par(mfrow=c(1,2))
rsq.rpart(fit)

# Pruning model with optimal level
prune_fit <- prune(fit, cp = fit$cptable[which.min(fit$cptable[,"xerror"]),"CP"])

# Replotting decision tree
plot(prune_fit, uniform=TRUE,
   main="Pruned Regression Tree")
text(prune_fit, use.n=TRUE, all=TRUE, cex=.4)

fitted <- predict(prune_fit, test)

# Calculating MSE for the pruned tree
MSE <- mean((test$Sales - fitted)^2)
cat("Test MSE after pruning is:", MSE)
```
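
The cp value chosen above is the one minimizing the cross-validated error (xerror). Printing the complexity table makes that choice visible; the call below is an illustration, not part of the original solution:

```{r, eval=FALSE}
# Each row of the table gives a cp value, the resulting tree size,
# and the cross-validated error (xerror) used to pick the pruning level.
printcp(fit)
```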
The test MSE after pruning remains almost the same. Hence, pruning does not appear to have much effect in this case.
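
The preview stops at part (c). For parts (d) and (e), a minimal sketch using the randomForest package loaded in the setup chunk could look like the following; it assumes the same train/test split and is an illustration, not the author's solution from the full files:

```{r, eval=FALSE}
# Part (d), bagging: a random forest with mtry equal to the number of
# predictors, so every variable is considered at each split.
p <- ncol(train) - 1  # number of predictors, excluding Sales
bag_fit <- randomForest(Sales ~ ., data = train, mtry = p, importance = TRUE)
bag_pred <- predict(bag_fit, test)
cat("Bagging test MSE:", mean((test$Sales - bag_pred)^2), "\n")
importance(bag_fit)  # variable importance measures

# Part (e), random forest: the default mtry for regression is about p/3.
# Varying mtry shows the effect of the number of variables tried per split.
rf_fit <- randomForest(Sales ~ ., data = train, importance = TRUE)
rf_pred <- predict(rf_fit, test)
cat("Random forest test MSE:", mean((test$Sales - rf_pred)^2), "\n")
importance(rf_fit)
```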

By purchasing this solution you'll be able to access the following files:
Solution.pdf and Solution.Rmd.
