Problem 2. In this problem and the next, we'll use data on used cars listed for sale at
Edmunds.com to build regression models for predicting used car prices.
a. Read the used car listing data into R. Extract all the listings for used Honda Accords.
Using the extracted data, build a regression to predict the price of a used Honda
Accord ($Y_{\text{price}}$) based on its mileage ($x_{\text{mileage}}$). That is, fit the model
$$Y_{\text{price}} = \beta_0 + \beta_1 x_{\text{mileage}} + \epsilon$$
on the Accord data. What is the value of each coefficient $\beta_0, \beta_1$? Interpret
these values. How well does the model fit the data (i.e., what are the R² and
RMSE)?
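A minimal sketch of part (a) in R follows. It assumes the listing file has `price` and `mileage` columns (the names used in the ggplot2 snippet in part (d)) plus a hypothetical `model` column holding values such as `"Accord"`; the real column names in used_cars_clean.tsv may differ. Because the real file is not available here, the sketch writes a small synthetic stand-in to a temporary file, so its numbers say nothing about the actual listings.

```r
# Synthetic stand-in for used_cars_clean.tsv. ASSUMPTION: columns model,
# mileage, price; `model` is a hypothetical column name. Not real data.
set.seed(1)
n <- 200
toy <- data.frame(
  model   = sample(c("Accord", "Civic"), n, replace = TRUE),
  mileage = round(runif(n, 5000, 150000))
)
toy$price <- round(25000 - 0.08 * toy$mileage + rnorm(n, sd = 1500))
path <- tempfile(fileext = ".tsv")
write.table(toy, path, sep = "\t", row.names = FALSE, quote = FALSE)

# Read the tab-separated file and keep only the Accord listings
cars   <- read.delim(path, stringsAsFactors = FALSE)
accord <- subset(cars, model == "Accord")

# Fit price = b0 + b1 * mileage + error
fit_a <- lm(price ~ mileage, data = accord)
b <- coef(fit_a)  # b[["(Intercept)"]] = b0, b[["mileage"]] = b1 (dollars per mile)

# Goodness of fit: R^2 and in-sample RMSE (same units as price)
r2   <- summary(fit_a)$r.squared
rmse <- sqrt(mean(residuals(fit_a)^2))
print(b)
print(c(R2 = r2, RMSE = rmse))
```

Here $\beta_1$ is the change in expected price per additional mile, and $\beta_0$ is the expected price at zero miles, which is an extrapolation if no listings are near zero. Note that `sigma(fit_a)` returns the residual standard error, which divides by n - 2 rather than n, so it is worth stating which RMSE definition the report uses.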
b. Given a used Honda Accord with an odometer reading of 50,000 miles, compute the
model-estimated mean price, the confidence interval for that mean, and the prediction
interval for the price (you can use the predict.lm function in R). Compute the
model-estimated mean price for a Honda Accord with an odometer reading of 300,000
miles. Based on the two results, what is a critical issue in your current regression
model?
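A hedged sketch of part (b): with `interval = "confidence"`, `predict()` (which dispatches to `predict.lm` for an `lm` fit) returns the estimated mean price with its confidence interval; with `interval = "prediction"` it returns the wider interval for a single car's price. The data are synthetic, with a slope chosen so the straight line crosses zero before 300,000 miles. That illustrates one way a linear extrapolation can misbehave, not what the Edmunds data necessarily produce.

```r
# Synthetic Accord-like data (ASSUMPTION, not the real listings)
set.seed(2)
n <- 200
accord <- data.frame(mileage = runif(n, 5000, 150000))
accord$price <- 25000 - 0.1 * accord$mileage + rnorm(n, sd = 1500)
fit_a <- lm(price ~ mileage, data = accord)

# Mean price, confidence interval and prediction interval at 50,000 miles
at_50k   <- data.frame(mileage = 50000)
conf_int <- predict(fit_a, newdata = at_50k, interval = "confidence", level = 0.95)
pred_int <- predict(fit_a, newdata = at_50k, interval = "prediction", level = 0.95)
print(conf_int)  # columns: fit (estimated mean price), lwr, upr
print(pred_int)  # same fit, wider interval for one car's price

# Mean price far outside the observed mileage range (extrapolation)
est_300k <- predict(fit_a, newdata = data.frame(mileage = 300000))
print(est_300k)
print(range(accord$mileage))  # compare 300,000 with the mileages actually observed
```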
c. How would you address the critical issue identified in part (b)? Update
your regression model by implementing your recommendation.
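One common remedy, assumed here rather than taken from the assignment, is to model the logarithm of price: back-transformed predictions are then always positive, and each mile lowers price by a constant percentage rather than a constant dollar amount. Other options, such as restricting predictions to the observed mileage range or adding a polynomial term, are equally reasonable to discuss. The sketch uses synthetic data.

```r
# Synthetic prices that decay by a constant percentage per mile (ASSUMPTION)
set.seed(3)
n <- 200
accord <- data.frame(mileage = runif(n, 5000, 150000))
accord$price <- 26000 * exp(-6e-6 * accord$mileage + rnorm(n, sd = 0.08))

fit_lin <- lm(price ~ mileage, data = accord)       # model from part (a)
fit_log <- lm(log(price) ~ mileage, data = accord)  # log-price model

at_300k  <- data.frame(mileage = 300000)
lin_300k <- unname(predict(fit_lin, newdata = at_300k))       # can fall below zero
log_300k <- unname(exp(predict(fit_log, newdata = at_300k)))  # exp() is always positive

# Percentage change in price per additional 10,000 miles under the log model
pct_per_10k <- 100 * (exp(1e4 * coef(fit_log)[["mileage"]]) - 1)

# R^2 of the log model is on the log scale; compute RMSE in dollars to compare
rmse_lin <- sqrt(mean(residuals(fit_lin)^2))
rmse_log <- sqrt(mean((accord$price - exp(fitted(fit_log)))^2))
print(c(lin_300k = lin_300k, log_300k = log_300k, pct_per_10k = pct_per_10k,
        rmse_lin = rmse_lin, rmse_log = rmse_log))
```

Exponentiating a fitted log price estimates the median price rather than the mean; a smearing correction is one way to adjust if that distinction matters for the report.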
d. Graphically present your regression models from part (a) and part (c). Specifically,
plot the data as a scatter plot and include two regression lines (with the
corresponding confidence bands), one for each of the models above. Note that this can
easily be done in R using ggplot2 with geom_smooth. For example, the model from
part (a) can be plotted with

    library(ggplot2)
    ggplot(data, aes(x = mileage, y = price)) +
      geom_point() +
      geom_smooth(method = "lm", formula = y ~ x)
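`geom_smooth(method = "lm")` fits its model to the y values as plotted, so it draws the part (a) line directly but not a log-price model back-transformed to dollars. A minimal alternative, sketched below in base R graphics to avoid package dependencies, computes both confidence bands with `predict()` over a mileage grid and draws them with `polygon()`; the same band data could instead be passed to ggplot2's `geom_ribbon()`. The data, the log-price choice for part (c), and the output file name are assumptions.

```r
# Synthetic Accord-like data (ASSUMPTION, not the real listings)
set.seed(4)
n <- 200
accord <- data.frame(mileage = runif(n, 5000, 150000))
accord$price <- 26000 * exp(-6e-6 * accord$mileage + rnorm(n, sd = 0.08))

fit_lin <- lm(price ~ mileage, data = accord)       # part (a)
fit_log <- lm(log(price) ~ mileage, data = accord)  # assumed part (c) model

# 95% confidence bands for the mean price over a grid of mileages
mile_grid <- data.frame(mileage = seq(min(accord$mileage), max(accord$mileage),
                                      length.out = 100))
band_lin <- predict(fit_lin, newdata = mile_grid, interval = "confidence")
band_log <- exp(predict(fit_log, newdata = mile_grid, interval = "confidence"))

# Draw one fitted line plus a shaded confidence band
draw_band <- function(band, colour) {
  polygon(c(mile_grid$mileage, rev(mile_grid$mileage)),
          c(band[, "lwr"], rev(band[, "upr"])),
          col = adjustcolor(colour, alpha.f = 0.3), border = NA)
  lines(mile_grid$mileage, band[, "fit"], col = colour, lwd = 2)
}

out_file <- file.path(tempdir(), "accord_models.pdf")  # hypothetical output path
pdf(out_file, width = 8, height = 6)
plot(price ~ mileage, data = accord, pch = 16, col = "grey50",
     xlab = "Mileage", ylab = "Price (USD)")
draw_band(band_lin, "steelblue")
draw_band(band_log, "firebrick")
legend("topright", legend = c("price ~ mileage", "log(price) ~ mileage"),
       col = c("steelblue", "firebrick"), lwd = 2)
dev.off()
```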

Problem 3. Continuing from the previous problem, we will further investigate the
regression model for predicting used car prices.
a. Refine your regression model from Problem 2c to include the model year as a
predictor (still using only the Honda Accord data). Compare the R² and RMSE of the
revised regression model to your original one (i.e., the one in 2c).
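A hedged sketch of Problem 3a, assuming the 2c model was a log-price model (one possible remedy, not confirmed by the assignment). The data, including the hypothetical reference year used to generate car ages, are synthetic. RMSE is computed in dollars so that the two fits are compared on the price scale.

```r
# Synthetic Accord-like data with a year column (ASSUMPTION, not real listings)
set.seed(5)
n <- 300
accord <- data.frame(year = sample(2005:2017, n, replace = TRUE))
age <- 2018 - accord$year  # hypothetical reference year
accord$mileage <- pmax(1000, 12000 * age + rnorm(n, sd = 8000))
accord$price <- 30000 * exp(-0.08 * age - 3e-6 * accord$mileage + rnorm(n, sd = 0.08))

# In-sample RMSE in dollars for a log-price model (back-transformed fits)
rmse_dollars <- function(fit, data) sqrt(mean((data$price - exp(fitted(fit)))^2))

fit_2c <- lm(log(price) ~ mileage, data = accord)         # assumed 2c model
fit_3a <- lm(log(price) ~ mileage + year, data = accord)  # add model year

comparison <- data.frame(
  model = c("2c: mileage", "3a: mileage + year"),
  R2    = c(summary(fit_2c)$r.squared, summary(fit_3a)$r.squared),
  RMSE  = c(rmse_dollars(fit_2c, accord), rmse_dollars(fit_3a, accord))
)
print(comparison)
```

Because older cars tend to have more miles, mileage and year are correlated, so the mileage coefficient can change noticeably once year enters the model. Using `factor(year)` instead of numeric `year` is an alternative if the year effect does not look linear.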
b. Now, fit your model from above (Problem 3a) on the entire dataset, instead of just on
the extracted Accord data. Compare the R² and RMSE between the two models. What
is the major issue in the new model?
c. Update the model from Problem 3b to address the major issue discovered. Compute
the R² and RMSE of the updated model. Briefly discuss your findings.
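A hedged sketch of Problems 3b and 3c. One plausible issue with pooling all listings is that different makes and models have very different price levels, which a mileage-and-year formula cannot represent; one possible update, assumed here, adds the car model as a categorical predictor. The `model` column, its levels, and the data are all hypothetical and synthetic.

```r
# Synthetic multi-model data (ASSUMPTION: hypothetical `model` column and levels)
set.seed(6)
n <- 600
cars <- data.frame(
  model = factor(sample(c("Accord", "Civic", "F-150"), n, replace = TRUE)),
  year  = sample(2005:2017, n, replace = TRUE)
)
age <- 2018 - cars$year  # hypothetical reference year
cars$mileage <- pmax(1000, 12000 * age + rnorm(n, sd = 8000))
new_price <- c(Accord = 30000, Civic = 22000, "F-150" = 45000)  # made-up levels
cars$price <- unname(new_price[as.character(cars$model)]) *
  exp(-0.08 * age - 3e-6 * cars$mileage + rnorm(n, sd = 0.08))

rmse_dollars <- function(fit, data) sqrt(mean((data$price - exp(fitted(fit)))^2))

# 3b: same formula as 3a, but fit on every listing
fit_3b <- lm(log(price) ~ mileage + year, data = cars)
# 3c (one possible update): give each car model its own price level
fit_3c <- lm(log(price) ~ mileage + year + model, data = cars)

results <- data.frame(
  fit  = c("3b: pooled", "3c: + car model"),
  R2   = c(summary(fit_3b)$r.squared, summary(fit_3c)$r.squared),
  RMSE = c(rmse_dollars(fit_3b, cars), rmse_dollars(fit_3c, cars))
)
print(results)
```

A formula such as `log(price) ~ year + mileage * model` would additionally let depreciation per mile differ by car model. With many makes and models that adds many coefficients, which is a trade-off worth discussing in the report.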
Prepare a short report detailing your results. Please submit the following: (1) your report as
a single PDF file; and (2) a single, fully functional R script or markdown file that we can run
to reproduce all the numerical results and plots in your report. We will put the necessary
data file (used_cars_clean.tsv) into the same directory as your script before running it.