Part I: R

1) The dataset sleep (obtained by typing data(sleep) in R) is a data frame with 20 observations on 3 variables. Type help(sleep), and carefully read the description of the dataset. We wish to test whether the effects of the two drugs differ.

a) Use a t-test. Comment on the assumptions of the t-test. (Provide supporting graphics if needed.)

b) Use a nonparametric test.
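
One possible way to set up both tests in R is sketched below. It assumes a paired analysis, because help(sleep) describes the same 10 patients receiving both drugs; confirm this against the description before relying on it. The object names (drug1, drug2, diffs, t_res, sw_res, w_res) are illustrative only, and the Shapiro-Wilk test and Q-Q plot are just one way of checking the normality assumption.

```r
# Built-in data: extra = increase in hours of sleep, group = drug, ID = patient
data(sleep)

# Sort by drug and patient so the two vectors line up patient by patient
ord   <- sleep[order(sleep$group, sleep$ID), ]
drug1 <- ord$extra[ord$group == "1"]
drug2 <- ord$extra[ord$group == "2"]
diffs <- drug2 - drug1            # within-patient differences

# (a) Paired t-test (assumption: pairing by patient is appropriate)
t_res <- t.test(drug2, drug1, paired = TRUE)

# Supporting graphic and check: the paired t-test assumes the
# differences are approximately normally distributed
qqnorm(diffs, main = "Q-Q plot of paired differences")
qqline(diffs)
sw_res <- shapiro.test(diffs)

# (b) Nonparametric alternative: Wilcoxon signed-rank test.
# exact = FALSE requests the normal approximation, since the
# differences contain ties and a zero
w_res <- wilcox.test(drug2, drug1, paired = TRUE, exact = FALSE)

print(t_res)
print(sw_res)
print(w_res)
```

With only 10 differences, the normality check has little power, so the plot is usually more informative than the test's p-value alone.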

2) Read the data contained in college.txt into R.

a) Check that you have read 260 colleges and 11 variables (including the school name).

b) The variable Full.time contains the percentage of faculty who are hired full-time. We will focus on the comparison of the full-time faculty percentages between the four tiers of colleges (variable Tier).

i) Construct parallel boxplots to compare the distributions of Full.time within each level of Tier.

ii) Compute the means and standard deviations of Full.time within each level of Tier. Discuss any potential differences and/or interesting patterns in the text.

iii) Test whether the variances are equal.

iv) Use a one-way ANOVA model to test whether the full-time percentages are different for different tiers. If the means are different, perform pairwise comparisons and discuss.
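
The steps in part b) could be sketched in base R roughly as follows. Because college.txt is not reproduced here, the sketch runs on a small simulated data frame that only borrows the column names Tier and Full.time; the values are made up and say nothing about the real data. The commented-out read.table call is an assumption about the file layout, Bartlett's test is only one of several tests for equal variances, and Tukey's HSD is only one of several pairwise-comparison procedures.

```r
# Assumption: the real file might be read along these lines
# college <- read.table("college.txt", header = TRUE)

# Hypothetical stand-in data so the sketch runs on its own
set.seed(1)
college <- data.frame(
  Tier      = factor(rep(1:4, each = 15)),   # Tier must be a factor
  Full.time = c(rnorm(15, 90, 5), rnorm(15, 85, 6),
                rnorm(15, 80, 7), rnorm(15, 75, 8))
)

# i) Parallel boxplots of Full.time by Tier
boxplot(Full.time ~ Tier, data = college,
        xlab = "Tier", ylab = "Full-time faculty (%)")

# ii) Group means and standard deviations
stats_by_tier <- aggregate(Full.time ~ Tier, data = college,
                           FUN = function(x) c(mean = mean(x), sd = sd(x)))
print(stats_by_tier)

# iii) Test of equal variances (Bartlett's test is sensitive to non-normality)
bt <- bartlett.test(Full.time ~ Tier, data = college)
print(bt)

# iv) One-way ANOVA, followed by Tukey's HSD pairwise comparisons
fit <- aov(Full.time ~ Tier, data = college)
print(summary(fit))
tk <- TukeyHSD(fit)
print(tk)
```

If Tier is stored as a number in the real file, convert it with factor() first; otherwise aov() would fit a single slope rather than one mean per tier.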

c) Estimate the following linear regression models:

Model 1: predict Alumni.giving by Enrollment

Model 2: predict Alumni.giving by Enrollment and Grad.rate.

i) Request an F-test to compare the two models (i.e., does including Grad.rate result in a significant improvement in prediction over Model 1?). Hint: make sure to use only complete cases when building both models.

ii) Choose the ‘best’ model and interpret the estimated coefficients and R-squared value (be sure to discuss why your chosen model is the best model).
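
The hint about complete cases matters because anova() can only compare nested models fitted to the same observations. A minimal sketch of the comparison is given below. It uses simulated data that only borrows the column names Alumni.giving, Enrollment, and Grad.rate; the numbers, the two injected missing values, and the names vars, cc, m1, m2, and cmp are all hypothetical.

```r
# Hypothetical stand-in data; replace with the data read from college.txt
set.seed(2)
n <- 60
college <- data.frame(
  Enrollment = round(runif(n, 1000, 30000)),
  Grad.rate  = round(runif(n, 40, 95))
)
college$Alumni.giving <- 5 + 0.3 * college$Grad.rate -
  0.0002 * college$Enrollment + rnorm(n, sd = 4)
college$Grad.rate[c(3, 17)] <- NA   # mimic missing values

# Keep only rows that are complete on every variable used in either model,
# so that both models are fitted to exactly the same colleges
vars <- c("Alumni.giving", "Enrollment", "Grad.rate")
cc   <- college[complete.cases(college[, vars]), ]

m1 <- lm(Alumni.giving ~ Enrollment, data = cc)              # Model 1
m2 <- lm(Alumni.giving ~ Enrollment + Grad.rate, data = cc)  # Model 2

# Partial F-test: does adding Grad.rate improve on Model 1?
cmp <- anova(m1, m2)
print(cmp)

# Coefficients and R-squared for the larger model
print(summary(m2))
```

Fitting Model 1 on the full data and Model 2 on the reduced data would give different sample sizes, and anova() would then stop with an error rather than compare them.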

Part II: SAS

The Auto MPG Data Set can be found at U.C. Irvine Machine Learning Repository at https://archive.ics.uci.edu/ml/datasets/Auto+MPG (or simply google “Auto MPG Data Set”). Read the data set information and download the data file auto-mpg.data.

1. Use SAS to read this dataset. We will only use the first six variables (from mpg to acc.).

2. Inspect the data by plotting a scatterplot matrix. Discuss the graphs.

3. Fit a regression model for prediction of ‘mpg’ by the other five variables. Report the variance inflation factors and explain them (ignore other assumptions for this task).

4. Perform a model selection procedure (you can choose which procedure you use) for prediction of ‘mpg’, and suggest your final model.

• Report all of the SAS programs used in obtaining your final model, but include only the output from the final model fit and the diagnostic plots.

• Discuss any issues (e.g., potential outliers and need for transformation) you have identified in the analysis. (For this assignment, you do not have to attempt any remedies for these issues, but please discuss them).
