Part I: R
1) The dataset sleep (obtained by typing data(sleep) in R) is a data frame with 20 observations on 3 variables. Type help(sleep), and carefully read the description of the dataset. We wish to test whether the effects of two drugs are different.
a) Use a t-test. Comment on the assumptions of the t-test. (Provide supporting graphics if needed.)
b) Use a nonparametric test.
2) Read the data contained in college.txt into R.
a) Check that you have read 260 colleges and 11 variables (including the school name).
b) The variable Full.time contains the percentage of faculty who are hired full-time. We will focus on the comparison of the full-time faculty percentages between the four tiers of colleges (variable Tier).
i) Construct parallel boxplots to compare the distributions of Full.time within each level of Tier.
ii) Compute the means and standard deviations of Full.time within each level of Tier. Discuss any potential differences and/or interesting patterns in the text.
iii) Test whether the variances are equal.
iv) Use a one-way ANOVA model to test whether the full-time percentages are different for different tiers. If the means are different, perform pairwise comparisons and discuss.
c) Estimate the following linear regression models:
Model 1: predict Alumni.giving by Enrollment
Model 2: predict Alumni.giving by Enrollment and Grad.rate.
i) Request an F-test to compare the 2 models (i.e., does including Grad.rate result in a significant improvement in prediction over Model 1?). Hint: make sure to use only complete cases when building both models.
ii) Choose the ‘best’ model and interpret the estimated coefficients and R-squared value (be sure to discuss why your chosen model is the best model).
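The hint in 2(c)(i) matters because lm() drops rows with missing values separately for each model, and anova() will not compare fits based on different numbers of rows. A minimal sketch, assuming the column names given in the assignment; the data frame below is synthetic, made up purely for illustration, and is not college.txt:

```r
# Synthetic stand-in for the college data (hypothetical values, NOT college.txt)
set.seed(1)
n <- 50
college <- data.frame(
  Enrollment = round(runif(n, 500, 20000)),
  Grad.rate  = round(runif(n, 40, 95))
)
college$Alumni.giving <- 5 + 0.3 * college$Grad.rate + rnorm(n, sd = 4)
college$Grad.rate[c(3, 17)] <- NA  # introduce two missing values

# Keep only rows that are complete on every variable used by either model
vars <- c("Alumni.giving", "Enrollment", "Grad.rate")
cc <- college[complete.cases(college[, vars]), ]

m1 <- lm(Alumni.giving ~ Enrollment, data = cc)              # Model 1
m2 <- lm(Alumni.giving ~ Enrollment + Grad.rate, data = cc)  # Model 2

# Partial F-test for the nested models: does Grad.rate add to Model 1?
cmp <- anova(m1, m2)
print(cmp)
```

Fitting both models on the same filtered data frame guarantees that they use identical rows, so the F-test compares like with like.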
Part II: SAS
The Auto MPG Data Set can be found at U.C. Irvine Machine Learning Repository at https://archive.ics.uci.edu/ml/datasets/Auto+MPG (or simply google “Auto MPG Data Set”). Read the data set information and download the data file auto-mpg.data.
1. Use SAS to read this dataset. We will only use the first six variables (from mpg to acc.).
2. Inspect the data by plotting a scatterplot matrix. Discuss the graphs.
3. Fit a regression model for prediction of ‘mpg’ by the other five variables. Report the variance inflation factors and explain (ignore other assumptions for this task).
4. Perform a model selection procedure (you can choose which procedure you use) for prediction of ‘mpg’, and suggest your final model.
• Report all of the appropriate SAS programs used in obtaining your final model, but only include the outputs from the final model fit and diagnostic plots.
• Discuss any issues (e.g., potential outliers and need for transformation) you have identified in the analysis. (For this assignment, you do not have to attempt any remedies for these issues, but please discuss them).
# 1) The dataset sleep (obtained by typing data(sleep) in R) is a data frame with 20 observations on 3 variables. Type help(sleep), and carefully read the description of the dataset. We wish to test whether the effects of two drugs are different.
# a) Use a t-test. Comment on the assumptions of the t-test. (Provide supporting graphics if needed.)
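# The preview omits the call behind the p-value discussed here. A minimal sketch, assuming the default Welch two-sample t-test of extra by group (the exact call used in the solution is not shown):

```r
data(sleep)

# Welch two-sample t-test (R's default, var.equal = FALSE),
# treating the two groups as independent samples
tt <- t.test(extra ~ group, data = sleep)
print(tt)
```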
# The p-value is greater than 0.05, so at the 5% level the t-test does not show a significant difference between the means of group 1 and group 2.
# The t-test assumes that the data follow a normal distribution. Let's check this graphically with a Q-Q plot.
qqnorm(sleep$extra)
qqline(sleep$extra, col = 2)  # add the reference line
# The points deviate from the straight line, which suggests that the data do not come from a normal distribution. This violates the normality assumption of the t-test, so the t-test is not appropriate here.
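# Strictly, the normality assumption applies to each group (or, for a paired design, to the within-patient differences) rather than to the pooled sample. A hedged sketch of such checks; using the Shapiro-Wilk test is an assumption here, not something shown in the preview. help(sleep) describes the same 10 patients under both drugs, so the differences are checked as well:

```r
data(sleep)

# Order by group and patient ID so the two vectors line up patient by patient
s  <- sleep[order(sleep$group, sleep$ID), ]
g1 <- s$extra[s$group == 1]
g2 <- s$extra[s$group == 2]

# Q-Q plot for each group, side by side
op <- par(mfrow = c(1, 2))
qqnorm(g1, main = "Group 1"); qqline(g1, col = 2)
qqnorm(g2, main = "Group 2"); qqline(g2, col = 2)
par(op)

# Shapiro-Wilk p-values per group
sw <- sapply(list(group1 = g1, group2 = g2),
             function(x) shapiro.test(x)$p.value)
print(sw)

# Within-patient differences, relevant if the data are analysed as paired
d <- g2 - g1
print(shapiro.test(d))
```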
# b) Use a nonparametric test.
# Use the Mann-Whitney U test (Wilcoxon rank-sum test) for independent samples, a nonparametric alternative to the t-test.
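# The preview omits this call as well. A minimal sketch, assuming wilcox.test() with the formula interface; exact = FALSE is a choice made here because the data contain ties, so an exact p-value cannot be computed:

```r
data(sleep)

# Wilcoxon rank-sum (Mann-Whitney U) test for two independent samples,
# using the normal approximation because of ties in the data
wt <- wilcox.test(extra ~ group, data = sleep, exact = FALSE)
print(wt)
```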
# The p-value is still greater than 0.05, so there is no significant difference between the two groups....