 # Statistics Questions

Subject: Mathematics, Statistics, R Programming

## Question

Part I: R
1) The dataset sleep (obtained by typing data(sleep) in R) is a data frame with 20 observations on 3 variables. Type help(sleep), and carefully read the description of the dataset. We wish to test whether the effects of the two drugs are different.
a) Use a t-test. Comment on the assumptions of the t-test. (Provide supporting graphics if needed.)
b) Use a nonparametric test.

2) Read the data contained in college.txt into R.
a) Check that you have read 260 colleges and 11 variables (including the school name).
b) The variable Full.time contains the percentage of faculty who are hired full-time. We will focus on the comparison of the full-time faculty percentages between the four tiers of colleges (variable Tier).
i) Construct parallel boxplots to compare the distributions of Full.time within each level of Tier.
ii) Compute the means and standard deviations of Full.time within each level of Tier. Discuss any potential differences and/or interesting patterns in the text.
iii) Test whether the variances are equal.
iv) Use a one-way ANOVA model to test whether the full-time percentages are different for different tiers. If the means are different, perform pairwise comparisons and discuss.
c) Estimate the following linear regression models:
Model 1: predict Alumni.giving by Enrollment
Model 2: predict Alumni.giving by Enrollment and Grad.rate.
i) Request an F-test to compare the 2 models (i.e., does including Grad.rate result in a significant improvement in prediction over Model 1?). Hint: make sure to use only complete cases when building both models.
ii) Choose the ‘best’ model and interpret the estimated coefficients and the R-squared value (be sure to discuss why your chosen model is the best model).
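A hedged base-R sketch of the workflow in 2b (boxplots, group means and SDs, a variance test, one-way ANOVA and pairwise comparisons). Because college.txt is not available here, it uses R's built-in `PlantGrowth` data as a stand-in (assumption): `weight` plays the role of `Full.time` and `group` the role of `Tier`. Bartlett's test and Tukey's HSD are one reasonable choice for steps iii and iv, not the only one.

```r
# Stand-in data (assumption): PlantGrowth ships with R; weight plays the role
# of Full.time and group the role of Tier. college.txt is not used here.
data(PlantGrowth)

# If Tier is read in as numeric codes, convert it first so aov() treats it
# as a factor rather than a numeric slope:
# college$Tier <- factor(college$Tier)

# i) Parallel boxplots of the response within each level of the factor
boxplot(weight ~ group, data = PlantGrowth, xlab = "group", ylab = "weight")

# ii) Mean and standard deviation within each level
grp_means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)
grp_sds   <- tapply(PlantGrowth$weight, PlantGrowth$group, sd)
print(round(cbind(mean = grp_means, sd = grp_sds), 3))

# iii) Equality of variances: Bartlett's test (sensitive to non-normality);
#      fligner.test() is a rank-based alternative in base R
print(bartlett.test(weight ~ group, data = PlantGrowth))
print(fligner.test(weight ~ group, data = PlantGrowth))

# iv) One-way ANOVA, then Tukey HSD pairwise comparisons if the F-test rejects
fit <- aov(weight ~ group, data = PlantGrowth)
anova_p <- summary(fit)[[1]][["Pr(>F)"]][1]
print(summary(fit))
if (anova_p < 0.05) print(TukeyHSD(fit))
```

For the college data the same calls would take the form `aov(Full.time ~ Tier, data = college)`, where `college` is a hypothetical name for the data frame read from college.txt.
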

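A hedged sketch of the nested-model F-test in 2c, showing why the complete-case hint matters. It uses R's built-in `airquality` data as a stand-in (assumption), since it has missing values in `Ozone` and `Solar.R`: `Ozone ~ Wind` plays the role of Model 1 and `Ozone ~ Wind + Solar.R` the role of Model 2. If each `lm()` drops its own incomplete rows, the two fits use different observations and `anova()` refuses to compare them.

```r
# Stand-in data (assumption): airquality ships with R and has missing values
# in Ozone and Solar.R, like the missing values the hint warns about.
data(airquality)

# Without a complete-case step, each lm() drops its own NA rows, the two fits
# use different observations, and anova() stops with an error.
bad <- tryCatch(
  anova(lm(Ozone ~ Wind, data = airquality),
        lm(Ozone ~ Wind + Solar.R, data = airquality)),
  error = function(e) conditionMessage(e)
)
print(bad)

# Keep only rows that are complete on every variable used by either model
vars <- c("Ozone", "Wind", "Solar.R")
aq <- airquality[complete.cases(airquality[, vars]), vars]

m1 <- lm(Ozone ~ Wind, data = aq)            # plays the role of Model 1
m2 <- lm(Ozone ~ Wind + Solar.R, data = aq)  # plays the role of Model 2

# Partial F-test: does adding Solar.R improve the fit significantly over m1?
cmp <- anova(m1, m2)
print(cmp)

# R-squared and adjusted R-squared, useful when choosing and interpreting a model
print(sapply(list(m1 = m1, m2 = m2), function(m)
  c(r2 = summary(m)$r.squared, adj_r2 = summary(m)$adj.r.squared)))
```

The second row of the `anova()` table gives the F statistic and p-value for the added variable; a small p-value favors the larger model.
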
Part II: SAS
The Auto MPG Data Set is available from the U.C. Irvine Machine Learning Repository at https://archive.ics.uci.edu/ml/datasets/Auto+MPG (or search for “Auto MPG Data Set”). Read the data set information and download the data file auto-mpg.data.
1. Use SAS to read this dataset. We will only use the first six variables (mpg through acceleration).
2. Inspect the data by plotting a scatterplot matrix. Discuss the graphs.
3. Fit a regression model for predicting ‘mpg’ from the other five variables. Report the variance inflation factors and explain them (ignore the other assumptions for this task).
4. Perform a model selection procedure (you may choose which procedure to use) for predicting ‘mpg’, and suggest your final model.
• Report all appropriate SAS programs used in obtaining your final model, but only include the outputs from the final model fit and the diagnostic plots.
• Discuss any issues (e.g., potential outliers and need for transformation) you have identified in the analysis. (For this assignment, you do not have to attempt any remedies for these issues, but please discuss them).
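The assignment asks for SAS here (PROC REG reports these values through the `VIF` option on the MODEL statement). To show what a variance inflation factor measures, here is a base-R sketch of the definition, $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on the other predictors. It uses R's built-in `mtcars` data as a stand-in (assumption), not auto-mpg.data.

```r
# Stand-in data (assumption): mtcars ships with R; cyl, disp, hp, wt and qsec
# play the roles of the five Auto MPG predictors (qsec is a quarter-mile time,
# not the Auto MPG acceleration variable).
data(mtcars)
preds <- c("cyl", "disp", "hp", "wt", "qsec")

# VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing
# predictor j on the other four predictors
vif_manual <- sapply(preds, function(p) {
  aux <- lm(reformulate(setdiff(preds, p), response = p), data = mtcars)
  1 / (1 - summary(aux)$r.squared)
})
print(round(vif_manual, 2))

# Full model for mpg on all five predictors
full <- lm(reformulate(preds, response = "mpg"), data = mtcars)
print(summary(full))
```

A VIF of 1 means a predictor is uncorrelated with the others; common rules of thumb treat values above about 5 or 10 as signs of problematic collinearity.
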

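For step 4, SAS offers selection procedures through the `SELECTION=` option of PROC REG. As a hedged illustration of one such procedure, here is backward elimination by AIC in base R with `step()`, again on the `mtcars` stand-in (assumption), followed by the standard diagnostic plots used to look for outliers and the need for a transformation.

```r
# Stand-in data (assumption): mtcars, with the same five predictors as above
data(mtcars)
full <- lm(mpg ~ cyl + disp + hp + wt + qsec, data = mtcars)

# Backward elimination by AIC; direction = "both" is a common alternative
sel <- step(full, direction = "backward", trace = 0)
print(formula(sel))
print(summary(sel))

# Standard lm diagnostics: residuals vs fitted, normal Q-Q,
# scale-location, and residuals vs leverage (with Cook's distance contours)
op <- par(mfrow = c(2, 2))
plot(sel)
par(op)
```

Curvature in the residuals-vs-fitted panel suggests a transformation may help, and points far out in the leverage panel are candidate influential observations.
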
## Solution Preview


# 1a) t-test on the sleep data, with a check of its assumptions

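t.test(extra ~ group, data = sleep)  # Welch two-sample t-test (R's default, var.equal = FALSE)
# The p-value (about 0.079) is greater than 0.05, so at the 5% level the t-test finds no significant difference between the means of group 1 and group 2.
# Note that this call treats the two groups as independent samples, while help(sleep) describes the same 10 patients (column ID) receiving both drugs.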
# The t-test assumes that the observations in each group are approximately normally distributed. A normal Q-Q plot gives a quick visual check:
with(sleep, { qqnorm(extra); qqline(extra, col = 2) })

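# The points depart from the straight line, which suggests that the data may not be normally distributed, so the normality assumption of the t-test is questionable.
# Two caveats: this plot pools both groups, whereas normality should be judged within each group (or, for paired data, on the within-patient differences), and with only 10 values per group a Q-Q plot is hard to read. A nonparametric test is therefore a useful check.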
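help(sleep) describes the data as the effect of two drugs on 10 patients, with `ID` identifying the patient, so each patient appears once in each group. Under that reading the samples are paired, and the relevant t-test is a one-sample test on the within-patient differences. A minimal sketch, assuming rows should be matched by `ID` (the names `g1`, `g2` and `d` are illustrative). The vector form is used because recent R versions no longer accept `paired = TRUE` in the formula interface.

```r
data(sleep)

# Split by drug and align the two samples by patient ID
g1 <- sleep[sleep$group == "1", ]
g2 <- sleep[sleep$group == "2", ]
g1 <- g1[order(g1$ID), ]
g2 <- g2[order(g2$ID), ]
stopifnot(identical(as.character(g1$ID), as.character(g2$ID)))

# Paired t-test (drug 2 minus drug 1), equivalent to a one-sample t-test on the differences
paired_t <- t.test(g2$extra, g1$extra, paired = TRUE)
print(paired_t)

# For the paired test, normality is assumed for the differences, not the raw values
d <- g2$extra - g1$extra
qqnorm(d, main = "Q-Q plot of paired differences (drug 2 - drug 1)")
qqline(d, col = 2)
print(shapiro.test(d))
```

With the pairing taken into account, the mean difference is 1.58 hours (t = 4.06 on 9 df, p ≈ 0.003), so the conclusion depends on treating the design correctly.
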

# b) Use a nonparametric test.

# Wilcoxon rank-sum (Mann-Whitney U) test, the nonparametric alternative to the two-sample t-test
wilcox.test(extra ~ group, data = sleep)  # warns that an exact p-value cannot be computed because of ties

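# The p-value (about 0.069) is still greater than 0.05, so this test also finds no significant difference between the two groups at the 5% level. (The rank-sum test compares the locations of the two distributions rather than their means.)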
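If the samples are treated as paired (same 10 patients, matched by `ID`, as help(sleep) describes), the nonparametric counterpart of the paired t-test is the Wilcoxon signed-rank test. A minimal sketch under that assumption; `exact = FALSE` requests the normal approximation explicitly, since one patient's difference is zero, which rules out the exact p-value.

```r
data(sleep)

# Align the two drugs by patient ID
g1 <- sleep[sleep$group == "1", ]
g2 <- sleep[sleep$group == "2", ]
g1 <- g1[order(g1$ID), ]
g2 <- g2[order(g2$ID), ]

# Wilcoxon signed-rank test on the paired differences (drug 2 minus drug 1);
# the zero difference is dropped and a continuity-corrected normal approximation is used
wt <- wilcox.test(g2$extra, g1$extra, paired = TRUE, exact = FALSE)
print(wt)
```

Every nonzero difference favors drug 2, so the signed-rank statistic V takes its maximum value, 45 for the 9 nonzero differences.
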

