Question

## Problem 1: Colleges and Universities

In this problem, you will do further cleaning and analysis of the data from the 1995 US News and World Report on colleges and universities in the US.

a. Read the modified data into R. If using `read_csv`, watch out for leading spaces in column names. Check the first few values of each vector to ensure that they were read correctly.

b. Examine the summary of each variable. Identify any unrealistic values and set them to missing.

- It may be helpful to use control flow or functions to help organize your work.

**Write a sentence** describing what you did, naming the colleges or universities affected. (For example, "Listed ages less than zero (ABC University, XYZ College) were converted to missing data.")

c. Find the mean percentages of alumni who donate, for each of private and public schools.

d. Test whether there is evidence that a higher percentage of alumni from private schools donate to their schools, compared to alumni from public schools.

- Hint: In part c, we took the mean of this variable in each group. What does this tell you about what type of hypothesis to use?

**State your conclusion** in context.
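
One way to set up part d is a one-sided two-sample t-test on the group means. The sketch below runs on simulated data, so its numbers mean nothing. The column names follow the solution code, and coding `Public.private` as "private"/"public" is an assumption (the real file may use numeric codes, in which case the level order must be checked).

```{r}
set.seed(1)
# Simulated stand-in for the real data (values are made up).
toy <- data.frame(
  Public.private = rep(c("private", "public"), each = 30),
  Pct.alumni.who.donate = c(rnorm(30, mean = 25, sd = 8),
                            rnorm(30, mean = 15, sd = 8))
)

# With a formula, t.test() compares the first factor level with the second.
# Levels sort alphabetically, so "private" comes first and
# alternative = "greater" tests H1: mean(private) > mean(public).
result <- t.test(Pct.alumni.who.donate ~ Public.private, data = toy,
                 alternative = "greater")
result
```

By default `t.test()` does not assume equal variances (Welch's test), which is usually the safer choice when the two groups differ in size or spread.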

e. Use `write_csv()` or `write.csv()` to save your updated data set.

- If you are using `write.csv()`, consult the R documentation to set the arguments for the function. Your output file should not have row names or row numbers.
- After you save the file, open it in Excel, Notepad++, or a program of your choice to verify that the version you saved contains the updates you made in part b.
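
For `write.csv()`, the relevant argument is `row.names`. A minimal sketch with a made-up data frame and a temporary file, reading the file back to confirm that the missing value survived the round trip:

```{r}
# Made-up data with one value already set to missing.
toy <- data.frame(College = c("ABC University", "XYZ College"),
                  Graduation.rate = c(75, NA))

out_file <- tempfile(fileext = ".csv")

# row.names = FALSE suppresses the leading column of row numbers.
write.csv(toy, out_file, row.names = FALSE)

# Read the file back; NA is written as the string "NA" by default.
check <- read.csv(out_file)
check
```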

## Problem 2: Current Population Survey

The data set `cps.csv` contains data from the 1985 Current Population Survey.

- Dataset: "Wages from the Current Population Survey", from Daniel Kaplan, *Statistical Modeling: A Fresh Approach.* Original source: Berndt, E. R. (1991). *The Practice of Econometrics.* Addison-Wesley.

- Metadata: cps_metadata.pdf, from p. 418 of *Statistical Modeling: A Fresh Approach* by Daniel Kaplan.

a. Read the data into R and plot wages versus education.

**Comment** on the appropriateness of linear regression.

b. Perform the linear regression and examine the diagnostic plots.

**Explain** why transforming the wages variable is a good idea in this case.
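
A sketch of parts a and b on simulated data. The column names `wage` and `educ` are assumptions about `cps.csv` (check the metadata), and the simulated wages are deliberately right-skewed so that the diagnostic plots show the kind of pattern that motivates a transformation.

```{r}
set.seed(2)
toy <- data.frame(educ = sample(8:18, 200, replace = TRUE))
# Log-normal noise gives right-skewed wages whose spread grows with the mean.
toy$wage <- exp(0.8 + 0.09 * toy$educ + rnorm(200, sd = 0.5))

plot(wage ~ educ, data = toy)

fit <- lm(wage ~ educ, data = toy)
summary(fit)

# Four standard diagnostics: residuals vs fitted, normal Q-Q,
# scale-location, and residuals vs leverage.
par(mfrow = c(2, 2))
plot(fit)
```

A funnel shape in the residuals vs fitted panel or a curved tail in the Q-Q panel suggests non-constant variance or non-normal errors, both of which a transformation of the response can reduce.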

c. The variable `wage` has units of dollars/hour. Create a new variable, `time`, equal to 1/`wage`. (So `time` has units of hours/dollar, or the length of time a person must work to earn $1.00.)
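
Creating the new variable is a single vectorized operation. A sketch on made-up wages (the data frame name `cps` is an assumption):

```{r}
cps <- data.frame(wage = c(5, 10, 20))

# Hours of work needed to earn one dollar.
cps$time <- 1 / cps$wage
cps
```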

d. Plot time versus education.

**Comment** on the appropriateness of linear regression.

e. Perform the linear regression.

Based on these results, are you happy with your decision to pursue a master's degree? **Explain.**

f. Examine the diagnostic plots.

**Which individuals** appear to be outliers on the residual vs. predicted plot?

Re-do the regression without those individuals.

**Did excluding** the outliers change your conclusion?
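
A sketch of flagging and dropping outliers, on simulated data with two planted extreme values. The cutoff of 3 on standardized residuals is a common rule of thumb, not something the assignment prescribes, and `time` and `educ` are assumed column names.

```{r}
set.seed(3)
toy <- data.frame(educ = sample(8:18, 100, replace = TRUE))
toy$time <- 0.25 - 0.008 * toy$educ + rnorm(100, sd = 0.02)
toy$time[c(10, 50)] <- toy$time[c(10, 50)] + 0.5  # planted outliers

fit_all <- lm(time ~ educ, data = toy)

# Rows whose standardized residual is large in absolute value.
outliers <- which(abs(rstandard(fit_all)) > 3)
outliers

# Negative indexing drops the flagged rows. Guard against an empty index,
# because toy[-integer(0), ] would return zero rows.
toy_clean <- if (length(outliers) > 0) toy[-outliers, ] else toy
fit_clean <- lm(time ~ educ, data = toy_clean)

# Compare the coefficients with and without the flagged rows.
rbind(all = coef(fit_all), clean = coef(fit_clean))
```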

Solution Preview

## Problem 1: Colleges and Universities

In this problem, you will do further cleaning and analysis of the data from the 1995 US News and World Report on colleges and universities in the US.

a. Read the modified usnews data into R. If using `read_csv`, watch out for leading spaces in column names. Check the first few values of each vector to ensure that they were read correctly.

```{r}
usnews <- read.csv("usnews2.csv")
# Check the first few values of each variable
head(usnews)
```
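
If the file is read in a way that keeps the header exactly as written, leading spaces stay in the column names and have to be removed by hand. A minimal sketch of that cleanup on a made-up data frame (the names and values are assumptions for illustration, not the real file):

```{r}
# Made-up data whose second column name starts with a space.
toy <- data.frame(a = c("ABC University", "XYZ College"), b = c(75, 62))
names(toy) <- c("College", " Graduation rate")
names(toy)

# trimws() removes leading and trailing whitespace from each name.
names(toy) <- trimws(names(toy))
names(toy)
head(toy)
```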

b. Examine the summary of each variable. Identify any unrealistic values and set them to missing.

- It may be helpful to use control flow or functions to help organize your work.
```{r}
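summary(usnews)
names(usnews)

# Histogram of every numeric variable to spot implausible values
# par(mfrow = c(4, 4))
for (i in seq_along(usnews)) {
  variable <- usnews[[i]]
  name <- names(usnews)[i]
  # is.numeric() skips both factor and character columns
  if (is.numeric(variable)) {
    hist(variable, main = name, xlab = name)
  }
}

# Show the first three columns of the rows with implausible values
usnews[which(usnews$Student.faculty.ratio > 60), 1:3]
usnews[which(usnews$Graduation.rate > 100), 1:3]

# Set the implausible values to missing
usnews[which(usnews$Student.faculty.ratio > 60), "Student.faculty.ratio"] <- NA
usnews[which(usnews$Graduation.rate > 100), "Graduation.rate"] <- NA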
```

**Write a sentence** describing what you did, naming the colleges or universities affected. (For example, "Listed ages less than zero (ABC University, XYZ College) were converted to missing data.")

Student-faculty ratios greater than 60 (St. Leo College, Northwood University) and a graduation rate greater than 100% (Cazenovia College) are unrealistic and were converted to missing data.
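
The hint about functions could be followed with a small helper that blanks out values outside a plausible range. This is only a sketch: `set_out_of_range_na` is a hypothetical name, the toy data are made up, and the cutoffs (60 and 100) mirror the ones used in the code above.

```{r}
# Hypothetical helper: replace values outside [lower, upper] with NA.
set_out_of_range_na <- function(x, lower = -Inf, upper = Inf) {
  bad <- !is.na(x) & (x < lower | x > upper)
  x[bad] <- NA
  x
}

# Made-up data with one implausible value in each column.
toy <- data.frame(
  College = c("A", "B", "C"),
  Student.faculty.ratio = c(12.5, 91, 18),
  Graduation.rate = c(80, 65, 118)
)

# Record which schools are affected before overwriting the values.
toy$College[which(toy$Student.faculty.ratio > 60)]
toy$College[which(toy$Graduation.rate > 100)]

toy$Student.faculty.ratio <- set_out_of_range_na(toy$Student.faculty.ratio, upper = 60)
toy$Graduation.rate <- set_out_of_range_na(toy$Graduation.rate, lower = 0, upper = 100)
toy
```

Keeping the range check in one function makes it easy to apply the same rule to other percentage columns without repeating the indexing logic.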

c. Find the mean percentages of alumni who donate, for each of private and public schools.
```{r}
aggregate(Pct.alumni.who.donate ~ Public.private, data = usnews, FUN = mean)
```

...