1. For this problem, use the data set included that contains information on the 50 American states and Washington, D.C. in four randomly selected years. The variables in the data set in which we are interested are:

-crime: the crime rate in a state in a given year

-unemploy: the unemployment rate in a state in a given year

-divorce: the divorce rate in a state in a given year

-afdc: the amount of funding that state received in a given year as part of the Aid to

Families with Dependent Children (AFDC) program

-birth: the birth rate in a state in a given year

-watchlist: a dichotomous variable that indicates whether a state was on a federal watch

list because of an above average crime rate, coded 1 if yes and 0 if no.

Using statistical software, first estimate a regression of crime on unemploy, divorce, afdc, and birth. Paste the output below. Then do/answer the following:

* Obtain variance inflation factors (VIFs) for each independent variable. What are the four VIFs, and what is their interpretation? Would you conclude that collinearity is a problem in the model?

* Predict the residuals, and graph their density (using a histogram or a density plot).

Paste your graph below. Do the residuals look approximately normally

distributed?

* Test whether the residuals are heteroscedastic. That is, conduct a test of the null hypothesis of constant error variance. Would you conclude that the residuals are heteroscedastic? If so, how might you address this?

* Each state is observed multiple times in the data set. Which OLS assumption will this clustering * Check for any outliers in the regression with regard to unemployment by creating

DFBETAs. Did you find any outliers? How might you address them?

* Check for any outliers in the regression with regard to the overall fit of the model by creating DFFITs. Did you find any outliers? How might you address them?

* Check for nonlinearity with regard to the ADFC variable. If you found nonlinearity, address it appropriately. What steps did you take to address the potential nonlinearity, and how did your findings change? Paste any new regression output below.

Now use watchlist as the dependent variable. You should not use OLS because it is a categorical variable, so instead estimate a logistic regression of watchlist on unemploy, divorce, afdc, and birth. Paste the output below. Then do/answer the following:

* Which of the four variables would you conclude have a statistically significant effect on the likelihood of being on the watch list? Why?

* By what factor does a unit change in the divorce rate change an observationâ€™s odds of being on the watch list, all else being equal.

**Subject Mathematics Advanced Statistics**