1. For this problem, use the data set included that contains information on the 50 American states and Washington, D.C. in four randomly selected years. The variables in the data set in which we are interested are:
-crime: the crime rate in a state in a given year
-unemploy: the unemployment rate in a state in a given year
-divorce: the divorce rate in a state in a given year
-afdc: the amount of funding that state received in a given year as part of the Aid to
Families with Dependent Children (AFDC) program
-birth: the birth rate in a state in a given year
-watchlist: a dichotomous variable that indicates whether a state was on a federal watch
list because of an above average crime rate, coded 1 if yes and 0 if no.
Using statistical software, first estimate a regression of crime on unemploy, divorce, afdc, and birth. Paste the output below. Then do/answer the following:
* Obtain variance inflation factors (VIFs) for each independent variable. What are the four VIFs, and what is their interpretation? Would you conclude that collinearity is a problem in the model?
* Predict the residuals, and graph their density (using a histogram or a density plot).
Paste your graph below. Do the residuals look approximately normally
* Test whether the residuals are heteroscedastic. That is, conduct a test of the null hypothesis of constant error variance. Would you conclude that the residuals are heteroscedastic? If so, how might you address this?
* Each state is observed multiple times in the data set. Which OLS assumption will this clustering * Check for any outliers in the regression with regard to unemployment by creating
DFBETAs. Did you find any outliers? How might you address them?
* Check for any outliers in the regression with regard to the overall fit of the model by creating DFFITs. Did you find any outliers? How might you address them?
* Check for nonlinearity with regard to the ADFC variable. If you found nonlinearity, address it appropriately. What steps did you take to address the potential nonlinearity, and how did your findings change? Paste any new regression output below.
Now use watchlist as the dependent variable. You should not use OLS because it is a categorical variable, so instead estimate a logistic regression of watchlist on unemploy, divorce, afdc, and birth. Paste the output below. Then do/answer the following:
* Which of the four variables would you conclude have a statistically significant effect on the likelihood of being on the watch list? Why?
* By what factor does a unit change in the divorce rate change an observation’s odds of being on the watch list, all else being equal.
This material may consist of step-by-step explanations on how to solve a problem or examples of proper writing, including the use of citations, references, bibliographies, and formatting. This material is made available for the sole purpose of studying and learning - misuse is strictly forbidden.
* The OLS assumption includes the independent observation. If observations for the same states are anyway related which they will be, then a nice set of properties of OLS estimators is not guaranteed. The regression estimation carried out does not contain a variable that indicates each state, error terms contain the individual state characteristics that accounts for the crime rate. These error terms also are likely related to independent variables in observations for the state. Therefore, the model suffers from the omitted variable bias. The easy way to circumvent this problem is to include the categorical variable to indicate a state in the model....
This is only a preview of the solution. Please use the purchase button to see the entire solution