1. Explain why the correlation coefficient r must lie between -1 and +1. Please use no more than one or two paragraphs. You may append calculations if you need them.

2. Say you run a simple regression with predictor variable X₁ and outcome variable Y. You fit the following model: Y = B₀ + B₁X₁

a) What is the interpretation of B₀? What is the interpretation of B₁?

b) When will B₁ be equal to the correlation between Y and X₁? Why?

After running the analysis, you remember a covariate that you believe is related to Y, but is not substantively of interest.

c) What are the benefits of including the covariate in the model? Include two benefits and explain them in detail.

Finally, you include the covariate within the analysis and fit the following model: Y = B₀ + B₁X₁ + B₂X₂

d) What is the interpretation of each of the coefficients in this model?

e) When will B₁ be equal to the correlation between Y and X₁? When will B₂ be equal to the correlation between Y and X₂? Why?
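For readers who want to experiment with the two models in question 2, the following is a minimal sketch of fitting both by ordinary least squares in plain Python. The data values and the helper names (`cross_dev`, `fit_simple`, `fit_two_predictors`) are assumptions made up for illustration; they are not part of the problem set, and the sketch does not answer the questions above.

```python
def mean(values):
    return sum(values) / len(values)

def cross_dev(a, b):
    """Sum of cross-products of deviations from the means."""
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b))

def fit_simple(x1, y):
    """Least-squares fit of Y = B0 + B1*X1."""
    b1 = cross_dev(x1, y) / cross_dev(x1, x1)
    b0 = mean(y) - b1 * mean(x1)
    return b0, b1

def fit_two_predictors(x1, x2, y):
    """Least-squares fit of Y = B0 + B1*X1 + B2*X2 via the normal equations."""
    s11, s22, s12 = cross_dev(x1, x1), cross_dev(x2, x2), cross_dev(x1, x2)
    s1y, s2y = cross_dev(x1, y), cross_dev(x2, y)
    det = s11 * s22 - s12 ** 2  # zero only if X1 and X2 are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return b0, b1, b2

# Hypothetical, noise-free data (assumption): y = 1 + 2*x1 + 3*x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]

simple = fit_simple(x1, y)
multiple = fit_two_predictors(x1, x2, y)
print("simple:", simple)
print("with covariate:", multiple)
```

Comparing the two printed slopes for X₁ on data of your own choosing is one way to check your reasoning for parts (c) through (e).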

3. Suppose you are hired to serve as a statistical consultant. In each of the following cases, what advice (if any) would you give to your client concerning the procedures and/or conclusions he or she has drawn, or about the kind of statistical techniques most suitable? Be sure to briefly explain the reasoning underlying your advice.

(a) A researcher studies the effects of education (High School or less, Some College, 4 Year College Degree, Graduate/Professional Degree) on income by randomly calling 5,000 participants in the United States. At a presentation of his results, several colleagues suggest that effects of education on income may not be robust when considering other predictors such as work experience, time with their current employer, age, and personal investments. What sort of analysis did the researcher conduct, and how can the researcher address these criticisms of his research?

(b) A researcher is interested in predicting the mental health of college students based on their reported level of stress. What kind of sample should she collect and what statistical technique should she use to achieve this goal?

(c) A researcher collected data from undergraduate and graduate students at universities across the country in a study of the relation between age (range: 18 to 46 years; mean = 23.5) and openness. There was a significant negative relation between age and openness (r(2,998) = -0.13, p < .05). The researcher cited this finding as evidence for why elderly individuals (age 60 years and upward) have difficulty learning about novel technology and embracing ideological shifts: their openness has declined substantially over their lives. Is this a reasonable conclusion? Why or why not?

(d) A researcher studied a group of 100 students by having them complete a survey once a quarter, every quarter, for two years via an online survey form. The survey consisted of several items meant to measure anxiety, self-competence, and academic performance. What methods of analysis would be applicable to this type of data? How do you justify your recommendations?

(e) A researcher received a small grant to conduct a study and is debating how to spend the money. She can give a test to 300 individuals on one occasion, to one individual on 300 occasions, to 30 individuals on 10 occasions, to 10 individuals on 30 occasions, or use any combination of the above. Which of these data collection methods should she use, and why?

Extra

Explain what it means to say that a correlation is a covariance expressed in z-scores. Derive numerically the formula for a correlation from the formula for a covariance (and describe the steps in your own words).
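As a numerical check to accompany (not replace) the derivation, the sketch below computes r in two ways: as the covariance divided by the product of the standard deviations, and as the covariance of the z-scored variables. The data and the helper names are assumptions made up for illustration, and the sample (n - 1) formulas are used throughout.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def sample_cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def z_scores(values):
    m, s = mean(values), sample_sd(values)
    return [(v - m) / s for v in values]

# Hypothetical data, made up for illustration only.
x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [1.0, 3.0, 2.0, 6.0, 8.0]

# Route 1: covariance rescaled by the two standard deviations.
r_from_cov = sample_cov(x, y) / (sample_sd(x) * sample_sd(y))

# Route 2: covariance of the standardized (z-scored) variables.
r_from_z = sample_cov(z_scores(x), z_scores(y))

print(r_from_cov, r_from_z)
```

The two printed values should agree up to floating-point rounding; explaining why is the point of the derivation.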
