Question

Transcribed Text

1. Here are four scatterplots with the same X values, but four different Y values. The correlations for these relationships are -0.81, 0.01, 0.46, and 0.87. Which correlation corresponds to each graph, and why?

2. Let’s return to an example we looked at a few months ago: the unusual number of ballots cast for Pat Buchanan in Palm Beach County, FL, in the 2000 election. The data and code for this problem are up on Piazza, but, to give some practice for how we will ask questions on the final, we will provide the output for you here.

(i) Here is a plot of the number of votes cast for Ross Perot (a 3rd party candidate with roughly similar positions to Buchanan) in each county in 1996 on the x-axis and the number of votes for Buchanan in 2000 on the y-axis. Palm Beach County is the point labelled “PBC”. What does this graph tell us about the relationship between county-level votes for Perot in 1996 and Buchanan in 2000? What might we thus infer about PBC?

(ii) Here are the correlations between these two variables, first with all counties, and then excluding PBC.

> cor(florida$Perot96, florida$Buchanan00)
[1] 0.7162634
> cor(florida$Perot96[florida$county != "PalmBeach"],
+ florida$Buchanan00[florida$county != "PalmBeach"])
[1] 0.9225874

What explains the difference between these two correlation coefficients?

(iii) Here is the output for a regression with the Perot vote as the IV and the Buchanan vote as the DV.

> lm(Buchanan00 ~ Perot96, data=florida)

Call:
lm(formula = Buchanan00 ~ Perot96, data = florida)

Coefficients:
(Intercept)      Perot96
    1.34575      0.03592

Interpret the slope coefficient.

(iv) PBC had 30,739 Perot votes in 1996. How many votes does this model predict Buchanan would receive in 2000? How does this compare to the 3,407 he actually received?

(v) Here is the same regression but excluding PBC.

> lm(Buchanan00 ~ Perot96, data=florida, subset=(county != "PalmBeach"))

Call:
lm(formula = Buchanan00 ~ Perot96, data = florida, subset = (county != "PalmBeach"))

Coefficients:
(Intercept)      Perot96
   45.84193      0.02435

Note that while the correlation between these variables is higher when excluding PBC, the slope on the Perot96 variable is less steep when excluding PBC. Why is this?

3. Download the “MPs_problem_set4.R” script from Piazza. The first line loads up a data frame which contains data on people who ran for Parliament in the UK and are now deceased. The margin variable gives the difference in vote share between this candidate and the other major party candidate for their election. The party variable indicates whether they belong to the more right-wing party (“tory”) or the more left-wing party (“labour”), and the tory variable, which my code generates, is equal to 1 for tories and 0 for labour. The ln.net variable is the natural log of their wealth (in British Pounds) at death. We will talk a bit more about why we look at this rather than their wealth itself in class on 11/27, but for the purposes here all that matters is that higher numbers mean the candidate was wealthier when they died.

(i) Use the t.test() function to do a difference of means test for whether there is a difference in wealth between the Tory and Labour candidates. (You can use the party or tory variable.) Interpret the output: can we say with 95% confidence that one party’s candidates end up wealthier than the other’s? Which one?

(ii) Add a variable called win to the MPs data frame which is equal to 1 if the MP won the election and 0 otherwise. (Hint: use the fact that they win if the “margin” variable is positive. Further, once you figure out the proper code to create the 0s and 1s, you can create a new variable in the data frame with the code: dataframe$win <- YourCode). Use the t.test() function to do a difference of means test for whether winners of elections end up wealthier than losers. Interpret the output: do those who win elections die richer? With what level of confidence can we claim there is a difference?

(iii) Now use the lm() function to do the same test as a regression, where ln.net is the dependent variable and win is the independent variable. Do you reach the same conclusions as in part ii? (Hint: as with the example in class, you won’t get the exact same standard errors/t statistic/p value as in part ii.)

(iv) Suppose a researcher wants to use these results to claim that successfully winning a seat in parliament causes an increase in wealth. One alternative explanation is that in this data set, the Tories tended to win more elections [Optional: use the table() function to check that this is true!], and also, as shown in part i, tend to be richer. Run a multivariate regression with ln.net as the dependent variable and both win and tory as independent variables. What does the output tell us about the plausibility of this alternative explanation?

(v) Another alternative explanation is that those who perform better in elections may just be more talented and charismatic than losers, and being more talented and charismatic is really what causes wealth. Run a regression where ln.net is the dependent variable and margin is the independent variable. Interpret the output in light of this alternative explanation.

(vi) Now run the line of code that creates a subset of the data for cases where the margin of victory was less than or equal to 5 percent (in either direction). Run a regression with winning as the IV and wealth as the DV on this sample, and compare your results to part iii.

(vii) Why might the results of the regression you did in part vi reduce concerns about the alternative explanation raised in parts iv-v?

(viii) Now run the line of code which runs this same regression but only for members of the Tory party using a “subset” argument within the lm() function. Write another line of code which does the same thing for members of the Labour party. Compare the results among these two groups.

4. Table 9.4 in Kellstedt and Whitten (pg 219) has the results of some regressions where the dependent variable is the average teacher salary in a U.S. state.

(i) Interpret the slope coefficient on the percentage of state residents with a college degree in column A. (Hint: this column represents a bivariate model where this is the only independent variable.)

(ii) Create a 95% confidence interval for this slope coefficient. Can we conclude with 95% confidence that this relationship is positive? Can we conclude that more state residents with a college degree causes a change in teacher salaries?

(iii) Interpret the slope coefficient in column B (on per capita income).

(iv) Now interpret the slopes of the coefficients for the multivariate regression in column C, which includes both variables. Compare these to the corresponding slopes in columns A and B. What might explain what changes and what stays the same?
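The steps asked for in problem 3 (parts ii, iii, iv and vi) can be sketched in R roughly as follows. This is only an illustration under stated assumptions: the data frame name `mps` and all of its values are simulated stand-ins, not the data loaded by the Piazza script, and the margin is assumed to be stored as a proportion (so 5 percent is 0.05), which may not match the real file.

```r
# Simulated stand-in for the MPs data (hypothetical; NOT the real data set)
set.seed(42)
n <- 300
mps <- data.frame(
  margin = runif(n, -0.3, 0.3),                          # assumed: a proportion
  party  = sample(c("tory", "labour"), n, replace = TRUE)
)
mps$tory   <- as.numeric(mps$party == "tory")
mps$ln.net <- 12 + 0.5 * mps$tory + rnorm(n)             # made-up log wealth

# (ii) win is 1 when the margin is positive, 0 otherwise
mps$win <- as.numeric(mps$margin > 0)
tt <- t.test(ln.net ~ win, data = mps)                   # difference of means

# (iii) the same comparison as a bivariate regression
fit_win <- lm(ln.net ~ win, data = mps)

# (iv) add tory as a second independent variable
fit_both <- lm(ln.net ~ win + tory, data = mps)

# (vi) keep only close elections, then rerun the bivariate regression
close_races <- subset(mps, abs(margin) <= 0.05)
fit_close   <- lm(ln.net ~ win, data = close_races)

print(tt)
print(coef(summary(fit_win)))
```

With a 0/1 independent variable, the slope from lm() equals the difference between the two group means reported by t.test(); the standard errors differ slightly because t.test() defaults to the Welch (unequal variance) version.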

Solution Preview

Question 1
x and y1: The correlation for this relationship is 0.46, because the graph shows a positive relationship in which the values follow a straight line but are not close to the line.
x and y2: The correlation for this relationship is -0.81, because the graph shows a negative relationship in which the values follow a straight line and most values are close to the line.
x and y3: The correlation for this relationship is 0.01, because the graph shows both a positive and a negative relationship: the values follow a parabola rather than a straight line.
x and y4: The correlation for this relationship is 0.87, because the graph shows a positive relationship in which the values follow a straight line and most values are close to the line.
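The reasoning above (sign gives direction, magnitude gives how tightly points hug a straight line) can be checked with a small simulation. The data below are invented for illustration and are not the values behind the four scatterplots, so the printed correlations will only roughly resemble those in the question.

```r
# Simulated data (hypothetical; not the data plotted in the problem set)
set.seed(1)
x <- seq(-3, 3, length.out = 200)

y_weak_pos   <-  x + rnorm(length(x), sd = 3)    # positive, widely scattered
y_strong_neg <- -x + rnorm(length(x), sd = 1)    # negative, close to a line
y_parabola   <-  x^2                             # clear pattern, but not linear
y_strong_pos <-  x + rnorm(length(x), sd = 0.5)  # positive, close to a line

r <- c(
  weak_pos   = cor(x, y_weak_pos),
  strong_neg = cor(x, y_strong_neg),
  parabola   = cor(x, y_parabola),
  strong_pos = cor(x, y_strong_pos)
)
print(round(r, 2))
```

The parabola case is the instructive one: y depends entirely on x, yet the correlation is essentially zero because correlation only measures linear association.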

Question 2
(i)
There is a positive relationship between county-level votes for Perot in 1996 and votes for Buchanan in 2000. The relationship appears strong: most of the points fall close to a straight line, with one exception, Palm Beach County (the point labeled “PBC”). This point is an outlier. It is still consistent with a positive relationship, but Buchanan received far more votes there than the pattern in the other counties would suggest.
(ii)
The correlation coefficient using all counties is 0.7162634. It is positive and lies between 0.5 and 0.8, so there is a moderate positive relationship between the two variables.
The correlation coefficient excluding PBC is 0.9225874, which is positive and much closer to 1, indicating a strong positive relationship. Removing the single PBC observation raises the correlation substantially, from 0.7162634 to 0.9225874, which means that Palm Beach County is an outlier in these data. It should be excluded from the data before performing other analyses.
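The effect of a single outlying point on a correlation can be reproduced with made-up numbers. The sketch below uses 20 hypothetical points near a line plus one point far above it, standing in for a county like PBC; none of these values come from the florida data frame. It also computes the regression slope both ways, since part (v) of the question contrasts the two.

```r
# Hypothetical county-like data: 20 points lying close to a straight line
perot    <- seq(1000, 20000, by = 1000)
buchanan <- 0.025 * perot + 30 * sin(seq_along(perot))

# One extra point far above the line, standing in for an outlier like PBC
perot_all    <- c(perot, 30000)
buchanan_all <- c(buchanan, 3400)

cor_excl <- cor(perot, buchanan)          # without the outlier
cor_all  <- cor(perot_all, buchanan_all)  # with the outlier

slope_excl <- unname(coef(lm(buchanan ~ perot))[2])
slope_all  <- unname(coef(lm(buchanan_all ~ perot_all))[2])

print(c(cor_excl = cor_excl, cor_all = cor_all))
print(c(slope_excl = slope_excl, slope_all = slope_all))
```

In this toy example the outlier lowers the correlation, because the points no longer sit close to one line, while raising the slope, because a point with a large x value and a y value far above the line pulls the fitted line upward.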
(iii)
The slope coefficient is 0.03592: for each additional vote cast for Perot in a county in 1996, the model predicts that the Buchanan vote in 2000 increases by 0.03592 votes....
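For reference, the prediction asked for in part (iv) follows directly from the coefficients quoted in the question. A minimal sketch of that arithmetic, using only numbers that appear in the problem statement (the variable names are illustrative):

```r
# Coefficients copied from the lm() output quoted in part (iii)
intercept <- 1.34575
slope     <- 0.03592

perot_pbc    <- 30739  # Perot votes in PBC in 1996, from part (iv)
buchanan_pbc <- 3407   # Buchanan votes actually received in PBC in 2000

# Predicted Buchanan vote: intercept + slope * Perot vote
predicted <- intercept + slope * perot_pbc

# How far the actual count sits above the fitted line
residual <- buchanan_pbc - predicted

print(round(c(predicted = predicted, residual = residual), 1))
```

The fitted model predicts roughly 1,105 Buchanan votes for PBC, so the 3,407 actually recorded is about three times the prediction.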
