Homework #6: Significance Tests (small & large n; means & proportions)
• Be as complete and explicit as possible in your answers. On this and all assignments, show all work done, including any calculations, and write all answers in prose (sentence/paragraph) form; single-word or phrase answers are not sufficient, and points will be deducted for incomplete answers.
• Note: You need not elaborate all of the steps of the hypothesis test for these first four problems (except where asked), but you must do so for the problems from the book.
• Hint for all problems: Draw a picture first!!
Part I
• From the Agresti & Finlay problems for chapter 6 (pp. 175-182), do problems 2, 8, 14, & 52.
Part II
• Do the following four problems:
1. State the null hypothesis Ho and the alternative hypothesis Ha that would be used for a hypothesis test related to each of the following statements. Clearly specify whether they are one- or two-tailed tests:
a. The mean age of students enrolled in evening classes at Piedmont Community College is greater than 26 years.
b. The mean distance from a hotel room in Amsterdam to the nearest museum is less than 200 yards.
c. The mean life of a carburetor is 10,000 miles.
d. The mean weight of college football players is no more than 210 lbs.
e. One in ten CSUN students has been arrested for public drunkenness.
2. For a large-sample test of Ho: Mu = 0 against Ha: Mu is not equal to 0, the z statistic equals 1.07.
a. Find the p-value, and interpret it.
b. Suppose z = -2.51 rather than 1.07. Find the p-value and interpret it.
c. Does the larger z score (in absolute value) mean stronger or weaker evidence against the null hypothesis? Explain.
3. Find and interpret the p-value for testing Ho: Mu = 10 against Ha: Mu is not equal to 10, if...
a. ... a sample of 40 observations has a standard deviation of 4 and a mean of 10.3
b. ... a sample of 40 observations has a standard deviation of 4 and a mean of 9.7
c. ... a sample of 160 observations has a standard deviation of 4 and a mean of 9.7
d. ... a sample of 160 observations has a standard deviation of 2 and a mean of 9.7
4. Out of a random sample of 500 individuals who are planning to vote in a student council election, 260 plan on voting for Jefferson and 240 plan on voting for Smith.
a. Construct a 95% confidence interval for the proportion of votes that Jefferson will receive, and make a conclusion about whether Jefferson will win the election.
b. Calculate a z-score and interpret the probability (using 0.05 as the alpha level) in order to make a conclusion about whether Jefferson will win.
c. Compare your results from a and b. Do you think Jefferson will win the election?
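
A note on checking your arithmetic: the sketch below is one possible way, in Python (which this course does not require), to compute the kinds of quantities these problems ask about: a two-tailed p-value from a z statistic, a large-sample z statistic for a mean, and a confidence interval and z statistic for a proportion. The function names and the example numbers are invented for illustration and are deliberately not the values from the problems above. The formulas assume samples large enough for the normal approximation.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def two_sided_p(z):
    """Two-tailed p-value: chance of a z at least this far from 0 in either direction."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def z_for_mean(sample_mean, null_mean, sample_sd, n):
    """Large-sample z: (sample mean - null value) / estimated standard error."""
    return (sample_mean - null_mean) / (sample_sd / sqrt(n))


def proportion_ci(successes, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    z_crit = STD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def z_for_proportion(successes, n, null_p):
    """z statistic for Ho: proportion = null_p (standard error uses null_p)."""
    p_hat = successes / n
    return (p_hat - null_p) / sqrt(null_p * (1 - null_p) / n)


# Invented example values (assumptions, not the homework data).
z = z_for_mean(sample_mean=52, null_mean=50, sample_sd=10, n=100)
print("z for the mean:", z, "two-tailed p:", two_sided_p(z))
print("95% CI for 60 successes out of 100:", proportion_ci(60, 100))
print("z for Ho: proportion = 0.5:", z_for_proportion(60, 100, 0.5))
```

Note that the confidence interval uses the sample proportion in its standard error, while the test uses the null value, so the two approaches can occasionally disagree near the borderline.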

Homework #7: Teasing Relationships from Univariate Data
• Guidelines: Hand in your output file. Write answers in sentence form, and be explicit and complete in explaining your answers. Show all work done, including any calculations, and explain all of the steps involved. Be certain to report all statistics in terms of some unit of analysis (for example, years old, or dollars earned) and be careful to distinguish between original units and standard errors.
• Work Alone: You may plan your work in study groups. You may even work together to perform the entire analysis on a different data set. But once you begin analyzing data, you are on your own and may only consult your book, your notes, and the website! You may of course ask me anything you wish, although some questions I can't answer.
• Study Topic: Based on your recently acquired statistical skills, you've been hired as a consultant by a local agency. You have been asked to assess whether intergenerational transmission of educational attainment is the same for men and women; that is, whether women follow in their mothers' footsteps as much as men follow in their fathers' footsteps. You will need to conduct the following analyses, using data from the 1984 General Social Survey (GSS84.SAV):
1) Find the data: Identify variables that measure the education level of the respondents, their fathers, and their mothers. You do not need to discuss univariate analysis of these variables for this assignment. But you should always look at frequency distributions (and histograms, if appropriate) before conducting analysis.
2) Clean the data: Create two new variables which measure the respondents' improvement over their parents in terms of education; that is, how much more education they received than their mothers and their fathers. (Hint: To compute how different you and I are in age, you might use AGEDIFF = MYAGE - YOURAGE.)
3) Compute confidence intervals: Using SPSS output, construct confidence intervals for the mean improvement for each of four groups. (The first two are just your two new variables, for the full sample; the others are each variable, for one subsample. You could get the 3rd and 4th using SELECT IF twice, or at the same time using the descriptives table from ANOVA or t-test output.) Be sure to interpret the results!
* all respondents over their fathers
* all respondents over their mothers
* male respondents over their fathers
* female respondents over their mothers
4) Perform four tests: (Complete all five steps for each test, including carefully specified hypotheses!)
a) A hypothesis test for whether the sample is balanced in terms of gender, or if there are significantly more than 50% men or women.
b) A difference of proportions test for the proportion of men who improved over their fathers versus the proportion of women who improved over their mothers. (Hint: Use the cumulative frequencies column.)
c) A hypothesis test for whether women improve over their mothers more than the sample as a whole does.
d) A difference of means test for whether men improve over their fathers as much as women improve over their mothers.
5) Make conclusions: State (in a few sentences, at least) a conclusion about sex differences in intergenerational change in education, taking account of any relevant factors (sample sizes, representativeness of sample proportions, confidence levels, satisfaction of assumptions made, etc.) that you have encountered during your study.
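
For orientation only, here is a rough Python sketch of the arithmetic behind steps 2 through 4. The assignment itself is to be done in SPSS, so this is not a substitute for your output file. All names and data are made up for illustration (they are not GSS variables or values), missing values are represented as None by assumption, and the formulas are the large-sample (z) versions, which the tiny invented lists below do not really justify.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def improvement(respondent_educ, parent_educ):
    """Respondent's years of schooling minus the parent's, skipping missing (None) pairs."""
    return [r - p for r, p in zip(respondent_educ, parent_educ)
            if r is not None and p is not None]


def mean_ci(values, confidence=0.95):
    """Large-sample confidence interval for a mean: mean +/- z * (s / sqrt(n))."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = stdev(values) / sqrt(len(values))
    return mean(values) - z_crit * se, mean(values) + z_crit * se


def diff_of_means_z(a, b):
    """z for Ho: the two population means are equal (independent samples)."""
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se


def diff_of_proportions_z(x1, n1, x2, n2):
    """z for Ho: the two population proportions are equal (pooled standard error)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se


# Invented years of schooling (None = missing); not real survey data.
men_over_fathers = improvement([12, 16, 14, None, 12, 18], [8, 12, 14, 10, None, 12])
women_over_mothers = improvement([12, 13, 16, 12, 14], [10, 12, 12, 12, 8])

print("Men's improvement:", men_over_fathers, "CI:", mean_ci(men_over_fathers))
print("Women's improvement:", women_over_mothers, "CI:", mean_ci(women_over_mothers))
print("z, difference of means:", diff_of_means_z(men_over_fathers, women_over_mothers))
print("z, difference of proportions:", diff_of_proportions_z(60, 100, 50, 100))
```

The sign of each improvement score depends on the order of subtraction, so state clearly which way you computed it when you interpret the intervals.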

Optional Extra Credit Homework
Want more practice? Do problems 8, 16, 22, and 24 from chapter 7 (pp. 209-219 in the Agresti/Finlay text) for extra credit.

Homework #8: Correlation and Regression
Using data from the 1984 General Social Survey (gss84.sys), do the following:
1. Make an argument about the possible relationship between the education level of respondents and that of their fathers, using father's education as the independent variable. Using all criteria discussed, assess whether this could be a causal relationship.
2. Identify (by name and label) the two variables in the dataset which measure these concepts, and identify their operationalization and level of measurement.
3. Resolve all missing values for each variable, and indicate how you've done so. (If there are more than three missing-value codes, you'll need to recode all missing values to one value, and use that one value in the missing values command. If there aren't any cases with a missing value, though, you don't have to worry about this, so check frequencies!)
4. Briefly analyze the univariate distributions of the two variables, including both center and spread, in both graphical and statistical terms (that is, tails and shapes as well as statistics and their interpretation) and compare them.
5. Make a scatter diagram using the plot command and discuss whether the plot implies the presence of a relationship. If so, what type of relationship appears: positive or negative? Strong, moderate, or weak? Linear or something else? See any outliers?
6. Run a regression with respondents' education as the dependent variable and father's education as the independent variable. Then answer the following questions regarding the relationship between these two variables. (Yes, you also need to answer the questions asked above. That's why I asked them.)
a) What is the slope? What is the y-intercept? Provide an interpretation of these two statistics.
b) State the regressed prediction line in the form of an equation Y^ = a + bx
c) Calculate predicted values of the dependent variable for four values (0, 8, 12, and 16) of the independent variable. Mark and label these (x,y) points on the scatterplot, and sketch the prediction line.
d) For this data, what is the average education predicted for respondents whose fathers had no years of education? How much is each year of father's education "worth" on average in terms of respondents' years of education?
e) If the null hypothesis is that B = 0 (i.e. that the regression line has a horizontal slope, and that there is a "flat" relationship, such that values of the dependent variable do not vary with values of the independent variable; i.e. there is "no" relationship), test that null hypothesis at the .05 level. (Yes, do all the steps of a hypothesis test: Interpret all values, and make firm conclusions about both the hypotheses and the prose relationship.)
f) What is the R-squared value? Interpret it.
7. Run a correlation procedure on the two variables and describe the correlation between the two variables. (Be sure to interpret the size, direction, and significance of the correlation, and to clearly delineate these.) How is this statistic (the correlation coefficient) similar to or different from the standardized regression coefficient computed for the same two variables?
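
As a reminder of where the regression and correlation numbers in your output come from, the following is a minimal least-squares sketch in Python. It is only an illustration under stated assumptions: the data are five invented (father's education, respondent's education) pairs, not GSS cases, and the function and dictionary key names are hypothetical. The t statistic it returns would still have to be compared with a t distribution on n - 2 degrees of freedom.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares fit of Y^ = a + bX, plus r, R-squared, the standardized slope, and t for Ho: B = 0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)  # sum of squares for X
    syy = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares for Y
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                              # slope
    a = y_bar - b * x_bar                      # y-intercept
    r = sxy / sqrt(sxx * syy)                  # Pearson correlation
    sse = syy - b * sxy                        # residual (error) sum of squares
    se_b = sqrt(sse / (n - 2) / sxx)           # standard error of the slope
    return {
        "a": a,
        "b": b,
        "r": r,
        "r_squared": r ** 2,
        "beta": b * sqrt(sxx / syy),           # slope in standard-deviation units
        "t": b / se_b,
    }


# Invented data: years of schooling for fathers (x) and respondents (y).
father_educ = [8, 10, 12, 12, 16]
resp_educ = [10, 12, 12, 14, 16]

fit = fit_line(father_educ, resp_educ)
print(fit)

# Predicted respondent education at chosen values of father's education.
for x_value in (0, 8, 12, 16):
    print(x_value, fit["a"] + fit["b"] * x_value)
```

With a single independent variable, the standardized slope ("beta") works out to the same number as r, which is worth keeping in mind for question 7.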

Homework #9: Crosstabs and Three-ways
GENERAL INSTRUCTIONS
• You may type your answers into your output file (preferred) or write them out by hand, but you must hand in your output. You may not work with each other. You may always, of course, ask me anything you wish.
• Be as complete and explicit as possible in your answers. Show all work done, including any calculations, and explain all of the steps involved and all of the parts of each calculation. On this and all assignments, put all answers in prose (sentence/paragraph) form; single-word or phrase answers are not sufficient, and points will be deducted for incomplete answers.

SPECIFIC INSTRUCTIONS
Using data from the 1988 General Social Survey (gss88a.sys), do the following:
• Examine, report, and interpret the univariate distributions of SEX, XMOVIE, PRAY, and PARTNERS.
• Deal with these missing values: 0, 8, & 9 for PRAY; 0, 8, & 9 for XMOVIE; -1, -2, 9, 98, & 99 for PARTNERS
• Recode PRAY into an ordinal measure that distinguishes between those who pray once or more per day, those who pray once or more per week, and those who pray less than once per week (or never).
• Recode PARTNERS into three categories, of those who have had no partners, those who have had one partner, and those who have had more than one partner.
• Crosstabulate sample data to examine whether someone having seen an x-rated movie is related to their sex.
1. What are the percentages of men and women who report that they have seen an x-rated movie?
2. Using percentage comparisons (modal frequencies, reference groups, or comparisons to the marginals), make an assessment as to whether or not there is a dependent relationship between these two variables. ("Using percentage comparisons" means cite data!)
3. What would be the predicted percentage of women who have seen an x-rated movie under the (null) hypothesis of independence? What's the predicted number (or cell count)? (Show all calculations!)
4. What does the chi-square statistic tell us about this relationship? How does this relate to question 2? (Be sure to make an explicit conclusion about the null hypothesis.)
5. What does the tau (btau) statistic tell us about the strength of the relationship?
• Construct a crosstabulation of sample data to examine whether the number of partners someone has had is related to how often they pray.
6. Using percentage comparisons (modal frequencies, reference groups, or comparisons to the marginals), make an assessment as to whether or not there is a dependent relationship between these two variables. ("Using percentage comparisons" means cite data!)
7. What does the chi-square statistic tell us about this relationship? How does this relate to question 6? (Be sure to make an explicit conclusion about the null hypothesis.)
8. Which cells are concordant to each "start cell", and why? (Reminder: you must start with each cell, and move through the table; any cell might be a "start cell".) How many total pairs of cases are concordant?
9. State the value of gamma, explain how it is calculated (but do not calculate it), and interpret it. (Remember to provide all interpretations of gamma: strength, direction, and PRE.)
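
To make the mechanics behind questions 3, 4, 8, and 9 concrete, here is a small Python sketch of expected counts under independence, the chi-square statistic, and concordant/discordant pair counting for gamma. It is offered as an illustration, not as the required method: the tables are invented (they are not the GSS crosstabs), the function names are hypothetical, and the pair counting assumes that both the rows and the columns are listed in ascending order of an ordinal variable.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table):
    """Chi-square statistic: sum over cells of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))


def concordant_discordant(table):
    """Count concordant and discordant pairs, treating every cell as a start cell."""
    rows, cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(rows):
        for j in range(cols):
            for k in range(i + 1, rows):       # only cells in lower rows
                for m in range(cols):
                    if m > j:                  # below and to the right
                        concordant += table[i][j] * table[k][m]
                    elif m < j:                # below and to the left
                        discordant += table[i][j] * table[k][m]
    return concordant, discordant


def gamma(table):
    """Gamma = (C - D) / (C + D)."""
    c, d = concordant_discordant(table)
    return (c - d) / (c + d)


# Invented tables (assumptions for illustration only).
two_by_two = [[20, 10], [10, 20]]
three_by_three = [[10, 5, 2], [4, 8, 3], [1, 6, 9]]

print("Expected counts:", expected_counts(two_by_two))
print("Chi-square:", chi_square(two_by_two))
print("Concordant, discordant:", concordant_discordant(three_by_three))
print("Gamma:", gamma(three_by_three))
```

The chi-square value would then be compared against a chi-square distribution with (rows - 1)(columns - 1) degrees of freedom.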

EXTRA CREDIT (Points on this assignment, not a separate "extra credit" homework assignment)
• Construct a three-way crosstabulation of sample data to examine whether the relationship between praying and number of partners varies by sex.
10. Among men, is prayer significantly related to number of partners? In which direction, and how strongly? To which statistics (plural!) do you refer? Explain why they apply.
11. Among women, is prayer significantly related to number of partners? In which direction, and how strongly? To which statistics (plural!) do you refer? Explain why they apply.
12. Which cells in either table are "concordant" relative to the top-left cell (i.e. row 1, column 1)? How many females are in those cells, and how many concordant pairs (with that single start cell) are there for females? How many males are in those cells, and how many concordant pairs (with that single start cell) are there for males?
13. State (do not calculate) the value of gamma for each "partial" table (that is, the crosstab for each category of sex) and interpret it. (Remember to provide all interpretations of gamma: strength, direction, and PRE.) For whom are praying and number of partners more strongly related?
