Answer the following questions. Copy and paste any required data charts or summaries into this Word document. Use additional space as needed. Be sure to include your name on the document and use the file naming convention. This exam is open book and open notes.

I. Study Design and Sample Size
You are asked to design a study to assess whether contamination of the water supply is associated with the development of GI cancer. The national GI cancer rate is reported at 1 per 100,000.
1. What kind of study design would you use? Why?
2. What type of statistical test would you use? Why?
3. Based on your study design, calculate the minimum sample size with alpha 0.05, power 0.80, and an assumed effect size of .30.
4. Interpret the result of the sample size calculation.
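
As a rough check on item 3, the sketch below applies the standard normal-approximation formulas to two common readings of "effect size .30". The exam does not say which metric is meant, so both readings are assumptions: Cohen's d for a two-group comparison of means, and Cohen's w for a chi-square test with 1 degree of freedom. The function names are illustrative, and software based on the exact t or noncentral chi-square distribution may return a slightly larger n.

```python
from math import ceil
from statistics import NormalDist


def z_sum_squared(alpha=0.05, power=0.80):
    """(z_{1-alpha/2} + z_{power})^2 for a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) ** 2


def n_per_group_two_sample(d, alpha=0.05, power=0.80):
    """Assumption: effect size is Cohen's d, two equal-sized groups."""
    return ceil(2 * z_sum_squared(alpha, power) / d ** 2)


def n_total_chi_square_1df(w, alpha=0.05, power=0.80):
    """Assumption: effect size is Cohen's w, chi-square test with 1 df."""
    return ceil(z_sum_squared(alpha, power) / w ** 2)


print("per group (d = 0.30):", n_per_group_two_sample(0.30))
print("total (w = 0.30):", n_total_chi_square_1df(0.30))
```

Under these assumptions the approximation gives about 175 subjects per group for d = 0.30, or about 88 subjects in total for w = 0.30. Which figure applies depends on the study design and test chosen in items 1 and 2.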

II. Two-way ANOVA
A researcher is comparing the effects of 2 different asthma drugs on test performance. The researcher suspects that at least one of the drugs may have a different effect on fresh versus tired test takers. Data was collected on test results achieved by fresh and tired test takers after using each of the 2 drugs. There is concern that the exposure to these drugs could impair more than a person’s ability to take a test and therefore become a Public Health issue. The Research Question is: Does Drug A or Drug B impair test performance in either fresh or tired test takers? Please answer this question using the Final Exam – ANOVA (SPSS document) dataset and the following steps:
1. Provide numeric descriptive statistics (include skewness and kurtosis if appropriate) and graphic descriptions for Alertness, Drug Treatment, and Test Performance.
2. Create histograms of the test performance results (dependent variable) for each combination of levels for the two independent variables. Describe the data and shape of the distributions.
3. Discuss whether the assumptions of homogeneity of variance of the groups and normality of the data on test performance are met. Be sure to include output to support your decision on whether the assumptions have been met. (Continue with the analyses even if assumptions are not met.)
4. Conduct two-way ANOVA with interaction and post hoc analysis (as appropriate) using Tukey to correct for multiple comparisons. Provide relevant SPSS output.
5. Interpret the analysis results in the context of the research question. Include important statistics from your analysis results to support your conclusion and generalize your results, if appropriate, to the relevant population(s).
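
The two-way ANOVA in item 4 partitions the variation in test performance into an alertness effect, a drug effect, their interaction, and error. A minimal sketch for a balanced design follows. The scores are made-up illustration values, not the exam dataset, and the function name is hypothetical. It reports only F statistics and degrees of freedom; SPSS supplies the p-values and the Tukey comparisons.

```python
from statistics import mean


def two_way_anova_balanced(cells):
    """cells maps (alertness, drug) -> list of scores; all cells need equal n."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if any(len(v) != n for v in cells.values()):
        raise ValueError("this sketch only handles balanced designs")

    grand = mean(x for v in cells.values() for x in v)
    a_means = {a: mean(x for b in b_levels for x in cells[(a, b)]) for a in a_levels}
    b_means = {b: mean(x for a in a_levels for x in cells[(a, b)]) for b in b_levels}
    cell_means = {k: mean(v) for k, v in cells.items()}

    # Sums of squares for main effects, interaction, and within-cell error.
    ss_a = n * len(b_levels) * sum((m - grand) ** 2 for m in a_means.values())
    ss_b = n * len(a_levels) * sum((m - grand) ** 2 for m in b_means.values())
    ss_ab = n * sum(
        (cell_means[(a, b)] - a_means[a] - b_means[b] + grand) ** 2
        for a in a_levels for b in b_levels
    )
    ss_err = sum((x - cell_means[k]) ** 2 for k, v in cells.items() for x in v)

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_ab = df_a * df_b
    df_err = len(a_levels) * len(b_levels) * (n - 1)
    ms_err = ss_err / df_err

    return {
        "F_alertness": (ss_a / df_a) / ms_err,
        "F_drug": (ss_b / df_b) / ms_err,
        "F_interaction": (ss_ab / df_ab) / ms_err,
        "df": (df_a, df_b, df_ab, df_err),
    }


# Hypothetical scores, for illustration only.
toy_cells = {
    ("fresh", "A"): [10, 12, 14],
    ("fresh", "B"): [11, 13, 15],
    ("tired", "A"): [4, 6, 8],
    ("tired", "B"): [10, 12, 14],
}
result = two_way_anova_balanced(toy_cells)
print(result)
```

A large interaction F relative to its critical value would suggest that the drug effect differs between fresh and tired test takers, which is the pattern the research question asks about.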

III. Multiple Linear Regression
A health department randomly selected 400 subjects from a local community and monitored their cardiovascular condition. Data from this study are provided in the Final Exam – Linear and Logistic (SPSS document) dataset. The following variables are included in the database: sex, age, BMI, SBP, DBP, serum cholesterol, coronary heart disease, and follow-up.
1. Conduct a multiple linear regression using SPSS. Provide relevant SPSS output and assess the statistical significance of the effect of sex, age, and BMI on systolic blood pressure.
2. Explain the assumptions of linearity, sampling independence, normality, and homoscedasticity (equal variance). How would you test whether these have been met? (Note: for the exam you do not need to test these assumptions.)
3. Explain the practical implications of your finding. Include a reference to the R square of the model in your discussion.
4. Discuss whether or not there is interaction (effect modification) between sex and age.

IV. Multiple Logistic Regression
Use the Final Exam – Linear and Logistic (SPSS document) dataset to assess the impact of sex, age, and BMI on the risk of coronary heart disease.
1. Conduct simple logistic regression of coronary heart disease and sex.
2. Conduct a multiple logistic regression using SPSS to address the research question: What is the association between sex and coronary heart disease after controlling for age and BMI?
3. Discuss how the addition of age and BMI in the model affected the association of sex and coronary heart disease using the Odds Ratios and confidence intervals in your output.
4. Assess the statistical significance of the individual risk factors and explain the practical implication of your finding.
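
For a single binary predictor such as sex, the odds ratio from a simple logistic regression equals the cross-product ratio of the 2x2 table, which gives a quick way to sanity-check the SPSS output for item 1. The sketch below assumes hypothetical counts, not values from the exam dataset, and uses the usual Wald interval on the log-odds scale. The function name is illustrative.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_wald(a, b, c, d, level=0.95):
    """a, b = cases / non-cases in group 1; c, d = cases / non-cases in group 2."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 30 of 100 men and 15 of 100 women with CHD.
estimate, lower, upper = odds_ratio_wald(30, 70, 15, 85)
print(f"OR = {estimate:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

Comparing this crude odds ratio with the adjusted one from the multiple logistic model shows how much of the sex association is explained by age and BMI; a confidence interval that excludes 1 corresponds to statistical significance at the matching alpha level.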

V. Cox Proportional Hazard
The Final Exam – Linear and Logistic (SPSS document) dataset, used in problems III and IV, also includes follow-up time (in days) from the beginning of the study to either onset of coronary heart disease or end of the study. This allows you to also look at the relationship of sex to CHD using survival analysis techniques.
1. Complete a Kaplan-Meier Survival Analysis using Followup as the Time variable, Chdfate as the status variable and Sex as the factor. Produce a plot of the survival function. Discuss whether the survival time appears related to whether the person is male or female based on the survival plot.
2. Use Kaplan-Meier in SPSS to test the assumption of proportionality. Create a Hazard plot with time = Followup, status = Chdfate, and factor = Sex. Interpret the results.
3. Conduct a Cox Proportional Hazard regression to compare the time to coronary heart disease event between men and women. Include a plot of the hazard function stratified by sex in the output. Interpret the results.
4. How does the hazard ratio compare to the odds ratio obtained from the simple logistic regression from the previous problem? Why might they differ?
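
The Kaplan-Meier estimate in item 1 multiplies, at each event time, the fraction of at-risk subjects who remain event-free. A minimal sketch follows, with a hypothetical function name and made-up follow-up times (status 1 = CHD event, 0 = censored) rather than the exam data. Subjects censored at an event time are counted as still at risk at that time, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Return [(event_time, survival_probability)] via the product-limit method."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve


# Hypothetical follow-up days and status flags, for illustration only.
toy_curve = kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0])
for day, s in toy_curve:
    print(day, round(s, 3))
```

Running the function separately for men and women and comparing the two curves mirrors what the SPSS survival plot shows with Sex as the factor.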
