 Statistical Analysis Of A Real Biological Dataset

Question

Description. In this project, you will conduct a complete data analysis of a real biological dataset, using the methods and techniques learned in this course, particularly those from the data analysis portions of the homework assignments.
The Dataset. The dataset we will analyze is the Plasma Retinol dataset.

Questions of Interest. The ultimate goal of every data analysis is to answer questions that are useful in a practical sense. Here are the questions that you should address in this data analysis. For each question below, you should answer the question as specifically as you can.
Question 1. What are the true average plasma retinol and (log) plasma beta-carotene levels in the population?
Question 2. Is there a difference in plasma retinol level between males and females?
Question 3. Is there a relationship between grams of fat and grams of fiber consumed per day?
Question 4. Is there a relationship between smoking status and gender?
Question 5. Is vitamin use in the population the same across all categories? (Vitamin Use categories: Yes, fairly often; Yes, not often; and No)
Question 6. Does plasma retinol level differ across smoking statuses? (Smoking Status categories: Never, Former, and Current Smoker)

Outline. Your data analysis should be fully typed and should be in a report format. Your data analysis report should consist of the following sections:

1. Introduction. In the introduction section, you will briefly describe the dataset. You should describe the study from which the data originated, state the number of variables and observations in the dataset, briefly introduce the questions of interest, and describe in words the population from which the data were (possibly theoretically) drawn.
2. Exploratory data analysis. In this section, you will provide the appropriate numerical and graphical summaries for each question of interest. You should describe all plots and numerical summaries that you obtain (similar to the data analysis portions of the homeworks). See the "short guide" found later in this handout for more details.
3. Checking assumptions. Before using formal procedures to address the questions of interest, check whether the assumptions required by these methods hold for our data. Not all assumptions are checkable, and not all assumptions will necessarily hold.
Tip. If an assumption is not checkable, you should mention this fact, and then provide your best guess as to whether or not the assumption holds, given the provided information. If an assumption clearly does not hold, then you should prepare to use an alternative method (e.g., a nonparametric test if the normality assumption does not hold). If no alternative method is available, then you should state that the analysis results may not be valid, and proceed with the current method.
4. Formal procedures. In this section, you will either construct a confidence interval or conduct the appropriate hypothesis test (or both, if possible), to address each question of interest. Again, see the "short guide" for details.
5. Conclusion. Summarize your findings. You should state here your answer to each question of interest based on the data analysis you have performed. If the assumptions for a particular analysis were either uncheckable or not met, you should state so. Which of your findings were the most interesting?

A Short Guide of What to Use When
This guide will help you figure out what to do in order to perform the specific analyses (Steps 2-4) for each question of interest. For each question of interest, you should determine the corresponding type of problem it falls under from the list below.

One sample problem, quantitative variable
Graphical summaries: Histogram, boxplot
Numerical summaries: mean, standard deviation
Formal procedures: Confidence interval (with t); and/or one-sample t test [only if there is a claim to test]
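
As an illustration of the one-sample quantitative case, here is a minimal sketch in Python (standard library only; the course itself uses R). The function name `t_confidence_interval`, the `t_star` parameter, and the sample values are hypothetical and are not taken from the Plasma Retinol dataset. The critical value is assumed to come from a t table for n - 1 degrees of freedom.

```python
import math
import statistics

def t_confidence_interval(sample, t_star):
    """Return (mean, sd, lower, upper) for a t-based confidence interval.

    t_star is the t critical value for len(sample) - 1 degrees of freedom,
    supplied by the caller (for example, from a t table).
    """
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)            # sample standard deviation (divides by n - 1)
    margin = t_star * sd / math.sqrt(n)      # critical value times the standard error
    return mean, sd, mean - margin, mean + margin

# Hypothetical measurements, NOT values from the Plasma Retinol dataset.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# The 95% two-sided critical value for 7 degrees of freedom is about 2.365.
mean, sd, lower, upper = t_confidence_interval(sample, t_star=2.365)
print(f"mean={mean:.3f}, sd={sd:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

With a real sample, only `sample` and `t_star` would change; the interval is always the sample mean plus or minus the critical value times the standard error.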

One sample problem, categorical variable
Graphical summaries: Barplot
Numerical summaries: Frequency, relative frequency, percentage relative frequency (for each category)
Formal procedures: Confidence interval (with p̃ and Z) [only if there are exactly two categories]; and/or chi-square goodness-of-fit test [only if there is a claim to test]
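
For a categorical variable with three categories, the summaries and the goodness-of-fit test can be sketched as follows in Python (standard library only; the course itself uses R). The counts are hypothetical, the helper name `chi_square_gof` is an assumption, and the null hypothesis used here (equal proportions in every category) is one possible claim to test. The p-value shortcut in the last step is valid only when there are exactly 2 degrees of freedom.

```python
import math

def chi_square_gof(observed, expected_props):
    """Return (chi-square statistic, degrees of freedom) for a goodness-of-fit test."""
    n = sum(observed)
    expected = [n * p for p in expected_props]   # expected counts under the null
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

# Hypothetical counts, NOT values from the Plasma Retinol dataset.
counts = {"Yes, fairly often": 50, "Yes, not often": 30, "No": 40}
n = sum(counts.values())

# Frequency, relative frequency, and percentage for each category.
for category, count in counts.items():
    rel = count / n
    print(f"{category}: frequency={count}, relative={rel:.3f}, percent={100 * rel:.1f}%")

# Null hypothesis assumed here: all three categories are equally likely.
stat, df = chi_square_gof(list(counts.values()), [1 / 3, 1 / 3, 1 / 3])

# For df = 2 only, the chi-square upper-tail probability equals exp(-stat / 2).
p_value = math.exp(-stat / 2)
print(f"chi-square={stat:.3f}, df={df}, p-value={p_value:.4f}")
```

For other degrees of freedom, the p-value would need a chi-square table or statistical software instead of the closed form used above.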

Two sample problem, quantitative variable
Graphical summaries: Side-by-side boxplots
Numerical summaries: Mean and standard deviation for each sample
Formal procedures: Confidence interval for the difference, and two-sample t test [pick the right scenario]; or Wilcoxon-Mann-Whitney (WMW) test [if the normality assumption is not met]
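
As one possible scenario for the two-sample case, the sketch below computes the unequal-variance (Welch) t statistic and its approximate degrees of freedom in Python (standard library only; the course itself uses R). The function name `welch_t` and both samples are hypothetical, and the pooled-variance scenario would use a different standard error.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate df) for the unequal-variance two-sample t test."""
    na, nb = len(sample_a), len(sample_b)
    va = statistics.variance(sample_a)   # sample variances (divide by n - 1)
    vb = statistics.variance(sample_b)
    se = math.sqrt(va / na + vb / nb)    # standard error of the difference in means
    t_stat = (statistics.mean(sample_a) - statistics.mean(sample_b)) / se
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va / na + vb / nb) ** 2 / (
        (va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1)
    )
    return t_stat, df

# Hypothetical samples, NOT values from the Plasma Retinol dataset.
group_a = [10.0, 12.0, 14.0, 16.0, 18.0]
group_b = [9.0, 10.0, 11.0, 12.0, 13.0]

t_stat, df = welch_t(group_a, group_b)
print(f"t={t_stat:.3f}, df={df:.2f}")
```

The p-value is then read from a t distribution with the computed degrees of freedom, using a table or statistical software.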

Three+ sample problem, quantitative variable
Graphical summaries: Side-by-side boxplots
Numerical summaries: Mean and standard deviation for each sample
Formal procedures: ANOVA test
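
The F statistic behind a one-way ANOVA test can be sketched as follows in Python (standard library only; the course itself uses R). The function name `one_way_anova` and the three groups are hypothetical and are not taken from the Plasma Retinol dataset.

```python
import statistics

def one_way_anova(groups):
    """Return (F statistic, df between, df within) for a one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    grand_mean = statistics.mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2 for group in groups
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - statistics.mean(group)) ** 2 for group in groups for x in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical samples for three groups, NOT values from the Plasma Retinol dataset.
never = [1.0, 2.0, 3.0]
former = [2.0, 3.0, 4.0]
current = [4.0, 5.0, 6.0]

f_stat, df_between, df_within = one_way_anova([never, former, current])
print(f"F={f_stat:.3f} on ({df_between}, {df_within}) degrees of freedom")
```

A large F statistic relative to the F distribution with these degrees of freedom is evidence that at least one group mean differs from the others.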

Two categorical variables
Graphical summaries: Not covered in this course (don’t need to do it)
Numerical summaries: 2-Way Frequency table
Formal procedures: Chi-square test for independence
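
A minimal sketch of the test for independence on a two-way frequency table, written in Python (standard library only; the course itself uses R). The function name `chi_square_independence` and the 3 x 2 table of counts are hypothetical. The closed-form p-value at the end applies only because this table has exactly 2 degrees of freedom.

```python
import math

def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / n.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical two-way table, NOT counts from the Plasma Retinol dataset.
# Rows: Never, Former, Current Smoker; columns: Female, Male.
table = [
    [30, 20],
    [20, 20],
    [10, 20],
]

stat, df = chi_square_independence(table)

# For df = 2 only, the chi-square upper-tail probability equals exp(-stat / 2).
p_value = math.exp(-stat / 2)
print(f"chi-square={stat:.3f}, df={df}, p-value={p_value:.4f}")
```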

Two quantitative variables
Graphical summaries: Scatterplot (with least squares line)
Numerical summaries: Coefficients of correlation and determination, r and R²
Formal procedures: Test the slope β₁
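
For two quantitative variables, the least squares line, r, R², and the t statistic for the slope can be sketched as follows in Python (standard library only; the course itself uses R). The function name `simple_regression` and the data are hypothetical. The t statistic is written in terms of r, which is algebraically the same as dividing the estimated slope by its standard error, and has n - 2 degrees of freedom.

```python
import math
import statistics

def simple_regression(x, y):
    """Return (slope, intercept, r, r_squared, t_stat) for a least squares line."""
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx                      # estimate of the slope
    intercept = y_bar - slope * x_bar      # estimate of the intercept
    r = sxy / math.sqrt(sxx * syy)         # coefficient of correlation
    # t statistic for H0: slope = 0, with n - 2 degrees of freedom.
    t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return slope, intercept, r, r ** 2, t_stat

# Hypothetical paired data, NOT values from the Plasma Retinol dataset.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

slope, intercept, r, r_squared, t_stat = simple_regression(x, y)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.3f}, "
      f"R^2={r_squared:.3f}, t={t_stat:.3f}")
```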

Note. We did not cover how to check assumptions for regression problems (two quantitative variables). Therefore, you do not need to worry about checking assumptions for any question of interest that involves using regression techniques.
