Question
This exercise looks at categorical (qualitative) data in which we have counts or frequencies of observations associated with the levels of one categorical variable (one-way tables) or with the levels of two categorical variables (two-way tables).
We use the chi-squared distribution to assess goodness of fit and to test for statistical independence.
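For reference, these are the standard Pearson statistics behind both uses. The notation is ours, not the exercise's. For a one-way table with k categories, observed counts O_i and hypothesized proportions p_i, and for an r × c two-way table with observed counts O_ij, row totals R_i, column totals C_j and grand total n, the statistics and degrees of freedom are shown below. The last line is the continuity-corrected form that R's `chisq.test()` applies by default to 2 × 2 tables, which R reports as "Yates' continuity correction".

```latex
\chi^2_{\mathrm{GOF}} = \sum_{i=1}^{k} \frac{(O_i - n p_i)^2}{n p_i}, \qquad \mathrm{df} = k - 1

E_{ij} = \frac{R_i \, C_j}{n}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\mathrm{df} = (r - 1)(c - 1)

\chi^2_{\mathrm{Yates}} = \sum_{i=1}^{2} \sum_{j=1}^{2}
  \frac{\bigl(\max\{0,\; |O_{ij} - E_{ij}| - \tfrac{1}{2}\}\bigr)^2}{E_{ij}}, \qquad \mathrm{df} = 1
```

In each case H0 is rejected at level α when the p-value, the upper-tail probability of the chi-squared distribution with the stated df at the observed statistic, is below α.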
Using R, perform each analysis on a simple random sample of size 100 as specified in Problems 1 to 4 below, and repeat the analysis on the entire dataset.
Refer to the attached Excel data (LDS_C10_RESPDIS) on smoking, alcohol consumption, blood pressure, and respiratory disease among 1,200 adults.
Note that columns A, B, D, and E contain nominal data coded 0/1, whereas column C (Drink) contains ordinal data coded 0, 1, 2.
The variables are as follows:
Sex (A): 1 = male, 0 = female
Smoking status (B): 0 = nonsmoker, 1 = smoker
Drinking level (C): 0 = nondrinker, 1 = light to moderate drinker, 2 = heavy drinker
Symptoms of respiratory disease (D): 1 = present, 0 = absent
High blood pressure status (E): 1 = present, 0 = absent
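A minimal R sketch of attaching these codebook labels to the numeric codes follows. It assumes the sheet has been read into a data frame whose columns are named A to E, which are the names the solution's code uses (e.g. `dat$B`). The helper `label_respdis()` and the file name in the comment are hypothetical. `readxl::read_excel()` is one common way to read an Excel file, but the exercise does not give the exact path or sheet.

```r
# Hypothetical read step (path and sheet are assumptions):
# dat <- readxl::read_excel("LDS_C10_RESPDIS.xlsx")

# Hypothetical helper: convert the 0/1/2 codes into labelled factors
# following the codebook (A = sex, B = smoking, C = drinking, D, E).
label_respdis <- function(dat) {
  data.frame(
    sex      = factor(dat$A, levels = c(0, 1), labels = c("female", "male")),
    smoking  = factor(dat$B, levels = c(0, 1), labels = c("nonsmoker", "smoker")),
    drinking = factor(dat$C, levels = c(0, 1, 2),
                      labels = c("nondrinker", "light-moderate", "heavy"),
                      ordered = TRUE),   # keeps the codebook order for display
    resp     = factor(dat$D, levels = c(0, 1), labels = c("absent", "present")),
    highbp   = factor(dat$E, levels = c(0, 1), labels = c("absent", "present"))
  )
}

# Usage with a tiny made-up data frame (illustrative values, not the real data):
toy <- data.frame(A = c(1, 0, 1), B = c(0, 1, 1), C = c(2, 0, 1),
                  D = c(1, 0, 0), E = c(0, 0, 1))
labelled <- label_respdis(toy)
str(labelled)
```

Pearson's chi-squared test treats the three drinking levels as unordered categories. The `ordered = TRUE` flag only preserves the codebook ordering in tables and printed output.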
Problems
1) Select a simple random sample of size 100 from this population and carry out an analysis to see if you can conclude that there is a relationship between smoking status and symptoms of respiratory disease. Let α = 0.05 and determine the p value for your test.
2) Select a simple random sample of size 100 from the population and carry out a test to see if you can conclude that there is a relationship between drinking status and high blood pressure status in the population. Let α = 0.05 and determine the p value.
3) Select a simple random sample of size 100 from the population and carry out a test to see if you can conclude that there is a relationship between gender and smoking status in the population. Let α = 0.05 and determine the p value.
4) Select a simple random sample of size 100 from the population and carry out a test to see if you can conclude that there is a relationship between gender and drinking level in the population. Let α = 0.05 and find the p value.
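All four problems follow the same pattern: draw a simple random sample of 100 rows, run Pearson's chi-squared test of independence on the sample and on the full data, then compare each p-value with α = 0.05. A minimal R sketch of that loop follows. It assumes `dat` is the full data frame with columns A to E. The helper `run_pair()` and the per-problem seeds are hypothetical choices, not part of the exercise.

```r
# Hypothetical helper: SRS of size n, chi-squared test on the sample and on
# the full data, and the reject/fail-to-reject decision at level alpha.
run_pair <- function(dat, x, y, n = 100, alpha = 0.05, seed = 1) {
  set.seed(seed)                            # reproducible SRS (changes global RNG state)
  sampled <- dat[sample(nrow(dat), n), ]    # sampling without replacement
  p_sample <- chisq.test(sampled[[x]], sampled[[y]])$p.value
  p_pop    <- chisq.test(dat[[x]], dat[[y]])$p.value
  data.frame(pair = paste(x, "vs", y),
             p_sample = p_sample, reject_sample = p_sample < alpha,
             p_pop = p_pop, reject_pop = p_pop < alpha)
}

# Column pairs for Problems 1 to 4, using the codebook letters:
# 1) smoking vs respiratory symptoms, 2) drinking vs high blood pressure,
# 3) sex vs smoking, 4) sex vs drinking.
pairs <- list(c("B", "D"), c("C", "E"), c("A", "B"), c("A", "C"))

# Runs only when the full dataset has been loaded as `dat`;
# each problem gets its own sample via a different seed.
if (exists("dat")) {
  results <- do.call(rbind, Map(function(p, s) run_pair(dat, p[1], p[2], seed = s),
                                pairs, seq_along(pairs)))
  print(results)
}
```

Using a different seed per problem draws a separate sample for each problem, matching the wording "select a simple random sample" in every problem. Any warning from `chisq.test()` about the approximation is left visible on purpose.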
Solution Preview
Problem 1
H0: Smoking status (B) and symptoms of respiratory disease (D) are independent.
Ha: Smoking status (B) and symptoms of respiratory disease (D) are not independent.
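# Chi-squared test of independence on the n = 100 sample.
# Assumes dat holds the full dataset (columns A to E) and dat.sampled is a
# simple random sample of 100 rows, e.g. dat.sampled <- dat[sample(nrow(dat), 100), ]
fit.sampled <- chisq.test(dat.sampled$B, dat.sampled$D)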
fit.sampled
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: dat.sampled$B and dat.sampled$D
## X-squared = 19.556, df = 1, p-value = 9.772e-06
cat("p-value: ", fit.sampled$p.value, "\n")
## p-value: 9.771616e-06
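The decision step can be made explicit. The reported p-value is the upper-tail probability of a chi-squared distribution with df = 1 at the observed statistic, so `pchisq()` reproduces it up to rounding of the printed statistic. Because p ≈ 9.77 × 10⁻⁶ is far below α = 0.05 (equivalently, 19.556 exceeds the 5% critical value of about 3.84), H0 is rejected for this sample. The data suggest that smoking status and respiratory symptoms are associated. This snippet only reuses the numbers printed in the sample output above.

```r
# Recompute the p-value and critical value from the printed sample result.
stat  <- 19.556            # X-squared reported for the n = 100 sample
df    <- 1                 # (2 - 1) * (2 - 1) for a 2 x 2 table
alpha <- 0.05

p_value  <- pchisq(stat, df = df, lower.tail = FALSE)   # upper-tail probability
critical <- qchisq(1 - alpha, df = df)                  # about 3.84 for df = 1

cat("p-value:", signif(p_value, 4), " critical value:", round(critical, 3), "\n")
cat(if (p_value < alpha) "Reject H0 at alpha = 0.05" else "Fail to reject H0", "\n")
```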
# Same test on the entire dataset (1,200 adults)
fit.pop <- chisq.test(dat$B, dat$D)
fit.pop
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: dat$B and dat$D
## X-squared = 290.19, df = 1, p-value < 2.2e-16
cat("p-value: ", fit.pop$p.value, "\n")
## p-value: 4.529107e-65...
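With only 100 observations, some expected cell counts can be small. This is especially likely for the three-level drinking variable in Problems 2 and 4, where R may warn that the chi-squared approximation may be incorrect. A common rule of thumb is that all expected counts should be at least 5. This sketch checks that rule using the `expected` component returned by `chisq.test()`. When the rule fails, it switches to Fisher's exact test (`fisher.test()`), a substitute technique rather than part of the posted solution. The helper name `check_and_test()` and the threshold of 5 are assumptions.

```r
# Hypothetical helper: check expected counts before relying on the
# chi-squared approximation (rule of thumb: all expected counts >= 5).
check_and_test <- function(x, y, min_expected = 5) {
  tab <- table(x, y)
  # Warnings are suppressed because the expected counts are checked explicitly below.
  chi <- suppressWarnings(chisq.test(tab))
  if (any(chi$expected < min_expected)) {
    # Small expected counts: use Fisher's exact test on the same table instead.
    list(method = "Fisher's exact test", p.value = fisher.test(tab)$p.value,
         expected = chi$expected)
  } else {
    list(method = chi$method, p.value = chi$p.value, expected = chi$expected)
  }
}

# Usage, e.g. for Problem 2 on the n = 100 sample (runs only if it exists):
if (exists("dat.sampled")) {
  res <- check_and_test(dat.sampled$C, dat.sampled$E)
  print(res$method)
  print(round(res$expected, 2))
  print(res$p.value)
}
```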