In 1846, a group of 87 people (known as the Donner Party) headed west from Springfield, Illinois, to California. The leaders attempted a new route through the Sierra Nevada, and the group was stranded there through the winter. The harsh weather and lack of food caused the deaths of many members of the group. Social scientists have used the data to study the theory that females are better able than males to survive harsh conditions. The data are saved in Donner.txt.
(a) Create a logistic regression model using gender and age as predictors and give the equation of the estimated model.
(b) Interpret the regression coefficients.
(c) Estimate the survival probability of a 20-year-old female (show your calculation).
(d) Explain why the deviance and Pearson goodness-of-fit tests are not appropriate.
(e) Assess the model's goodness of fit.
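Parts (a) to (c) map onto a few standard R calls. The sketch below is one possible setup, not the original solution: it assumes Donner.txt has a header row and columns named `Survived` (coded 1 = survived, 0 = died), `Age` and `Gender`, and the helper names `fit_donner`, `odds_ratios` and `survival_prob` are hypothetical. The hand calculation in `survival_prob` is the inverse logit of the linear predictor, which is the step part (c) asks to show.

```r
# Sketch for parts (a)-(c). The column names Survived (1 = survived, 0 = died),
# Age and Gender are assumptions about Donner.txt, not taken from the source.

# Part (a): logistic regression of survival on age and gender
fit_donner <- function(donner) {
  glm(Survived ~ Age + Gender, family = binomial(link = "logit"), data = donner)
}

# Part (b): exp(coefficient) is the factor by which the odds of survival change
# for a one-unit increase in that predictor, holding the other predictor fixed
odds_ratios <- function(model) exp(coef(model))

# Part (c): survival probability computed by hand from the fitted equation
#   log(p / (1 - p)) = b0 + b_age * age + b_gender * gender_indicator
survival_prob <- function(b0, b_age, b_gender, age, gender_indicator) {
  eta <- b0 + b_age * age + b_gender * gender_indicator  # log-odds of survival
  1 / (1 + exp(-eta))                                    # inverse logit
}

# Assumed usage (file layout and column names are assumptions):
# donner <- read.table("Donner.txt", header = TRUE)
# model  <- fit_donner(donner)
# summary(model)       # estimates for the equation in (a)
# odds_ratios(model)   # part (b)
# With Gender levels "Female" and "Male", R's default treatment coding makes
# "Female" the baseline, so a female has gender_indicator = 0:
# b <- coef(model)
# survival_prob(b["(Intercept)"], b["Age"], b["GenderMale"], age = 20, gender_indicator = 0)
# predict(model, data.frame(Age = 20, Gender = "Female"), type = "response")
```

Because `predict(..., type = "response")` applies the same inverse logit to the same linear predictor, it offers a quick check of the hand calculation.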
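For part (d), the deviance and Pearson statistics follow an approximate chi-square distribution only when each covariate pattern (here, each distinct combination of age and gender) has enough observations; with a continuous predictor such as age, many patterns occur only once or a few times, so that approximation breaks down. A common alternative for part (e) is the Hosmer-Lemeshow test, which groups observations by fitted probability and compares observed with expected survivors in each group. The sketch below writes it out in base R; the helper names `n_patterns` and `hosmer_lemeshow` are hypothetical, and the `hoslem.test()` function in the ResourceSelection package is a packaged alternative.

```r
# Part (d): number of distinct covariate patterns; when this is close to the
# number of rows, deviance/Pearson statistics lack a chi-square reference
n_patterns <- function(df, predictors) {
  nrow(unique(df[, predictors, drop = FALSE]))
}

# Part (e): Hosmer-Lemeshow test in base R
# y: 0/1 outcomes; p: fitted probabilities; g: requested number of groups
hosmer_lemeshow <- function(y, p, g = 10) {
  # Group observations by quantiles of the fitted probabilities
  breaks <- unique(quantile(p, probs = seq(0, 1, length.out = g + 1)))
  grp <- droplevels(cut(p, breaks = breaks, include.lowest = TRUE))
  obs  <- tapply(y, grp, sum)     # observed survivors per group
  expd <- tapply(p, grp, sum)     # expected survivors per group
  n    <- tapply(y, grp, length)  # group sizes
  # Sum of (O - E)^2 / (n * pbar * (1 - pbar)), with pbar = E / n
  stat <- sum((obs - expd)^2 / (expd * (1 - expd / n)))
  dof  <- length(n) - 2
  list(statistic = stat, df = dof,
       p.value = pchisq(stat, dof, lower.tail = FALSE))
}

# Assumed usage with the model from parts (a)-(c):
# n_patterns(donner, c("Age", "Gender")); nrow(donner)
# hosmer_lemeshow(donner$Survived, fitted(model))
```

A large p-value gives no evidence of lack of fit; with only 87 observations, fewer than 10 groups may be more sensible, which is why `g` is a parameter.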
# Reading data
dat <- read.csv("StudentData.csv")
# Selecting numeric columns
df <- dat[, c('HSClass', 'TxtSent', 'TxtRec', 'Fbtime', 'Introvert')]
# Mean of each variable
mean_all <- colMeans(df)
print(mean_all)
# Covariance matrix of the variables
cov_mat <- cov(df)
print(cov_mat)
# Box plot of each variable
boxplot(df, main = "Box plot of each variable of the data")
# HSClass has the highest mean and Introvert the lowest.
# HSClass has a large positive covariance with TxtSent and TxtRec.
# Yes, the box plot shows two outliers: one in HSClass at around 1200 and another
# in TxtRec at 200.
# Plot histograms to see the outliers in the far right tail of these right-skewed distributions.
hist(df$HSClass, xlab = "HSClass", main = "Distribution of HSClass")
hist(df$TxtRec, xlab = "TxtRec", main = "Distribution of TxtRec")
...
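The outliers described in the comments can also be listed rather than read off the plot. `boxplot.stats()` (base R, grDevices package) returns the points beyond the whiskers using the same rule as `boxplot()`: more than 1.5 box lengths beyond the hinges. The helper name `boxplot_outliers` is hypothetical, and the usage lines assume the `df` data frame built above. Because covariance depends on each variable's units, the correlation matrix is included as a scale-free way to judge how strongly HSClass moves with TxtSent and TxtRec.

```r
# Hypothetical helper: values lying beyond the boxplot whiskers
# (boxplot.stats uses coef = 1.5 box lengths beyond the hinges by default)
boxplot_outliers <- function(x) boxplot.stats(x)$out

# Assumed usage with the numeric data frame df built above:
# lapply(df, boxplot_outliers)  # outlying values for each variable
# round(cor(df), 2)             # unit-free counterpart of cov(df)
```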