Transcribed Text

Part I: Aspirin and heart attack

A study investigating the association between heart attacks and the use of aspirin is conducted. Age is a potential confounder and is also considered. The following indicator variables are defined:

    Y       = 1 if heart attack, 0 if no heart attack
    Aspirin = 1 if aspirin, 0 if placebo
    Age1    = 1 if [...], 0 otherwise
    Age2    = 1 if [...], 0 otherwise

The following table shows the results of fitting logistic regression models for P(Y = 1):

    Model  Covariates      Estimate (β̂)  Standard Error  log-likelihood
    1      None               -2.99          0.19           -116.54
    2      Aspirin            -0.82          0.41           -114.41
    3      Age1               -0.19          0.47           -116.27
           Age2                0.17          0.45
    4      Aspirin            -0.82          0.41           -114.14
           Age1               -0.18          0.47
           Age2                0.19          0.45
    5      Aspirin            -0.65          0.63           -113.83
           Age1               -0.22          0.59
           Age2                0.39          0.54
           (Age1)*Aspirin      0.10          0.97
           (Age2)*Aspirin     -0.68          1.03

(a) Test the null hypothesis of a constant aspirin effect on the risk of heart attack across age groups (i.e., no interaction between aspirin and age).

(b) Based on the model with additive/main effects for age and aspirin (Model 4):
    i. Calculate the maximum likelihood estimate of the odds ratio of aspirin use on heart attack, adjusting for age. Provide a 95% confidence interval for this odds ratio and interpret it in context.
    ii. Perform a Wald test of the null hypothesis that there is no effect of aspirin on the risk of heart attack, controlling for age. What do you conclude?
    iii. Perform a likelihood ratio test of the null hypothesis that there is no effect of age on the risk of heart attack, controlling for aspirin use. State your conclusion.

(c) Evaluate the deviance of each model provided in the table and assess its goodness-of-fit. Which models do not provide an adequate fit to the data?

(d) Perform model selection using analysis of deviance. Make sure you describe all the steps taken to arrive at your final model.
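The arithmetic behind parts (a) and (b) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the log-likelihood column is taken to be the maximized log-likelihood of each model, each likelihood ratio test compares nested models that differ by two parameters, and 1.96 is used as the normal critical value. The helper names (`lrt`, `chi2_sf_df2`, `normal_two_sided_p`) are made up for this sketch.

```python
import math

# Values copied from the table of fitted models.
loglik = {1: -116.54, 2: -114.41, 3: -116.27, 4: -114.14, 5: -113.83}
beta_aspirin, se_aspirin = -0.82, 0.41  # aspirin row of Model 4


def lrt(loglik_reduced, loglik_full):
    """Likelihood ratio statistic: 2 * (logL_full - logL_reduced)."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square with exactly 2 degrees of freedom."""
    return math.exp(-x / 2.0)


def normal_two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# (a) Interaction test: Model 4 (main effects) vs Model 5 (adds 2 interaction terms).
lrt_interaction = lrt(loglik[4], loglik[5])
p_interaction = chi2_sf_df2(lrt_interaction)

# (b) i. Age-adjusted odds ratio for aspirin and its Wald-type 95% interval.
z_crit = 1.96  # assumed critical value for a 95% interval
odds_ratio = math.exp(beta_aspirin)
ci_low = math.exp(beta_aspirin - z_crit * se_aspirin)
ci_high = math.exp(beta_aspirin + z_crit * se_aspirin)

# (b) ii. Wald test for the aspirin coefficient in Model 4.
wald_z = beta_aspirin / se_aspirin
p_wald = normal_two_sided_p(wald_z)

# (b) iii. Age effect given aspirin: Model 2 (aspirin only) vs Model 4 (adds Age1, Age2).
lrt_age = lrt(loglik[2], loglik[4])
p_age = chi2_sf_df2(lrt_age)

print(f"(a)     LRT = {lrt_interaction:.2f}, df = 2, p = {p_interaction:.3f}")
print(f"(b)i    OR = {odds_ratio:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
print(f"(b)ii   z = {wald_z:.2f}, p = {p_wald:.4f}")
print(f"(b)iii  LRT = {lrt_age:.2f}, df = 2, p = {p_age:.3f}")
```

The closed form `exp(-x / 2)` is valid only for two degrees of freedom, which is why it suffices here; a comparison with a different number of added parameters would need a general chi-square tail function.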
Part II: Credit risks for bank loan

Banks want to reduce the rate of loan defaults. Loan officers want to be able to identify characteristics that are indicative of people who are likely to default on loans, and then use those characteristics to identify good and bad credit risks. Financial and demographic information is collected on 850 past and prospective customers. Of these, 700 are customers who were previously given loans and 150 are prospective customers whom the bank needs to classify as good or bad credit risks. The data are saved in Bank loan.txt and contain the following variables:

    age       age in years
    ed        highest level of education (1: did not complete high school; 2: high school degree; 3: some college; 4: college degree; 5: post-bachelor degree)
    employ    years with current employer
    address   years at current address
    income    household income in thousands
    debtinc   debt to income ratio (x100)
    creddebt  credit card debt in thousands
    othdebt   other debt in thousands
    default   previously defaulted (0: No; 1: Yes; NA: prospective customers)

1. Exploratory data analysis & data processing
(a) Provide appropriate summary statistics and graphical displays for the variables in the data. Discuss their distributions.
(b) Since there are few observations with a post-bachelor degree (ed = 5), combine these with the group with a college degree (ed = 4). You will be using education with these levels in subsequent analyses.
(c) Separate the 150 prospective customers for whom credit risk is to be predicted from the 700 past customers.

2. Model building & diagnostics: use the 700 past customers for this task.
(a) Perform stepwise selection.
    i. Provide the equation of the selected model.
    ii. Interpret the effect of each of the covariates in the selected model.
    iii. Assess the goodness-of-fit of the selected model.
(b) Perform lasso variable selection using the misclassification error as the criterion for choosing λ.
    i. Compare the model selected using lambda.1se to the stepwise selected model in (a). Perform a likelihood ratio test to choose the preferred model between the two at α = 0.05.
(c) For the model selected based on lasso:
    i. Identify observations with unusual/outlying standardized residuals. How do the predictions for these individuals, based on their fitted values, compare to their observed default status?
    ii. Using a cut-off of 0.3 for predicting whether a person defaults or not on a loan:
        - What proportion of the 700 customers would have been predicted as defaulting (and thus would have been denied a loan)?
        - What would be the misclassification rate?
        - What would be the misclassification rate among the defaulters?
        - What would be the misclassification rate among the non-defaulters?
    iii. Provide the ROC curve and the area under the ROC curve for the selected model.

3. Prediction for future customers: consider the model selected based on lasso.
(a) Calculate the predicted probabilities of loan default for the 150 prospective customers.
(b) Provide a histogram and boxplot of the predicted probabilities.
(c) Using a cut-off of 0.3, how many of the 150 prospective customers would be expected to default on a loan?
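The cut-off based quantities asked for in 2(c) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the probabilities and labels are a made-up toy sample rather than the bank loan data, a customer is classified as defaulting when the fitted probability is at least the cut-off, and the function names are hypothetical. The area under the ROC curve is computed through its pairwise-comparison (Mann-Whitney) form rather than by tracing the curve.

```python
def classify_at_cutoff(probs, labels, cutoff=0.3):
    """Summarize predictions at a probability cut-off (label 1 = defaulted)."""
    preds = [1 if p >= cutoff else 0 for p in probs]
    n = len(labels)
    n_default = sum(labels)
    n_non_default = n - n_default
    # Defaulters predicted as non-defaulters, and the reverse.
    false_neg = sum(1 for y, yhat in zip(labels, preds) if y == 1 and yhat == 0)
    false_pos = sum(1 for y, yhat in zip(labels, preds) if y == 0 and yhat == 1)
    return {
        "prop_predicted_default": sum(preds) / n,
        "misclassification": (false_neg + false_pos) / n,
        "misclassification_defaulters": false_neg / n_default,
        "misclassification_non_defaulters": false_pos / n_non_default,
    }


def auc(probs, labels):
    """Probability that a random defaulter scores above a random
    non-defaulter, counting ties as one half."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))


# Toy fitted probabilities and observed default status (illustrative only).
toy_probs = [0.05, 0.10, 0.20, 0.35, 0.40, 0.25, 0.60, 0.80]
toy_labels = [0, 0, 0, 0, 0, 1, 1, 1]

summary = classify_at_cutoff(toy_probs, toy_labels, cutoff=0.3)
toy_auc = auc(toy_probs, toy_labels)

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
print(f"auc: {toy_auc:.3f}")
```

For the prospective customers in part 3, only the count of predicted defaults at the cut-off is available, since their observed default status is missing.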
