Question

Around 2005, credit card issuers in Taiwan faced a cash and credit card debt crisis, and delinquency was expected to peak in the third quarter of 2006. In order to increase market share, card-issuing banks in Taiwan over-issued cash and credit cards to unqualified applicants. At the same time, most cardholders, irrespective of their repayment ability, overused credit cards for consumption and accumulated heavy credit and cash-card debts. The crisis dealt a blow to consumer finance confidence and posed a big challenge for both banks and cardholders (Yeh, I. C., & Lien, C. H. (2009). The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications, 36(2), 2473-2480). To manage the risk, banks need to predict default payments by cardholders. In finance, default is when a person fails to meet the conditions of a loan. In this case, default means that a cardholder fails to pay the credit card bill in a given month.

In this assignment you are given a dataset containing information on 30,000 credit card holders in Taiwan in October 2005. The goal is to predict default payment by a credit card holder. The variables in this dataset are:
Variable Name: Description
LIMIT_BAL: Amount of the given credit (NT dollar); it includes both the individual consumer credit and his/her family (supplementary) credit
SEX: Gender (1 = male; 2 = female) (Categorical)
EDUCATION: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others) (Categorical)
MARRIAGE: Marital status (1 = married; 2 = single; 3 = others) (Categorical)
AGE: Age (years)
PAY_1 - PAY_6: History of past payment. The past monthly payment records (from April to September 2005) are tracked as follows: PAY_1 = the repayment status in September 2005; PAY_2 = the repayment status in August 2005; ...; PAY_6 = the repayment status in April 2005. The measurement scale for the repayment status is: -2, -1 and 0 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; ...; 8 = payment delay for eight months; 9 = payment delay for nine months and above.
BILL_AMT1 - BILL_AMT6: Amount of bill statement (NT dollar). BILL_AMT1 = amount of bill statement in September 2005; BILL_AMT2 = amount of bill statement in August 2005; ...; BILL_AMT6 = amount of bill statement in April 2005
PAY_AMT1 - PAY_AMT6: Amount of previous payment (NT dollar). PAY_AMT1 = amount paid in September 2005; PAY_AMT2 = amount paid in August 2005; ...; PAY_AMT6 = amount paid in April 2005
Default_Payment_Next_Month: A binary variable that shows whether a card holder defaulted on payment in October 2005 (1 = default; 0 = no default)
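
SEX, EDUCATION and MARRIAGE are stored as integer codes, so they have to be declared as categories before modelling. The following is a minimal sketch of one way to do that in base R, assuming a small made-up data frame named toy (hypothetical, not the assignment file) that reuses the column names above.

```r
# Toy rows reusing the assignment's column names.
# The values are made up for illustration; they are not from the real file.
toy <- data.frame(
  SEX       = c(1, 2, 2, 1, 2, 1),
  EDUCATION = c(1, 2, 3, 4, 2, 1),
  MARRIAGE  = c(1, 2, 3, 1, 2, 2),
  AGE       = c(24, 26, 34, 37, 57, 29)
)

# Declare the coded columns as factors so R treats them as categories
# rather than as numbers with an order and a distance.
cat_cols <- c("SEX", "EDUCATION", "MARRIAGE")
toy[cat_cols] <- lapply(toy[cat_cols], factor)

# model.matrix() expands each factor into 0/1 columns and drops the first
# level of each factor as the reference category. [, -1] removes the intercept.
dummies <- model.matrix(~ SEX + EDUCATION + MARRIAGE + AGE, data = toy)[, -1]
print(colnames(dummies))
```

glm() performs the same expansion internally when it is given factor columns, so building the binary columns by hand is mainly useful for inspecting them or for choosing a different reference level.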

Build a logistic regression model to predict default payment in October 2005 (do not forget to convert categorical data into binary columns). Cover these points in your report:

1 - Describe and visualize the data (use the descriptive statistics methods you have learned so far).
2 - Create a baseline model.
3 - Are there any insignificant independent variables in your model? Which ones? Explain.
4 - What do the coefficients of the independent variables in your model tell you? Explain.
5 - Do you see any sign of multicollinearity? If yes, how do you avoid it?
6 - Considering the multicollinearity issue and the insignificance of some of the independent variables, try to create a better model. How do you assess whether the new model is better?
7 - Using the concepts of confusion matrix, sensitivity and specificity, what threshold do you pick? Why?
8 - Draw a ROC curve and justify your choice of threshold.
9 - Apply your model to your test set and interpret the results.
10 - How much have you improved over the baseline model? Explain.
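
The steps above can be sketched end to end in base R. This is only an illustration under stated assumptions: the data frame sim is simulated (it is not the assignment file), the rule that generates defaults is invented, the 70/30 split and the 0.5 threshold are arbitrary choices, and the VIF is computed by hand rather than with a package.

```r
set.seed(42)

# Simulated stand-in for the assignment data (NOT the real file).
n <- 2000
sim <- data.frame(
  LIMIT_BAL = round(runif(n, 10000, 500000), -4),
  AGE       = sample(21:70, n, replace = TRUE),
  PAY_1     = sample(-2:3, n, replace = TRUE,
                     prob = c(0.10, 0.20, 0.45, 0.12, 0.10, 0.03)),
  BILL_AMT1 = round(rlnorm(n, meanlog = 10, sdlog = 1))
)
# Consecutive bill statements are made nearly identical on purpose,
# so that the multicollinearity check below has something to find.
sim$BILL_AMT2 <- round(sim$BILL_AMT1 * runif(n, 0.9, 1.1))
# Invented rule: later payment raises default risk, a higher limit lowers it.
sim$default <- rbinom(n, 1, plogis(-1.5 + 0.8 * sim$PAY_1 - 2e-6 * sim$LIMIT_BAL))

# 70/30 train/test split.
idx   <- sample(n, size = 0.7 * n)
train <- sim[idx, ]
test  <- sim[-idx, ]

# Step 2: the baseline predicts the training set's majority class for everyone.
majority     <- as.integer(names(which.max(table(train$default))))
baseline_acc <- mean(test$default == majority)

# Steps 3-4: fit the full model. The Pr(>|z|) column flags insignificant
# predictors, and exp(coef) gives the odds ratio per one-unit increase.
full <- glm(default ~ LIMIT_BAL + AGE + PAY_1 + BILL_AMT1 + BILL_AMT2,
            data = train, family = binomial)
print(signif(summary(full)$coefficients, 3))
print(exp(coef(full)))

# Step 5: variance inflation factor for BILL_AMT1, computed as 1 / (1 - R^2)
# from regressing it on the other predictors. Values above about 10 are a
# common rule-of-thumb warning sign.
r2  <- summary(lm(BILL_AMT1 ~ LIMIT_BAL + AGE + PAY_1 + BILL_AMT2,
                  data = train))$r.squared
vif <- 1 / (1 - r2)

# Step 6: drop the redundant and the uninformative predictor, then compare
# the two models by AIC (lower is better).
reduced <- glm(default ~ LIMIT_BAL + PAY_1 + BILL_AMT1,
               data = train, family = binomial)
print(AIC(full, reduced))

# Steps 7 and 9: confusion matrix on the test set at a chosen threshold.
prob      <- predict(reduced, newdata = test, type = "response")
threshold <- 0.5
cm <- table(
  actual    = factor(test$default, levels = c(0, 1)),
  predicted = factor(as.integer(prob >= threshold), levels = c(0, 1))
)
sensitivity <- cm["1", "1"] / sum(cm["1", ])  # defaults correctly flagged
specificity <- cm["0", "0"] / sum(cm["0", ])  # non-defaults correctly passed
accuracy    <- sum(diag(cm)) / sum(cm)

# Step 10: compare against the baseline.
cat("baseline accuracy:", baseline_acc, " model accuracy:", accuracy, "\n")
cat("sensitivity:", sensitivity, " specificity:", specificity, "\n")
```

Because defaults are the minority class, accuracy alone can make the model look barely better than the baseline. Lowering the threshold trades specificity for sensitivity, which is why step 7 asks for both.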

Solution Preview


# ROCR and pROC provide the functions used later to draw the ROC curve
library("ROCR")
library("pROC")

# Load the data (the path is specific to the author's machine)
CrisisData <- read.csv("D:/WOrk/Descriptives 4//FMJ9orPvyHpYHroe5Ak2.csv")

# Let's have a glimpse at the structure of the data
str(CrisisData)
## 'data.frame':    30000 obs. of 24 variables:
## $ LIMIT_BAL                 : int 20000 120000 90000 50000 50000 50000 500000 100000 140000 20000 ...
## $ SEX                       : int 2 2 2 2 1 1 1 2 2 1 ...
## $ EDUCATION                 : int 2 2 2 2 2 1 1 2 3 3 ...
## $ MARRIAGE                  : int 1 2 2 1 1 2 2 2 1 2 ...
## $ AGE                       : int 24 26 34 37 57 37 29 23 28 35 ...
## $ PAY_1                     : int 2 -1 0 0 -1 0 0 0 0 -2 ...
## $ PAY_2                     : int 2 2 0 0 0 0 0 -1 0 -2 ...
## $ PAY_3                     : int -1 0 0 0 -1 0 0 -1 2 -2 ...
## $ PAY_4                     : int -1 0 0 0 0 0 0 0 0 -2 ...
## $ PAY_5                     : int -2 0 0 0 0 0 0 0 0 -1 ...
## $ PAY_6                     : int -2 2 0 0 0 0 0 -1 0 -1 ...
## $ BILL_AMT1                 : int 3913 2682 29239 46990 8617 64400 367965 11876 11285 0 ...
## $ BILL_AMT2                 : int 3102 1725 14027 48233 5670 57069 412023 380 14096 0 ...
## $ BILL_AMT3                 : int 689 2682 13559 49291 35835 57608 445007 601 12108 0 ...
## $ BILL_AMT4                 : int 0 3272 14331 28314 20940 19394 542653 221 12211 0 ...
## $ BILL_AMT5                 : int 0 3455 14948 28959 19146 19619 483003 -159 11793 13007 ...
## $ BILL_AMT6                 : int 0 3261 15549 29547 19131 20024 473944 567 3719 13912 ...
## $ PAY_AMT1                  : int 0 0 1518 2000 2000 2500 55000 380 3329 0 ...
## $ PAY_AMT2                  : int 689 1000 1500 2019 36681 1815 40000 601 0 0 ...
## $ PAY_AMT3                  : int 0 1000 1000 1200 10000 657 38000 0 432 0 ...
## $ PAY_AMT4                  : int 0 1000 1000 1100 9000 1000 20239 581 1000 13007 ...
## $ PAY_AMT5                  : int 0 0 1000 1069 689 1000 13750 1687 1000 1122 ...
## $ PAY_AMT6                  : int 0 2000 5000 1000 679 800 13770 1542 1000 0 ...
## $ Default_Payment_Next_Month: int 1 1 0 0 0 0 0 0 0 0 ...
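
The preview stops before the ROC step. As a rough sketch of what a ROC curve and a threshold choice involve, the following base R code computes sensitivity and specificity over a grid of thresholds. The scores and labels are synthetic, and picking the threshold by Youden's J is an assumption made here for illustration, not necessarily the rule used in the full solution. ROCR (prediction() and performance()) and pROC (roc() and auc()) offer packaged versions of the same calculation.

```r
set.seed(1)

# Synthetic labels and predicted probabilities standing in for model output.
n      <- 500
labels <- rbinom(n, 1, 0.25)
scores <- plogis(rnorm(n, mean = ifelse(labels == 1, 0.5, -1)))

# Sensitivity and specificity at each candidate threshold.
thresholds <- seq(0, 1, by = 0.01)
sens <- sapply(thresholds, function(t) mean(scores[labels == 1] >= t))
spec <- sapply(thresholds, function(t) mean(scores[labels == 0] <  t))

# Youden's J = sensitivity + specificity - 1. Maximising it is one common
# rule; a bank that fears missed defaults more might accept a lower threshold.
best <- thresholds[which.max(sens + spec - 1)]

# Area under the curve by the trapezoidal rule over (1 - specificity, sensitivity).
fpr <- 1 - spec
ord <- order(fpr, sens)
auc <- sum(diff(fpr[ord]) * (head(sens[ord], -1) + tail(sens[ord], -1)) / 2)

# Draw the curve with the chance diagonal for reference.
plot(fpr, sens, type = "l",
     xlab = "1 - specificity", ylab = "Sensitivity",
     main = "ROC curve (synthetic data)")
abline(0, 1, lty = 2)

cat("chosen threshold:", best, " AUC:", round(auc, 3), "\n")
```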
