Question

The dataset contains mortgage default information for lenders, organized by year, with year-wise data for both lenders and mortgages. You have been hired as an analyst at the bank to predict mortgage success rates and mortgage failures from the existing data. Your next responsibility is to do the following:

1) Obtain groups of users seeking a loan from the dataset that share similar characteristics. How many groups will you ideally derive?

2) Understand which group has the highest overall success rate (i.e., not defaulting on payment). For each group, try to compute the percentage of loan defaults.

3) Apply this grouping algorithm to a test dataset from the year 2009 and explain your predictions with respect to the success rate (i.e., not defaulting on payment). As a bank, what will you use these groupings for?

Solution Preview

# Combining a subset of rows from each year into one data frame
dat <- rbind(dat0, dat1, dat2, dat3, dat4, dat5, dat6, dat7, dat8)

# Taking a random sample of the combined data (fraction given by prob_all).
# This reduces the running time of the solution;
# otherwise, it takes a long time.
# createDataPartition() comes from the caret package.
library(caret)
idx <- createDataPartition(dat$default, p = prob_all, list = FALSE)
dat <- dat[idx, ]

# Keeping the original data in the dat_org variable
dat_org <- dat

# Setting aside the label (default) and the year,
# so that they are not used as clustering features.
y <- dat$default
dat$default <- NULL
year <- dat$year
dat$year <- NULL

# Printing summary of number of default/non-default cases
table(y)
prop.table(table(y))

# Checking for missing values; a sum of 0 means none are present.
sum(is.na(dat))

# Scaling the numeric attributes, so that
# all are centered around 0 mean and have unit variance.
dat <- scale(dat)...
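
The preview is cut off after the scaling step. As a rough sketch of how the three tasks in the question could be approached (this is not the original solution's code), the block below uses k-means with an elbow check to pick the number of groups, computes the percentage of defaults per group, and assigns new rows to the nearest group. Everything in it is an assumption made for illustration: the synthetic data, the feature names `credit_score` and `loan_to_value`, the choice of k-means itself, and the helper `assign_cluster`.

```r
# Illustrative sketch only: synthetic data stands in for the mortgage dataset.
set.seed(42)

# Hypothetical features: two borrower groups with different default risk.
n <- 200
train <- data.frame(
  credit_score  = c(rnorm(n, 720, 20), rnorm(n, 600, 20)),
  loan_to_value = c(rnorm(n, 0.60, 0.05), rnorm(n, 0.90, 0.05))
)
# Hypothetical labels: 1 = default, 0 = no default.
default <- c(rbinom(n, 1, 0.05), rbinom(n, 1, 0.40))

# Scale the features and keep the centering/scaling constants for later.
train_scaled <- scale(train)
ctr <- attr(train_scaled, "scaled:center")
scl <- attr(train_scaled, "scaled:scale")

# Elbow check: total within-cluster sum of squares for k = 1..6.
wss <- sapply(1:6, function(k) {
  kmeans(train_scaled, centers = k, nstart = 10)$tot.withinss
})
print(round(wss, 1))  # look for the k after which the drop flattens

# Fit k-means with the chosen k (2 is an assumption for this toy data).
k <- 2
fit <- kmeans(train_scaled, centers = k, nstart = 10)

# Percentage of defaults in each cluster.
default_pct <- 100 * tapply(default, fit$cluster, mean)
print(round(default_pct, 1))

# Hypothetical helper: assign new rows to the nearest training centroid,
# scaling them with the training constants rather than their own.
assign_cluster <- function(newdata, centers, ctr, scl) {
  z <- scale(as.matrix(newdata), center = ctr, scale = scl)
  # Squared Euclidean distance from each row to each centroid.
  d <- sapply(seq_len(nrow(centers)), function(j) {
    rowSums(sweep(z, 2, centers[j, ])^2)
  })
  d <- matrix(d, nrow = nrow(z))  # keep a matrix even for a single row
  max.col(-d, ties.method = "first")
}

# Two made-up "test year" applicants.
test <- data.frame(credit_score = c(730, 590), loan_to_value = c(0.55, 0.95))
test_cluster <- assign_cluster(test, fit$centers, ctr, scl)
print(test_cluster)
```

Reusing the training mean and standard deviation for the test rows keeps them in the same coordinate space as the fitted centroids. A bank could then read the per-cluster default percentage as an approximate risk level for each new applicant's group.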

