Question
1) From the dataset, obtain groups of users seeking a loan who share similar characteristics. How many groups would you ideally derive?
2) Determine which group overall has the highest success rate (i.e., not defaulting on payment). For each group, try to compute the percentage of loan defaults.
3) Apply this grouping algorithm to a test dataset from the year 2009 and explain your predictions with respect to the success rate (i.e., not defaulting on payment). As a bank, what would you use these groupings for?
Solution Preview
# Combining a subset of rows from each year into one data frame
dat <- rbind(dat0, dat1, dat2, dat3, dat4, dat5, dat6, dat7, dat8)
# Taking a sample (proportion prob_all) of the combined data.
# Sampling reduces the running time of the solution;
# otherwise, it takes a long time.
# Note: createDataPartition() comes from the caret package.
idx <- createDataPartition(dat$default, p = prob_all, list = FALSE)
dat <- dat[idx, ]
# Keeping the original data in the dat_org variable
dat_org <- dat
# Setting aside attributes that are not used for clustering
# (the default label and the year), keeping copies in y and year.
y <- dat$default
dat$default <- NULL
year <- dat$year
# Removing the year attribute as well
dat$year <- NULL
# Printing summary of number of default/non-default cases
table(y)
prop.table(table(y))
# Counting missing values; if the sum is 0, no value is missing.
sum(is.na(dat))
# Scaling the numeric attributes so that
# each has zero mean and unit variance.
dat <- scale(dat)...
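The preview stops at the scaling step, so the grouping itself is not shown. As a rough illustration of how the three questions could be approached, the sketch below assumes k-means clustering (the source does not say which algorithm the full solution uses) and runs on synthetic data. The column names `income` and `loan_amt`, the helper `assign_cluster`, and the choice `k = 2` are all hypothetical; only base R is used.

```r
# Illustrative only: synthetic data stands in for the loan dataset,
# and every column name below is hypothetical.
set.seed(42)
n <- 300
train <- data.frame(
  income   = c(rnorm(n / 2, 40, 5), rnorm(n / 2, 80, 5)),
  loan_amt = c(rnorm(n / 2, 20, 3), rnorm(n / 2, 10, 3))
)
# Synthetic default labels: 1 = default, more likely in the first half
y <- rbinom(n, 1, rep(c(0.4, 0.1), each = n / 2))

# Scale, keeping the centering/scaling values for reuse on new data
train_sc <- scale(train)
ctr <- attr(train_sc, "scaled:center")
scl <- attr(train_sc, "scaled:scale")

# Question 1 (elbow heuristic): total within-cluster sum of squares
# for k = 1..6; look for the k after which the decrease flattens
wss <- sapply(1:6, function(k) {
  kmeans(train_sc, centers = k, nstart = 10)$tot.withinss
})

# k = 2 is an assumption for this toy data, not a result from the source
k <- 2
fit <- kmeans(train_sc, centers = k, nstart = 10)

# Question 2: percentage of defaults within each cluster
default_pct <- 100 * tapply(y, fit$cluster, mean)

# Question 3: assign new rows (e.g. a later year) to the nearest
# training centroid, using the training centering and scaling.
# newdata must have the same columns, in the same order, as train.
assign_cluster <- function(newdata, fit, ctr, scl) {
  new_sc <- scale(as.matrix(newdata), center = ctr, scale = scl)
  d <- sapply(seq_len(nrow(fit$centers)), function(j) {
    rowSums(sweep(new_sc, 2, fit$centers[j, ])^2)
  })
  d <- matrix(d, nrow = nrow(new_sc))  # keep matrix shape for one row
  max.col(-d, ties.method = "first")   # index of the smallest distance
}
```

A possible use, again with made-up values: `assign_cluster(data.frame(income = c(42, 78), loan_amt = c(19, 11)), fit, ctr, scl)` returns one cluster index per row, and `default_pct` for that index gives the default percentage observed in the matching training group. Cluster numbering is arbitrary, so the indices only have meaning relative to `fit`.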