Question

1. Given two objects represented by the tuples (22, 1, 42, 10) and (20, 0, 36, 8):
(a) Compute the Euclidean distance between the two objects.
(b) Compute the Manhattan distance between the two objects.
(c) Compute the supremum distance between the two objects.

2. For the following asymmetric binary attributes, calculate the Jaccard coefficient (similarity) between the following two objects:
X = (1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
Y = (0, 0, 0, 0, 0, 0, 1, 0, 0, 1)

3. Calculate cosine similarity for the following two document vectors:
X = (3, 2, 0, 5, 0, 0, 0, 2, 0, 0)
Y = (1, 0, 0, 0, 0, 0, 0, 1, 0, 2)
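
As a rough illustration of the computation this question asks for, cosine similarity is the dot product of the two vectors divided by the product of their Euclidean norms. The sketch below uses base R only; the helper name `cosine_similarity` is an assumption made for illustration and is not part of the original solution file.

```r
# Hypothetical helper (not from the original solution):
# cosine similarity = dot product / (norm of a * norm of b).
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

X <- c(3, 2, 0, 5, 0, 0, 0, 2, 0, 0)
Y <- c(1, 0, 0, 0, 0, 0, 0, 1, 0, 2)

cosine_similarity(X, Y)
# 5 / (sqrt(42) * sqrt(6)), approximately 0.315
```

Only the first and eighth terms are nonzero in both vectors, so the dot product is 3 * 1 + 2 * 1 = 5.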

4. Principal Component Analysis (PCA): Use the Wine_Quality_Training_File data set, available on Canvas, for this exercise. The data consist of chemical data about some wines from Portugal. The target variable is quality. Remember to omit the target variable from the dimension-reduction analysis. Use only the red wines for the analysis (Hint: subset your data and save it in a separate data frame).
(a) Perform some initial exploratory analysis on your dataset.
(b) Standardize the predictors.
(c) Provide a matrix showing the correlation coefficient of each predictor with every other predictor. Use a visualization technique to display the correlations so that the reader can see at a glance which correlations are strongest. Discuss which sets of predictors seem to “vary together”.
(d) Run PCA on the standardized variables.
(e) Determine the optimal number of components to extract, using:
• The Eigenvalue Criterion
• The Proportion of Variance Explained Criterion
• The Scree Plot Criterion
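
Steps (b) through (e) can be sketched in base R as follows. This is a minimal outline under stated assumptions, not the original solution: the helper `pca_summary` is hypothetical, and because the wine file and its column names are not shown here, the built-in `USArrests` data set is used purely as a stand-in.

```r
# Hypothetical helper: `predictors` is assumed to be a numeric data frame
# with the target variable already removed.
pca_summary <- function(predictors) {
  z <- scale(predictors)                            # (b) mean 0, sd 1 per column
  pca <- prcomp(z, center = FALSE, scale. = FALSE)  # (d) PCA on standardized data
  eigenvalues <- pca$sdev^2                         # variance of each component
  list(
    correlations = cor(z),                          # (c) correlation matrix
    pca = pca,
    eigenvalues = eigenvalues,
    cumulative_variance = cumsum(eigenvalues / sum(eigenvalues)),
    n_eigenvalue_rule = sum(eigenvalues > 1)        # (e) eigenvalue criterion
  )
}

# Stand-in data only; for the exercise, first subset the red wines and drop
# `quality` (the column marking wine colour is not named in the exercise).
res <- pca_summary(USArrests)

round(res$eigenvalues, 3)
round(res$cumulative_variance, 3)           # (e) proportion of variance explained
heatmap(res$correlations, symm = TRUE)      # (c) one way to visualize correlations
screeplot(res$pca, type = "lines")          # (e) scree plot: look for the elbow
```

Because the predictors are standardized, the eigenvalues sum to the number of predictors, which is why an eigenvalue above 1 is read as a component that explains more than a single original variable.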

Solution Preview


# 1
x <- c(22, 1, 42, 10)
y <- c(20, 0, 36, 8)
# (a)
dist(rbind(x, y), method="euclidean")
# 6.708204

# (b)
dist(rbind(x, y), method="manhattan")
# 11

# (c)
dist(rbind(x, y), method="maximum")
# 6
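
The three `dist()` results for question 1 can be cross-checked from the definitions. The sketch below is a hand computation in base R and assumes nothing beyond the two vectors given in the question; the variable name `d` is introduced here for illustration.

```r
x <- c(22, 1, 42, 10)
y <- c(20, 0, 36, 8)

d <- abs(x - y)   # coordinate-wise absolute differences: 2 1 6 2

sqrt(sum(d^2))    # Euclidean: sqrt(45), approximately 6.708204
sum(d)            # Manhattan: 11
max(d)            # supremum (Chebyshev): 6
```

The supremum distance is the largest single coordinate difference, which is what `method = "maximum"` computes in `dist()`.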

# 2
X <- c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
Y <- c(0, 0, 0, 0, 0, 0, 1, 0, 0, 1)
table(X, Y)
#    Y
# X   0 1
#   0 7 2
#   1 1 0

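# Jaccard = f11 / (f01 + f10 + f11); the 7 joint absences (0-0) are excluded
# for asymmetric binary attributes.
0 / (2 + 1 + 0)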
# 0
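
The same coefficient can be computed directly from the vectors rather than by reading counts off the table. This is a minimal sketch in base R; the helper name `jaccard_binary` is an assumption for illustration and does not come from the solution file.

```r
# Hypothetical helper: Jaccard similarity for 0/1 vectors.
# Joint absences (0-0) are ignored, as is usual for asymmetric attributes.
jaccard_binary <- function(a, b) {
  both   <- sum(a == 1 & b == 1)   # f11: positions where both are 1
  either <- sum(a == 1 | b == 1)   # f11 + f10 + f01: positions where at least one is 1
  both / either
}

X <- c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
Y <- c(0, 0, 0, 0, 0, 0, 1, 0, 0, 1)

jaccard_binary(X, Y)
# 0 / 3 = 0
```

Note that the helper returns `NaN` when both vectors are all zeros, since the denominator is then 0.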
library(philentropy)
1 - distance(rbind(X, Y), method = "jaccard")...
