 # Nearest Neighbor

## Transcribed Text

For this problem, you will need the MNIST optical character recognition (OCR) data included with the problem set in the file `MNISTdata.mat`. To load the data into the Matlab workspace, use the Matlab command `load('MNISTdata.mat')`, called within the folder where you saved the files. There are 2000 training samples and 1000 test samples, stored in four member variables called `trainx`, `trainLabel`, `testx`, and `testLabel`. The training data consists of two arrays: a 2000 × 784 array `trainx` containing the data points, one point per row, and a one-dimensional array `trainLabel` (of length 2000) containing the labels (0-9). The test data is similar, but contains only 1000 points.

In order to view one of the data points, you can use Matlab's `image` command, although you first need to reshape the format from a 784-dimensional vector into a 28 × 28 matrix. For instance, to see point number 75, you would use: `image(reshape(trainx(75,:),[28 28])');` If you find the output somewhat garish, try prefacing the last command with: `colormap(1.0 - gray);`

In this exercise, we try to understand the effect of training data on the performance of a nearest neighbor classifier.

a) For each class, extract all the elements in the training set that belong to the class. Using this, write a Matlab function for computing the mean of every class. Call this set of means `Xm` and their associated labels `Ym`. Please submit the Matlab script that you used to compute both `Xm` and `Ym`.

b) Write a Matlab function for computing the label of a query point `z` using the labels of its k nearest neighbors: `function [class, idx] = knn_classify(X, C, z, k)`. Here `X` is the data used to train the nearest neighbor classifier (some n × d array of n points in $\mathbb{R}^d$) and `C` is a vector of corresponding training labels (of length n). The values returned are `idx`, a vector of length k holding the indices (into `X`) of the k nearest neighbors; and `class`, the majority vote over those neighbors.

Note: Matlab has a built-in function for nearest neighbor search whose syntax is `idx = knnsearch(X,Y)`. More information can be obtained by typing `help knnsearch` at the Matlab command prompt.

c) In this sub-problem, we try to determine the impact of reducing the amount of data used by the nearest neighbor classifier. Determine the error rate of the k nearest-neighbor classifier (on the test set) as a function of k, for $k \in \{1, 3, 5, 7, 9, 11\}$. Plot two separate error curves for the following cases:

- Using the entire training data as the input to the function `knn_classify`, i.e. `X = trainx`.
- Using only `Xm` as the input to the function `knn_classify`, i.e. `X = Xm`.

The Matlab `plot` function is useful for this.

d) In this sub-problem, we study the impact of the distance metric used to compute the nearest neighbor. Determine the error rate of the k nearest-neighbor classifier (on the test set) as a function of k, for $k \in \{1, 3, 5, 7, 9, 11\}$. Plot separate error curves for the following distance metrics:

- Euclidean (or L2 distance): $d(x,y) = \sqrt{\sum_i (x_i - y_i)^2}$
- Cityblock (or L1 distance): $d(x,y) = \sum_i |x_i - y_i|$
- Correlation: $d(x,y) = 1 - \dfrac{\mathrm{cov}(x,y)}{\sigma_x \sigma_y}$ (here $\mathrm{cov}(x,y)$ is the covariance of $x$ and $y$, and $\sigma_x$, $\sigma_y$ are the standard deviations of $x$ and $y$ respectively)

Which distance metric performs best?

Note: The distance metric to be used can be specified as an option to the function `knnsearch`. Please use `help knnsearch` for more details.

For the above problems, please submit the function `knn_classify` and the different error curves. For the figures, make sure the axes are labeled correctly and indicate what each curve corresponds to.
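
As a quick illustration of the three distance metrics named in part d), the snippet below evaluates each one by hand on two short vectors. The vectors `x` and `y` and the variable names are made up for this example and are not part of the problem set. The correlation distance is taken as one minus the sample correlation coefficient, which is how it is usually defined.

```matlab
% Two small example vectors (illustrative values, not taken from MNIST).
x = [1 2 3 4];
y = [2 2 5 3];

% Euclidean (L2) distance: square root of the summed squared differences.
d_euclid = sqrt(sum((x - y).^2));

% Cityblock (L1) distance: sum of the absolute differences.
d_city = sum(abs(x - y));

% Correlation distance: one minus the sample correlation coefficient.
% The 1/(n-1) factors in cov(x,y) and in sigma_x*sigma_y cancel, so the
% ratio reduces to the cosine between the mean-centered vectors.
xc = x - mean(x);
yc = y - mean(y);
d_corr = 1 - (xc * yc') / (norm(xc) * norm(yc));

fprintf('L2 = %.4f, L1 = %.4f, correlation = %.4f\n', d_euclid, d_city, d_corr);
```

Note that the correlation distance ignores any constant offset or positive scaling of the pixel intensities, whereas the L1 and L2 distances do not.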

## Solution Preview


a. Write a Matlab function for computing the mean of every class.
```matlab
%%%
%%% PART A
%%%

% Compute mean function
function [X_m, y_m] = compute_mean(X, y)

% Initialize labels
y_m = [0;1;2;3;4;5;6;7;8;9];
% Create empty array to be populated with mean values
X_m = zeros(10,784);

% Go through all values in the dataset and calculate totals for each label &
% coordinate
for i = 1:size(X,1)
    label = y(i);
    for j = 1:size(X,2)
        X_m(label+1,j) = X_m(label+1,j) + X(i,j);
    end
end

% Divide each total by the number of each type of label to calculate the
% mean
for i = 1:size(X_m,1)
    num = nnz(y == i-1);
    for j = 1:size(X_m,2)
        X_m(i,j) = X_m(i,j)/num;
    end
end

end
```

...
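
The preview does not show a solution for part b, so the following is only a minimal sketch of what a `knn_classify` function with the signature from the problem statement might look like. It assumes plain Euclidean distance computed directly in base Matlab rather than through `knnsearch`, and it assumes ties in the vote are resolved by `mode`, which returns the smallest of the tied labels. The intermediate variable names are hypothetical.

```matlab
function [class, idx] = knn_classify(X, C, z, k)
% KNN_CLASSIFY  Label a query point by majority vote of its k nearest neighbors.
%   X : n-by-d training points, one point per row
%   C : length-n vector of training labels
%   z : query point with d entries
%   k : number of neighbors to use
% Assumption: Euclidean distance, computed directly instead of via knnsearch.

% Squared Euclidean distance from z to every row of X. The square root is
% skipped because it does not change the ordering of the distances.
diffs = bsxfun(@minus, double(X), double(z(:)'));
dist2 = sum(diffs.^2, 2);

% Indices (into X) of the k smallest distances.
[~, order] = sort(dist2, 'ascend');
idx = order(1:k);

% Majority vote over the neighbor labels; mode picks the smallest label
% when several labels are equally frequent.
class = mode(double(C(idx)));
end
```

With the variables from the problem statement, a call such as `[label, nn] = knn_classify(trainx, trainLabel, testx(1,:), 3)` would classify the first test point. The error rate for a given k is then the fraction of test points whose predicted label differs from `testLabel`.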