Question

Overview and Assignment Goals:
The objectives of this assignment are the following:
• Implement the Nearest Neighbor Classification Algorithm
• Handle Text Data (Reviews of Amazon Baby Products)
o Design and Engineer Features from Text Data.
• Choose the Best Model, i.e., Parameters of a Nearest Neighbor Selection, Features and Similarity Functions
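
The feature and similarity choices named in the goals above can be illustrated with a minimal sketch. It assumes a plain bag-of-words representation and two candidate similarity functions (cosine and Jaccard). The helper names `tokenize`, `term_counts`, `cosine_similarity` and `jaccard_similarity` are hypothetical and not part of the assignment. Other features, such as TF-IDF weights or n-grams, would be equally valid starting points.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def term_counts(text):
    """Bag-of-words features: map each token to its raw count."""
    return Counter(tokenize(text))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(count * b.get(token, 0) for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty review is treated as similar to nothing
    return dot / (norm_a * norm_b)


def jaccard_similarity(a, b):
    """Jaccard similarity on the sets of distinct tokens."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0
```

Cosine similarity takes word frequency into account, while Jaccard similarity only looks at which words are present. Which one works better on this data is something to check on held-out training reviews.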

Detailed Description:
A practical task in e-commerce is to infer sentiment (or polarity) from the free-form review text submitted for a range of products.

For this assignment you have to implement a k-Nearest Neighbor classifier to predict the sentiment of 18506 reviews of baby products provided in the test file (test.data). Positive sentiment is represented by a review rating of +1 and negative sentiment by a review rating of -1. In test.dat you are given only the reviews; the ground-truth ratings are withheld and will be used to evaluate your predictions.
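
As a rough illustration of the classifier described above, the following sketch scores a query review against every training review and takes a majority vote over the k most similar ones. It assumes whitespace-tokenized word counts and cosine similarity. The names `vectorize`, `cosine` and `knn_predict`, the default `k=3`, and the rule that a tied vote goes to +1 are all assumptions made for the example.

```python
import heapq
import math
from collections import Counter


def vectorize(text):
    """Toy feature extractor: lowercase, whitespace-split word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b.get(token, 0) for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(query, train_vectors, train_labels, k=3):
    """Predict +1 or -1 by majority vote among the k most similar reviews."""
    scored = [(cosine(query, vec), label)
              for vec, label in zip(train_vectors, train_labels)]
    nearest = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    vote = sum(label for _, label in nearest)
    # Assumption: a tied vote (possible when k is even) is resolved as +1.
    return 1 if vote >= 0 else -1


# Made-up reviews, used only to exercise the functions.
train_texts = [
    "love this crib",
    "great soft blanket",
    "broke after a week",
    "terrible waste of money",
]
train_labels = [1, 1, -1, -1]
train_vectors = [vectorize(t) for t in train_texts]

positive_guess = knn_predict(vectorize("love the soft blanket"),
                             train_vectors, train_labels, k=3)
negative_guess = knn_predict(vectorize("waste of money broke"),
                             train_vectors, train_labels, k=3)
```

This brute-force version compares each query with every training review, so its cost grows with the product of the test and training set sizes. An odd k avoids tied votes.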

The training data also consists of 18506 reviews and is in the file train_file.data. Each row begins with the sentiment score, followed by the text of the review.
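
The row layout described above can be parsed with a few lines of code. The sketch below assumes that the score and the review text are separated by a single tab, which the assignment text does not state explicitly. The function names `parse_labeled_line` and `load_training_rows` are hypothetical.

```python
def parse_labeled_line(line):
    """Split one row into (label, review_text).

    Assumption: the sentiment score and the review are tab-separated.
    """
    label_text, _, review = line.rstrip("\n").partition("\t")
    return int(label_text), review.strip()


def load_training_rows(lines):
    """Parse an iterable of rows, skipping blank lines."""
    rows = []
    for line in lines:
        if not line.strip():
            continue
        rows.append(parse_labeled_line(line))
    return rows


# Made-up rows in the assumed format.
sample_lines = [
    "+1\tLove this stroller, very easy to fold.\n",
    "\n",
    "-1\tStopped working after two days.\n",
]
sample_rows = load_training_rows(sample_lines)
```

An open file object can be passed in place of `sample_lines`, since iterating over a text file yields one line at a time.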

For evaluation purposes (leaderboard ranking) we will use the accuracy metric, comparing the predictions you submit on the test set with the ground truth. Some things to note:
• The public leaderboard shows results for a randomly chosen 50% of the test instances only. This is standard practice in data mining challenges to avoid gaming of the system. The private leaderboard, released after the deadline, evaluates all the entries in the test set.
• In a 24-hour cycle you are allowed to submit a prediction file 5 times only.
• The final ranking will always be based on the last submission.
• format.dat shows an example file containing 18506 rows alternating between +1 and -1. Your prediction file should be similar to format.dat, with the same number of rows (18506), but containing the sentiment scores generated by your model.
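
The accuracy metric and the submission layout described above can be sketched as follows. The sketch assumes that a prediction file holds one label per row, written with an explicit sign (+1 or -1) as in the description of format.dat. The helper names `accuracy` and `format_predictions` are hypothetical.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the ground-truth labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def format_predictions(predictions):
    """Render labels one per row with an explicit sign, e.g. '+1' or '-1'."""
    return "".join(f"{label:+d}\n" for label in predictions)


# Made-up labels: three of the four predictions are correct.
validation_accuracy = accuracy([1, -1, 1, 1], [1, -1, -1, 1])
submission_text = format_predictions([1, -1, -1])
```

Because the ground truth for the test set is hidden, a function like `accuracy` is mainly useful on a held-out slice of the training data, for example when comparing values of k before spending one of the five daily submissions.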

Rules:
• This is an individual assignment. Discussion of broad-level strategies is allowed, but any copying of prediction files or source code will result in an honor code violation.
• Feel free to use the programming language of your choice for this assignment.
• While you can use libraries and templates for dealing with text data, you should implement your own nearest neighbor classifier.

Solution Preview


import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer, TfidfTransformer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
import re

training_file = "train_file.data"
test_file = "test_file.data"

train_data = pd.read_table(training_file, header=None)
train_data.columns = ["y", "x"]
# dropna() returns a new DataFrame by default; drop incomplete rows in place.
train_data.dropna(inplace=True)
train_data.reset_index(drop=True, inplace=True)

def relabel(x):
    # Map the negative label (-1) to 0.
    if x < 0:
        return 0
    return x...
