Question

Using the hotel reviews dataset from, run sentiment analysis on the data. You may use an approach of your choice, but it must be your own code (i.e. submitting data to a cloud API is not acceptable). You could use the opinion word lists of Bing Liu (provided). If appropriate, you should consider case-insensitive matching and stemming. Try to detect simple negation (e.g. “not” followed by a sentiment word) and correct the score for it.
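
Negation handling can be sketched as follows. This is a minimal illustration, not the author's solution. It assumes each review has already been lowercased and stripped of punctuation (so “don't” has become “dont”), and the `NEGATORS` set and the `window` parameter are hypothetical choices. The first sentiment word that appears within `window` tokens after a negator has its polarity flipped.

```python
# Hypothetical negator list; the assignment only names "not" as an example.
# Apostrophes are assumed to be stripped already, hence "dont" rather than "don't".
NEGATORS = {"not", "no", "never", "dont", "didnt", "isnt", "wasnt", "cant", "wont"}


def score_with_negation(text, pos_words, neg_words, window=1):
    """Positive hits minus negative hits, where the first sentiment word
    within `window` tokens after a negator counts with the opposite sign."""
    tokens = text.lower().split()
    score = 0
    negate_left = 0  # tokens remaining in the current negation window
    for tok in tokens:
        if tok in NEGATORS:
            negate_left = window
            continue
        polarity = 1 if tok in pos_words else -1 if tok in neg_words else 0
        if negate_left > 0:
            if polarity != 0:
                polarity = -polarity  # flip the negated sentiment word
                negate_left = 0       # only the first sentiment word is flipped
            else:
                negate_left -= 1      # a neutral word uses up part of the window
        score += polarity
    return score


pos = {"clean", "good", "friendly"}
neg = {"dirty", "bad", "noisy"}
print(score_with_negation("the room was not clean", pos, neg))
```

Widening `window` (for example to 2) lets “not very friendly” count as negative, at the cost of occasionally flipping a word the negator does not actually govern.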
Your software should output the id of the review and a score. In the naïve word-counting approach, this would be the number of positive words minus the number of negative words.
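
The naïve word-counting score and the id/score output might look like this. It is a sketch that assumes each review is already a lowercase, punctuation-free string paired with some identifier; the ids `r1`, `r2`, `r3` and the tiny word sets are made up for illustration, and CSV is just one convenient output format that Excel can open.

```python
import csv
import io


def naive_score(text, pos_words, neg_words):
    """Number of positive tokens minus number of negative tokens."""
    tokens = text.lower().split()
    return sum(t in pos_words for t in tokens) - sum(t in neg_words for t in tokens)


def write_scores(reviews, pos_words, neg_words, out):
    """Write a header row, then one 'id,score' row per (review_id, text) pair."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "score"])
    for review_id, text in reviews:
        writer.writerow([review_id, naive_score(text, pos_words, neg_words)])


# Example with hypothetical ids and word sets
pos = {"great", "friendly", "clean"}
neg = {"dirty", "noisy"}
reviews = [("r1", "great location and friendly staff"),
           ("r2", "dirty and noisy room"),
           ("r3", "clean but noisy")]
buf = io.StringIO()
write_scores(reviews, pos, neg, buf)
print(buf.getvalue())
```

To write a real file instead, pass a handle opened with `open("scores.csv", "w", newline="")` in place of the `StringIO` buffer.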
Using Excel, create a scatter plot comparing the given user rating and your calculated sentiment score for a random sample of 250 reviews. Is there a correlation?

Solution Preview

# Importing pandas library
import pandas as pd
# Importing collections to use Counter function
from collections import Counter
# Regular Expression library
import re

# This pattern matches the punctuation characters removed from the reviews.
PUNCTUATION_PATTERN = re.compile(r"[.;:!'?,\"()\[\]]")

# This pattern matches paired <br /> tags, hyphens and slashes, which are replaced with a space.
WITH_SPACE_PATTERN = re.compile(r"(<br\s*/><br\s*/>)|-|/")

# Read the positive word list, skipping blank lines and comment lines (lines starting with ";").
# Latin-1 is used because it decodes any byte; the Bing Liu files are commonly read this way.
with open("positive-words.txt", encoding="latin-1") as f:
    pos_words = [line.strip() for line in f if not line.startswith(";") and line.strip() != ""]
# Print all positive words
print(pos_words)

# Read the negative word list, skipping blank lines and comment lines (lines starting with ";").
with open("negative-words.txt", encoding="latin-1") as f:
    neg_words = [line.strip() for line in f if not line.startswith(";") and line.strip() != ""]
# Print all negative words
print(neg_words)


# Reading reviews data from the csv file using pandas read_csv function.
dat = pd.read_csv("Datafiniti_Hotel_Reviews.csv")

# Select the review text column as a list; fillna("") keeps missing reviews from becoming the string "nan".
reviews_data = dat["reviews.text"].fillna("").tolist()

# Convert each review to a string
reviews_data = [str(line) for line in reviews_data]

# Strip leading and trailing whitespace from each review.
reviews_data = [line.strip() for line in reviews_data]

# Lowercase each review (for case-insensitive matching) and remove punctuation.
reviews_data = [PUNCTUATION_PATTERN.sub("", line.lower()) for line in reviews_data]

# Replace <br /> tag pairs, hyphens and slashes with a space.
reviews_data = [WITH_SPACE_PATTERN.sub(" ", line) for line in reviews_data]

# List holding the score of each review
score...
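
The preview stops before the scoring loop. The assignment also suggests stemming, and one lightweight way to add it when matching cleaned tokens against the word lists is a suffix-stripping fallback. This is a crude stand-in for a real stemmer such as Porter's, not the author's code: the `SUFFIXES` tuple and `MIN_STEM` length are hypothetical choices. Because the Bing Liu lists already contain many inflected forms, the exact word is tried first and stripping is only a fallback.

```python
# Hypothetical suffix list: a crude stand-in for a real stemmer (e.g. Porter).
SUFFIXES = ("ing", "ed", "d", "ly", "es", "s")
MIN_STEM = 3  # hypothetical: never strip a word down to fewer than 3 letters


def lookup_polarity(word, pos_words, neg_words):
    """Return +1, -1 or 0 for `word`: try the exact form first, then the
    word with one listed suffix removed."""
    candidates = [word] + [
        word[: -len(suffix)]
        for suffix in SUFFIXES
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM
    ]
    for candidate in candidates:
        if candidate in pos_words:
            return 1
        if candidate in neg_words:
            return -1
    return 0


pos = {"love", "enjoy"}
neg = {"complaint", "dirty"}
polarities = [lookup_polarity(w, pos, neg)
              for w in "loved enjoying complaints dirty hotel bed".split()]
print(polarities)
```

With the preview's variables, `lookup_polarity` would be called on each token of `line.split()` for each `line` in `reviews_data`; converting `pos_words` and `neg_words` to sets first keeps those lookups fast.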

