Homework # 4

Write a program in Python that implements a Multinomial Naive Bayes classifier for sentiment analysis.

You need to implement five functions as defined below:

1. A function build_raw_data: # This function will load the data: 2000 reviews, each labeled positive or negative.
You also need to remove stop words and lemmatize tokens.

from nltk.corpus import stopwords
set(stopwords.words('english'))
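
The cleaning step can be sketched as follows. This is only one possible shape, not the required one: the signature of build_raw_data, the helper clean_tokens, and the idea of passing the stop word set and the lemmatizer in as arguments are all assumptions made so that the sketch runs without any corpus download.

```python
import string


def clean_tokens(tokens, stop_words, lemmatize):
    """Lowercase tokens, drop stop words and pure punctuation, then lemmatize."""
    cleaned = []
    for tok in tokens:
        tok = tok.lower()
        # Skip stop words and tokens made only of punctuation characters.
        if tok in stop_words or all(ch in string.punctuation for ch in tok):
            continue
        cleaned.append(lemmatize(tok))
    return cleaned


def build_raw_data(labeled_docs, stop_words, lemmatize):
    """Return a list of (clean_tokens, label) pairs.

    labeled_docs is assumed to be an iterable of (tokens, label) pairs;
    this input format is a choice made for the sketch.
    """
    return [(clean_tokens(tokens, stop_words, lemmatize), label)
            for tokens, label in labeled_docs]
```

With NLTK, you could pass set(stopwords.words('english')) as stop_words and WordNetLemmatizer().lemmatize as lemmatize.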

2. A function feature_selection: # This function will construct a set of features (words) for document representation.
You need to identify a list of features (1-grams or 2-grams).
Aim for about 1000 features, but experiment with this number and adjust it based on your classifier's accuracy.
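
One simple way to pick features is to keep the most frequent terms in the corpus. The sketch below does that; the parameters num_features and use_bigrams, the helper ngrams, and the (tokens, label) input format are assumptions for illustration. Frequency is only one possible selection criterion.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list, each joined by a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def feature_selection(data, num_features=1000, use_bigrams=False):
    """Return the num_features most frequent terms across all documents.

    data is assumed to be a list of (tokens, label) pairs.
    """
    counts = Counter()
    for tokens, _label in data:
        counts.update(tokens)                 # 1-grams
        if use_bigrams:
            counts.update(ngrams(tokens, 2))  # 2-grams
    return [term for term, _count in counts.most_common(num_features)]
```

Changing num_features and rerunning the classifier is the experiment the task asks for.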

3. A function text_to_vector: # This function will convert each review to a vector of frequencies.
Each vector has the same dimension as the list of your features.
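
A minimal sketch of the conversion, assuming features is the ordered list of terms chosen earlier and tokens is one cleaned review. The use_bigrams flag is a hypothetical parameter, only needed if your feature list contains 2-grams.

```python
from collections import Counter


def text_to_vector(tokens, features, use_bigrams=False):
    """Return a list of term frequencies, one entry per feature, in order."""
    terms = list(tokens)
    if use_bigrams:
        # Add space-joined 2-grams so they can match 2-gram features.
        terms += [" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1)]
    counts = Counter(terms)
    # A Counter returns 0 for terms that never occur in the review.
    return [counts[f] for f in features]
```

For example, text_to_vector(["good", "good", "movie"], ["movie", "good", "bad"]) returns [1, 2, 0].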

4. A function split_data: # This function randomly splits your data into a training set (75%) and a testing set (25%).
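
A sketch of a random 75/25 split using only the standard library. The signature, including the train_fraction and seed parameters, is an assumption; scikit-learn's train_test_split(X, y, test_size=0.25) is a ready-made alternative.

```python
import random


def split_data(vectors, labels, train_fraction=0.75, seed=None):
    """Shuffle the examples and split them into training and testing sets."""
    indices = list(range(len(vectors)))
    random.Random(seed).shuffle(indices)  # a fixed seed makes the split repeatable
    cut = int(len(indices) * train_fraction)
    train_idx, test_idx = indices[:cut], indices[cut:]
    X_train = [vectors[i] for i in train_idx]
    X_test = [vectors[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, X_test, y_train, y_test
```

Shuffling indices rather than the data itself keeps each vector paired with its label.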

5. A function model: # This function will build the classifier and report accuracy.
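
To show what the classifier computes, here is a from-scratch sketch of Multinomial Naive Bayes with add-one (Laplace) smoothing. The helpers train_multinomial_nb and predict, the alpha parameter, and the signature of model are assumptions for illustration; sklearn.naive_bayes.MultinomialNB together with accuracy_score is a library alternative.

```python
import math


def train_multinomial_nb(X_train, y_train, alpha=1.0):
    """Estimate log priors and smoothed per-class log likelihoods."""
    num_features = len(X_train[0])
    log_prior, log_likelihood = {}, {}
    for c in sorted(set(y_train)):
        rows = [x for x, y in zip(X_train, y_train) if y == c]
        log_prior[c] = math.log(len(rows) / len(X_train))
        totals = [sum(col) for col in zip(*rows)]  # per-feature counts in class c
        denom = sum(totals) + alpha * num_features
        log_likelihood[c] = [math.log((t + alpha) / denom) for t in totals]
    return log_prior, log_likelihood


def predict(x, log_prior, log_likelihood):
    """Return the class with the highest log posterior score for vector x."""
    scores = {c: log_prior[c] + sum(n * w for n, w in zip(x, log_likelihood[c]))
              for c in log_prior}
    return max(scores, key=scores.get)


def model(X_train, y_train, X_test, y_test):
    """Train on the training set, then print and return test accuracy."""
    log_prior, log_likelihood = train_multinomial_nb(X_train, y_train)
    predictions = [predict(x, log_prior, log_likelihood) for x in X_test]
    correct = sum(p == y for p, y in zip(predictions, y_test))
    accuracy = correct / len(y_test)
    print(f"Accuracy: {accuracy:.3f}")
    return accuracy
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.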

## Solution Preview

import nltk
from nltk.corpus import movie_reviews
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
import string

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score

stop_words = set(stopwords.words('english'))

def build_r...

