Deliverables: You only need to submit your solution. You must use functions to modularize your work, and you should use exception handling where necessary. Dictionaries, sets, and lists will come in handy here.
One type of question encountered in many tests is the "Synonym Question", where students are asked to pick a synonym of a word out of a list of alternatives. For example:
1. vexed
a. annoyed
b. amused
c. frightened
d. excited
The correct answer for this question is "annoyed". For this assignment, you will build an intelligent system that learns to answer questions like this one by reading novels and measuring how similar words are. To do that, the system will approximate the semantic similarity of any pair of words. The semantic similarity between two words measures how close their meanings are. For example, the semantic similarity between "car" and "vehicle" is high, while that between "car" and "flower" is low.
To answer a question, you will compute the semantic similarity between the given word and each possible answer, and pick the answer with the highest semantic similarity to the given word. More precisely, given a word w and a list of potential synonyms s1, s2, s3, s4, we compute the similarities of (w, s1), (w, s2), (w, s3), (w, s4) and choose the word whose similarity to w is the highest.
We will measure the semantic similarity of pairs of words by first computing a semantic descriptor vector of each of the words, and then taking the similarity measure to be the cosine similarity between the two vectors.
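For reference, the cosine similarity of two descriptor vectors u and v (symbols introduced here only for illustration) is the standard formula below. When the vectors are stored as sparse dictionaries, only words present in both vectors contribute to the numerator. If descriptor entries are non-negative counts (an assumption here), the similarity of two non-zero descriptors lies between 0 and 1, so a value of -1 for an uncomputable similarity always ranks lowest.

```latex
\[
\operatorname{sim}(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}
  = \frac{\sum_{i} u_i v_i}{\sqrt{\sum_{i} u_i^2} \, \sqrt{\sum_{i} v_i^2}}
\]
```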
If the semantic similarity between two words cannot be computed, it is considered to be -1. In case of a tie between several elements of choices (the list of candidate answers), the one with the smallest index in choices should be returned (e.g., if there is a tie between choices[5] and choices[7], choices[5] is returned).
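A minimal sketch of how most_similar_word() might apply these rules. The parameter order (word, choices, semantic_descriptors) and the use of a KeyError to detect a missing descriptor are assumptions, since the signature is not given here; cosine_similarity is a local helper included so the block runs on its own.

```python
import math


def cosine_similarity(vec1: dict, vec2: dict) -> float:
    """Cosine similarity of two sparse vectors stored as {word: count} dicts."""
    dot = sum(vec1[k] * vec2[k] for k in vec1 if k in vec2)
    norms = math.sqrt(sum(x * x for x in vec1.values())) * math.sqrt(
        sum(x * x for x in vec2.values())
    )
    # An all-zero vector has no direction, so the similarity cannot be computed.
    if norms == 0:
        return -1.0
    return dot / norms


def most_similar_word(word: str, choices: list, semantic_descriptors: dict) -> str:
    """Return the element of choices most similar to word (hypothetical signature).

    A similarity that cannot be computed counts as -1. Ties go to the smallest
    index because only a strictly greater score replaces the current best.
    """
    best_choice = None
    best_score = -math.inf
    for choice in choices:
        try:
            score = cosine_similarity(
                semantic_descriptors[word], semantic_descriptors[choice]
            )
        except KeyError:
            # word or choice has no descriptor: similarity is -1 by the rules.
            score = -1.0
        if score > best_score:
            best_choice, best_score = choice, score
    return best_choice
```

Scanning choices left to right and replacing the best only on a strictly greater score is what makes the earliest index win a tie.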
run_similarity_test(filename: str, semantic_descriptors: dict) -> float
This function takes in a string filename, which names a file in the same format as test.txt, and returns the percentage of questions on which most_similar_word() guesses the answer correctly using the semantic descriptors stored in semantic_descriptors.
The format of test.txt is as follows. Each line contains a word (all lowercase), followed by the correct answer, followed by the choices. For example, the line:
feline cat dog cat horse
represents the question:
feline:
(a) dog
(b) cat
(c) horse
and indicates that the correct answer is "cat".
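A possible run_similarity_test() that reads questions in this format. It assumes that blank or malformed lines (fewer than three words) are skipped, that a file with no questions scores 0.0, and that the file is read as latin-1 like the novels in the code below. The compact most_similar_word() and cosine_similarity() here are hypothetical helpers included only to keep the block self-contained.

```python
import math


def cosine_similarity(vec1: dict, vec2: dict) -> float:
    dot = sum(vec1[k] * vec2[k] for k in vec1 if k in vec2)
    norms = math.sqrt(sum(v * v for v in vec1.values())) * math.sqrt(
        sum(v * v for v in vec2.values())
    )
    return dot / norms if norms else -1.0  # -1 when it cannot be computed


def most_similar_word(word, choices, semantic_descriptors):
    # Hypothetical helper: earliest choice wins ties, missing words score -1.
    best, best_score = None, -math.inf
    for choice in choices:
        if word in semantic_descriptors and choice in semantic_descriptors:
            score = cosine_similarity(
                semantic_descriptors[word], semantic_descriptors[choice]
            )
        else:
            score = -1.0
        if score > best_score:
            best, best_score = choice, score
    return best


def run_similarity_test(filename: str, semantic_descriptors: dict) -> float:
    """Return the percentage (0 to 100) of questions answered correctly.

    Each question line is: word answer choice1 choice2 ...
    """
    correct = 0
    total = 0
    with open(filename, "r", encoding="latin-1") as f:
        for line in f:
            tokens = line.split()
            if len(tokens) < 3:
                continue  # skip blank or malformed lines
            word, answer, choices = tokens[0], tokens[1], tokens[2:]
            total += 1
            if most_similar_word(word, choices, semantic_descriptors) == answer:
                correct += 1
    if total == 0:
        return 0.0  # avoid dividing by zero on an empty test file
    return 100.0 * correct / total
```

With the example line above, tokens[0] is "feline", tokens[1] is the answer "cat", and tokens[2:] is the choice list ["dog", "cat", "horse"].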

Solution Preview



import math

# Return the norm (Euclidean length) of a vector stored as a dictionary,
# as described in the handout for Project 2.
def norm(vec):
    sum_of_squares = 0.0  # use a float accumulator
    for x in vec:
        sum_of_squares += vec[x] * vec[x]
    return math.sqrt(sum_of_squares)

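# Return the cosine similarity of sparse vectors vec1 and vec2,
# stored as dictionaries as described in the handout for Project 2.
# Returns -1 when either vector has zero norm, because the similarity
# cannot be computed in that case.
def cosine_similarity(vec1, vec2):
    dot_product = 0.0  # use a float accumulator
    for x in vec1:
        if x in vec2:  # only shared keys contribute to the dot product
            dot_product += vec1[x] * vec2[x]
    denominator = norm(vec1) * norm(vec2)
    if denominator == 0:
        return -1.0
    return dot_product / denominator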

# Take a string of text and return a list of sentences, each a list of words.
# Preprocess the string to remove punctuation, to split the text
# into sentences, and to split the sentences into words.
def get_sentence_lists(text):
    keep = 'abcdefghijklmnopqrstuvwxyz1234567890.'
    text = text.strip().lower()
    text = text.replace('\n', ' ')
    # Treat '!' and '?' as sentence terminators, like '.'.
    text = text.replace('!', '.').replace('?', '.')
    # Replace every character that is not a letter, digit, or '.' with a space.
    for c in set(text):
        if c not in keep:
            text = text.replace(c, ' ')
    # Collapse runs of periods into a single period.
    while '..' in text:
        text = text.replace('..', '.')
    # Collapse runs of spaces into a single space.
    while '  ' in text:
        text = text.replace('  ', ' ')
    text = text.split('.')
    sentence_lists = []
    for sentence in text:
        sentence = sentence.strip().split()
        if len(sentence) > 0:  # skip empty sentences
            sentence_lists.append(sentence)
    return sentence_lists

# Get sentence lists from a series of files whose
# names are provided in the list filenames.
def get_sentence_lists_from_files(filenames):
    sentence_lists = []
    for filename in filenames:
        # 'with' closes the file even if reading raises an exception.
        with open(filename, "r", encoding="latin-1") as f:
            text = f.read()
        sentence_lists += get_sentence_lists(text)
    return sentence_lists

# Build semantic descriptors as a dictionary of dictionaries
# from a structure of sentence lists...
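The build step itself is cut off in this preview. One common reading of a "semantic descriptor", assumed here rather than taken from the text, maps each word to a dictionary that counts, for every other word, the number of sentences in which the two words appear together. A minimal sketch under that assumption; the name build_semantic_descriptors is hypothetical.

```python
def build_semantic_descriptors(sentences: list) -> dict:
    """Map each word to {other_word: number of sentences containing both}.

    Assumption (not from the source): co-occurrence is counted once per
    sentence, so a word repeated inside one sentence adds 1, not more,
    and a word is never counted as co-occurring with itself.
    """
    descriptors = {}
    for sentence in sentences:
        unique_words = set(sentence)  # count each word once per sentence
        for word in unique_words:
            vec = descriptors.setdefault(word, {})
            for other in unique_words:
                if other != word:
                    vec[other] = vec.get(other, 0) + 1
    return descriptors
```

Passing the output of get_sentence_lists_from_files() to a function like this would produce the dictionary of dictionaries that most_similar_word() and run_similarity_test() read from.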
