One type of question encountered in many tests is the "Synonym Question", where students are asked to pick a synonym of a word out of a list of alternatives. For example:
The correct answer for this question is "annoyed". For this assignment, you will build an intelligent system that can learn to answer questions like this one by reading novels and estimating how similar words are. To do that, the system will approximate the semantic similarity of any pair of words. The semantic similarity between two words is a measure of the closeness of their meanings. For example, the semantic similarity between "car" and "vehicle" is high, while that between "car" and "flower" is low.
In order to answer the question, you will compute the semantic similarity between the word you are given and all the possible answers, and pick the answer with the highest semantic similarity to the given word. More precisely, given a word w and a list of potential synonyms s1, s2, s3, s4, we compute the similarities of (w, s1), (w, s2), (w, s3), (w, s4) and choose the word whose similarity to w is the highest.
We will measure the semantic similarity of pairs of words by first computing a semantic descriptor vector of each of the words, and then taking the similarity measure to be the cosine similarity between the two vectors.
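For reference, the standard definition of cosine similarity between two vectors can be written as below. The symbols u, v, and the index i are notation chosen here for illustration, not taken from the handout.

```latex
\[
\mathrm{sim}(u, v)
  = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}
  = \frac{\sum_i u_i v_i}{\sqrt{\sum_i u_i^2} \, \sqrt{\sum_i v_i^2}}
\]
```

For vectors whose entries are all non-negative, this value lies between 0 and 1, and it is undefined when either vector has zero norm.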
If the semantic similarity between two words cannot be computed, it is considered to be -1. In case of a tie between several elements in choices, the one with the smallest index in choices should be returned (e.g., if two elements of choices are tied, the one that appears earlier in the list is returned).
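The selection rule above could be sketched roughly as follows. This is only an illustration under stated assumptions: the three-argument signature of most_similar_word, the helper name similarity_or_minus_one, and the toy descriptors in the usage note are hypothetical, and the handout's actual function may take different parameters.

```python
import math


def similarity_or_minus_one(word1, word2, semantic_descriptors):
    """Cosine similarity of two words' descriptors, or -1 if it cannot be computed.

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    vec1 = semantic_descriptors.get(word1)
    vec2 = semantic_descriptors.get(word2)
    if not vec1 or not vec2:
        return -1  # a word is missing or has an empty descriptor
    dot = sum(vec1[k] * vec2[k] for k in vec1 if k in vec2)
    norm1 = math.sqrt(sum(v * v for v in vec1.values()))
    norm2 = math.sqrt(sum(v * v for v in vec2.values()))
    if norm1 == 0 or norm2 == 0:
        return -1  # avoid dividing by zero
    return dot / (norm1 * norm2)


def most_similar_word(word, choices, semantic_descriptors):
    """Return the element of choices most similar to word (assumed signature)."""
    best_choice = None
    best_score = None
    for choice in choices:
        score = similarity_or_minus_one(word, choice, semantic_descriptors)
        # Strict '>' keeps the earliest choice when several scores tie.
        if best_score is None or score > best_score:
            best_choice, best_score = choice, score
    return best_choice
```

With toy descriptors such as {"feline": {"pet": 1, "fur": 1}, "cat": {"pet": 2, "fur": 1}, "dog": {"pet": 1, "bark": 3}}, the call most_similar_word("feline", ["dog", "cat"], ...) picks "cat". If the given word has no descriptor at all, every score is -1 and the first choice is returned.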
run_similarity_test(filename : str, semantic_descriptors : dict) -> float:
This function takes in a string filename, the name of a file in the same format as test.txt, and returns the percentage of questions on which most_similar_word() guesses the answer correctly using the semantic descriptors stored in semantic_descriptors.
The format of test.txt is as follows. On each line, we are given a word (all-lowercase), the correct answer, and the choices. The second word is the correct choice. For example, the line:
feline cat dog cat horse
represents the question of which of "dog", "cat", and "horse" is a synonym of "feline", and indicates that the correct answer is "cat".
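The line format described above could be parsed and scored along these lines. This is a minimal sketch, not the handout's implementation: parse_test_line and percent_correct are hypothetical helper names, the guess argument stands in for a call to most_similar_word(), and the result is assumed to be on a 0 to 100 scale.

```python
def parse_test_line(line):
    """Split one test line into (word, correct_answer, choices)."""
    tokens = line.split()
    # First token: the given word; second: the correct answer; rest: the choices.
    return tokens[0], tokens[1], tokens[2:]


def percent_correct(lines, guess):
    """Return the percentage of questions for which guess(word, choices) is right.

    guess is any callable taking (word, choices) and returning one choice.
    """
    total = 0
    correct = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        word, answer, choices = parse_test_line(line)
        total += 1
        if guess(word, choices) == answer:
            correct += 1
    return 100.0 * correct / total if total else 0.0
```

For instance, parse_test_line("feline cat dog cat horse") yields the word "feline", the answer "cat", and the choices ["dog", "cat", "horse"]. Reading the lines from a file and passing a wrapper around most_similar_word() as guess would give the behaviour described for run_similarity_test.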
import math


def norm(vec):
    '''
    Return the norm of a vector stored as a dictionary,
    as described in the handout for Project 2.
    '''
    sum_of_squares = 0.0  # floating point to handle large numbers
    for x in vec:
        sum_of_squares += vec[x] * vec[x]
    return math.sqrt(sum_of_squares)
def cosine_similarity(vec1, vec2):
    '''
    Return the cosine similarity of sparse vectors vec1 and vec2,
    stored as dictionaries as described in the handout for Project 2.
    '''
    dot_product = 0.0  # floating point to handle large numbers
    for x in vec1:
        if x in vec2:
            dot_product += vec1[x] * vec2[x]
    return dot_product / (norm(vec1) * norm(vec2))
def get_sentence_lists(text):
    '''
    Take a string of text and return a list of lists of words.
    Preprocess the string to remove punctuation, to split the text
    into sentences, and to split the sentences into words.
    '''
    keep = 'abcdefghijklmnopqrstuvwxyz1234567890.'
    text = text.strip().lower()
    text = text.replace('\n', ' ')
    text = text.replace('!', '.').replace('?', '.')
    for c in set(text):
        if c not in keep:
            text = text.replace(c, ' ')
    while '..' in text:
        text = text.replace('..', '.')
    while '  ' in text:  # collapse runs of spaces into one
        text = text.replace('  ', ' ')
    text = text.split('.')
    sentence_lists = []
    for sentence in text:
        sentence = sentence.strip().split()
        if len(sentence) > 0:
            sentence_lists.append(sentence)
    return sentence_lists
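# NOTE: the def line is missing from this listing; the function name is assumed.
def get_sentence_lists_from_files(filenames):
    '''
    Get sentence lists from a series of files whose
    names are provided in the list filenames.
    '''
    sentence_lists = []
    for filename in filenames:
        with open(filename, "r", encoding="latin-1") as f:
            text = f.read()
        sentence_lists += get_sentence_lists(text)
    return sentence_lists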
Method to build semantic descriptors as a dictionary of dictionaries
from a structure of sentence lists...
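The listing is cut off at this point, so the following is only a guess at what such a function might look like. It assumes, without confirmation from the text above, that the descriptor of a word maps every other word to the number of sentences in which the two appear together. The name build_semantic_descriptors is likewise hypothetical.

```python
def build_semantic_descriptors(sentences):
    """Build a dictionary of dictionaries from a list of sentence word lists.

    ASSUMPTION: descriptors[w][x] counts the sentences containing both w and x.
    The source does not state this definition; it is used here for illustration.
    """
    descriptors = {}
    for sentence in sentences:
        unique_words = set(sentence)  # count each word once per sentence
        for word in unique_words:
            counts = descriptors.setdefault(word, {})
            for other in unique_words:
                if other != word:
                    counts[other] = counts.get(other, 0) + 1
    return descriptors
```

Under that assumption, the sentences [["i", "am", "a", "cat"], ["a", "cat", "purrs"]] give "cat" the descriptor {"i": 1, "am": 1, "a": 2, "purrs": 1}.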