Question

Transcribed Text

Specification: What your program will need to do

Input

Your program must define the function main with the following signature:

def main(textfile1, arg2, normalize=False)

The first, compulsory argument is the name of a text file containing a work to be analysed. The second, compulsory argument will be either the name of a second text file to be analysed or the string listing. The final, optional argument (default value False) specifies whether the profile values, excluding sentences per paragraph and words per sentence (discussed below), are to be normalised.

Output

The output will be either some text with the score from a pairwise comparison, or a listing of the first file's profile. It will be printed on standard output.

A more detailed specification

• For the purposes of this project, a sentence is a sequence of words followed by a full stop, question mark or exclamation mark, which in turn must be followed either by a quotation mark (so the sentence is the end of a quote or spoken utterance) or by white space (space, tab or new-line character). Thus:

This is some text. This is yet more text

contains one sentence followed by the start of another sentence.

• Your program will need to count the number of occurrences of certain words and certain pieces of punctuation. Specifically, the list of words to be counted is:

"also", "although", "and", "as", "because", "before", "but", "for", "if", "nor", "of", "or", "since", "that", "though", "until", "when", "whenever", "whereas", "which", "while", "yet"

If you are wondering why that particular set of words is being used, they are conjunctions, which can indicate more complex sentences without relying on the content of the text.

• Your program should also count certain pieces of punctuation: comma and semicolon. In addition, your program should also count single-quote and hyphen, but only under certain circumstances.
Specifically, your program should count single-quote marks, but only when they appear as apostrophes surrounded by letters, i.e. indicating a contraction such as "shouldn't" or "won't". (The apostrophe is included as an indication of more informal writing, perhaps direct speech.) Finally, your program should count dash (minus) signs, but only when they are surrounded by letters, indicating a compound word, such as "compound-word". Any other punctuation or letters, e.g. '.' when not at the end of a sentence, should be regarded as white space, and so serve to end words. For these purposes, strings of digits are also words, as they convey information. Therefore, in the unlikely event that a floating point number, such as 3.142, appears, it is regarded as two words.

Note: Some of the texts we will use include a double hyphen, i.e. "--". This is to be regarded as a space character.

• Each of the words and punctuation symbols should be placed, together with their respective counts, in a dictionary, which I shall call a profile.

• You should also add to the profile two further parameters relating to the text: the average number of words per sentence and the average number of sentences per paragraph, where a paragraph is any number of sentences followed by a blank line or by the end of the text.

• If the third, optional parameter in main is set to True, the profile values are to be normalised. That is, except for the words per sentence and sentences per paragraph parameters, each of the others is to be divided by the number of sentences in the respective text. (Clearly, if two texts are nominated and normalize is set to True, both profiles are to be normalised.)

• If the second argument is the string listing, the profile corresponding to the first file should be printed on standard output, one item per line.
On the other hand, if the second argument is another text file, the distance between the corresponding profiles should be computed using the standard distance formula:

distance = sqrt( sum over every profile key k of (profile1[k] - profile2[k])^2 )

Example

An example interaction, which you can find here, is based on three files: sample1.txt and sample2.txt, both excerpts taken from "Life on the Mississippi", by Mark Twain, and sample3.txt, which is taken from Banjo Paterson's collection of stories, "Three Elephant Power".

Some Text Files to Examine

Here are some files for you to try out. All of the texts, apart from "Kangaroo", were obtained from Project Gutenberg. All the files have a long text at the end which contains the Project Gutenberg license and terms of use. I have linked the Gutenberg terms and license here rather than left them in the texts, because that may affect the profiles.

Author                            Title                             Fiction/Non-fiction
Henry Lawson                      Children of the Bush              Fiction
D. H. Lawrence                    Fantasia of the Unconscious       Non-fiction
Mark Twain                        Life on the Mississippi           Non-fiction
D. H. Lawrence                    Sea and Sardinia                  Non-fiction
D. H. Lawrence                    Kangaroo                          Fiction
Mark Twain                        Adventures of Huckleberry Finn    Fiction
Andrew Barton 'Banjo' Paterson    Three Elephant Power              Fiction

A small note of warning. If you decide to download your own texts from Project Gutenberg (recommended), please be aware that many of the texts include spurious Unicode characters. Unfortunately, the file input-output functions we use in CITS1401 (and I use on a daily basis) only work with the standard ASCII character set, so they will cause an exception if Unicode characters are in the text. While Python is well able to deal with Unicode, special input-output functions are needed, which are beyond the scope of this unit. What I have done is use the Unix command:

cat -vet filename

to make the Unicode characters visible in the ASCII character set, and then use a text editor to remove them. (Tedious.)

Important

You will have noticed that you have not been asked to write specific functions.
That has been left to you. However, as in Project 1, it is important that your program defines the top-level function main() as described above. main() should then call the other functions. (Of course, these may call further functions.) The reason this is important is that when I test your program, my testing program will call your main() function. So, if you fail to define main(), or define it with a different signature, my program will not be able to test your program.
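
The sentence, word and paragraph rules in the specification can be sketched with regular expressions. This is a minimal illustration, not part of the original specification: the name text_averages is hypothetical, and the sketch assumes that a single or double quote counts as a quotation mark, that a terminator at the very end of the text also closes a sentence, that a contraction or compound word is a single word, and that a paragraph is any non-empty block separated by blank lines.

```python
import re

# Sentence end: '.', '?' or '!' followed by a quotation mark, white space,
# or (assumption) the end of the text.
SENTENCE_END = re.compile(r"""[.?!](?=["'\s]|$)""")

# A word: letters/digits, optionally joined by an apostrophe or hyphen
# that has a letter on both sides (assumption: such forms are one word).
WORD = re.compile(r"[a-z0-9]+(?:(?<=[a-z])['-](?=[a-z])[a-z0-9]+)*")


def text_averages(text):
    """Return (words per sentence, sentences per paragraph) for a string."""
    text = text.replace("--", " ").lower()  # double hyphen acts as a space
    sentences = len(SENTENCE_END.findall(text))
    words = len(WORD.findall(text))
    # Assumption: a paragraph is any non-empty block between blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    words_per_sentence = words / sentences if sentences else 0.0
    sentences_per_paragraph = sentences / len(paragraphs) if paragraphs else 0.0
    return words_per_sentence, sentences_per_paragraph
```

For the specification's own example, text_averages("This is some text. This is yet more text") finds one sentence and nine words, because the second run of words has no terminator and so is not counted as a sentence.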

Solution Preview


import math


def main(textfile1, arg2, normalize=False):
    if arg2 == "listing":
        # obtain profile of the text file
        profile = get_profile(textfile1, normalize)

        # print the profile to standard output
        print("profile of text " + textfile1)
        for parameter, value in profile.items():
            print(parameter + "\t%.4f" % value)

    else:
        # obtain profiles of two text files
        profile1 = get_profile(textfile1, normalize)
        profile2 = get_profile(arg2, normalize)

        # calculate the distance between two profiles and print the results out
        squared_sum_of_diffs = 0.0
        for parameter in profile1:
            squared_sum_of_diffs += math.pow((profile1[parameter] - profile2[parameter]), 2)
        print("The distance between the two texts is: %.4f" % math.sqrt(squared_sum_of_diffs))


# this function returns the "profile" of a given text file. if normalize is set to True,
# profile values are normalized by dividing them by total number of sentences
def get_profile(textfile, normalize):
    file = open(textfile, "r")
    total_words = 0
    total_sentences = 0
    total_paragraphs = 0
    profile = {
       "'": 0.0,
       ",": 0.0,
       "-": 0.0,
       ";": 0.0,
       "also": 0.0,
       "although": 0.0,
       "and": 0.0,
       "as": 0.0,
       "because": 0.0,
       "before": 0.0,
       "but": 0.0,
       "for": 0.0,
       "if": 0.0,
       "nor": 0.0,
       "of": 0.0,
       "or": 0...

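The preview is cut off before any counting logic appears. One possible way to fill a profile of this shape is sketched below. It is not the code from Solution.py: the names CONJUNCTIONS, count_profile and normalise are hypothetical, and the sketch assumes that a contraction or compound word counts as a single word, so "as-if" would not add to the counts for "as" or "if".

```python
import re

CONJUNCTIONS = ["also", "although", "and", "as", "because", "before", "but",
                "for", "if", "nor", "of", "or", "since", "that", "though",
                "until", "when", "whenever", "whereas", "which", "while", "yet"]

# Same word pattern as the specification implies: letters/digits, optionally
# joined by an apostrophe or hyphen with a letter on both sides.
WORD = re.compile(r"[a-z0-9]+(?:(?<=[a-z])['-](?=[a-z])[a-z0-9]+)*")


def count_profile(text):
    """Count the listed words and punctuation marks in a string."""
    text = text.replace("--", " ").lower()  # double hyphen acts as a space
    profile = {key: 0 for key in ["'", ",", "-", ";"] + CONJUNCTIONS}
    profile[","] = text.count(",")
    profile[";"] = text.count(";")
    # Apostrophes and hyphens count only with a letter on each side.
    profile["'"] = len(re.findall(r"(?<=[a-z])'(?=[a-z])", text))
    profile["-"] = len(re.findall(r"(?<=[a-z])-(?=[a-z])", text))
    for word in WORD.findall(text):
        if word in profile:
            profile[word] += 1
    return profile


def normalise(profile, sentences):
    """Divide every count by the number of sentences.

    Assumes the two averages (words per sentence, sentences per paragraph)
    have not yet been added to the profile.
    """
    if sentences == 0:
        return dict(profile)  # assumption: leave counts unchanged
    return {key: value / sentences for key, value in profile.items()}
```

Keeping the two averages out of the dictionary until after normalisation is one way to honour the rule that they are never divided by the sentence count.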