Question

Transcribed Text

Introduction

In this assignment you'll practice
● writing classes,
● using 2-D arrays,
● simple text processing, and
● basic numerical computing issues.

Problem Description

You're interested in natural language processing ...

Solution Description

Write a class named SourceModel that reads a file containing a training corpus and builds a first-order Markov chain of the transition probabilities between letters in the corpus. Only alphabetic characters in the corpus should be considered, and they should be normalized to upper or lower case. For simplicity (see background), only consider the 26 letters of the English alphabet.

Downloads

Here are some example corpus files and test files:
● English: english.corpus
● French: french.corpus
● Spanish: spanish.corpus
● HipHop: hiphop.corpus
● Lisp: lisp.corpus

You can assume corpus files are of the form <source-name>.corpus.

Specific Requirements

Write a class called SourceModel with the following constructors and methods:
● A single constructor with two String parameters, where the first parameter is the name of the source model and the second is the file name of the corpus file for the model. The constructor should create a letter-letter transition matrix using this recommended algorithm sketch:
  ○ Initialize a 26x26 matrix for character counts.
  ○ Print "Training {name} model ... "
  ○ Read the corpus file one character at a time, converting all characters to lower case and ignoring any non-alphabetic character.
  ○ For each character, increment the corresponding (row, col) in your counts matrix. The row is for the previous character, the col is for the current character. (You could also think of this in terms of bigrams.)
  ○ After you read the entire corpus file, you'll have a matrix of counts.
  ○ From the matrix of counts, create a matrix of probabilities; each row of the transition matrix is a probability distribution.
    ■ The probabilities in a distribution must sum to 1. To turn counts into probabilities, divide each count by the sum of all the counts in a row.
  ○ Print "done." followed by a newline character.
● A getName method with no parameters which returns the name of the SourceModel.
● A toString method which returns a String representation of the model like the one shown below under Running Your Program in jshell.
● A probability method which takes a String and returns a double which indicates the probability that the test string was generated by the source model, using the transition probability matrix created in the constructor. Here's a recommended algorithm:
  ○ Initialize the probability to 1.0.
  ○ For each two-character sequence in the test string test, c_i and c_{i+1} for i = 0 to test.length() − 2, multiply the probability by the entry in the transition probability matrix for the c_i to c_{i+1} transition, which should be found in row c_i and column c_{i+1} in the matrix. (You could also think of the indices as c_{i−1}, c_i for i = 1 to test.length() − 1.)
● A main method that makes SourceModel runnable from the command line. Your program should take 1 or more corpus file names as command line arguments followed by a quoted string as the last argument. The program should create models for all the corpora and test the string with all the corpora. Here's an algorithm sketch:
  ○ The first n-1 arguments to the program are corpus file names to use to train models. Corpus files are of the form <source-name>.corpus.
  ○ The last argument to the program is a quoted string to test.
  ○ Create a SourceModel object for each corpus.
  ○ Use the models to compute the probability that the test text was produced by the model.
  ○ Probabilities will be very small. Normalize the probabilities of all the model predictions to a probability distribution (so they sum to 1) (closed-world assumption: we only state probabilities relative to models we have).
  ○ Print results of analysis.

Running Your Program

Sample runs from the command line:

$ java SourceModel *.corpus "If you got a gun up in your waist please don't shoot up the place (why?)"
Training english model ... done.
Training french model ... done.
Training hiphop model ... done.
Training lisp model ... done.
Training spanish model ... done.
Analyzing: If you got a gun up in your waist please don't shoot up the place (why?)
Probability that test string is english: 0.00
Probability that test string is french: 0.00
Probability that test string is hiphop: 1.00
Probability that test string is lisp: 0.00
Probability that test string is spanish: 0.00
Test string is most likely hiphop.

$ java SourceModel *.corpus "Ou va le monde?"
Training english model ... done.
Training french model ... done.
Training hiphop model ... done.
Training lisp model ... done.
Training spanish model ... done.
Analyzing: Ou va le monde?
Probability that test string is english: 0.02
Probability that test string is french: 0.85
Probability that test string is hiphop: 0.01
Probability that test string is lisp: 0.10
Probability that test string is spanish: 0.01
Test string is most likely french.

$ java SourceModel *.corpus "My other car is a cdr."
Training english model ... done.
Training french model ... done.
Training hiphop model ... done.
Training lisp model ... done.
Training spanish model ... done.
Analyzing: My other car is a cdr.
Probability that test string is english: 0.39
Probability that test string is french: 0.00
Probability that test string is hiphop: 0.61
Probability that test string is lisp: 0.00
Probability that test string is spanish: 0.00
Test string is most likely hiphop.

$ java SourceModel *.corpus "defun Let there be rock"
Training english model ... done.
Training french model ... done.
Training hiphop model ... done.
Training lisp model ... done.
Training spanish model ... done.
Analyzing: defun Let there be rock
Probability that test string is english: 0.01
Probability that test string is french: 0.00
Probability that test string is hiphop: 0.42
Probability that test string is lisp: 0.57
Probability that test string is spanish: 0.00
Test string is most likely lisp.

Sample runs from jshell:

$ jshell
| Welcome to JShell -- Version 10.0.2
| For an introduction type: /help intro

jshell> /open SourceModel.java

jshell> var french = new SourceModel("french", "french.corpus")
Training french model ... done.
french ==> Model: french
a b c d e f ... 1.00 0.01 0.01 0.01 0.01

jshell> System.out.println(french) // implicitly calls french.toString()
Model: french
     a    b    c    d    e    f    g    h    i    j    k    l    m    n    o    p    q    r    s    t    u    v    w    x    y    z
a 0.01 0.03 0.03 0.02 0.01 0.01 0.03 0.01 0.26 0.01 0.01 0.07 0.07 0.13 0.01 0.06 0.01 0.09 0.06 0.04 0.06 0.05 0.01 0.01 0.01 0.01
b 0.07 0.01 0.01 0.03 0.14 0.01 0.01 0.01 0.07 0.01 0.01 0.21 0.01 0.01 0.14 0.01 0.01 0.24 0.01 0.03 0.07 0.01 0.01 0.01 0.01 0.01
c 0.04 0.02 0.02 0.01 0.26 0.01 0.01 0.19 0.06 0.01 0.01 0.08 0.02 0.01 0.15 0.01 0.01 0.11 0.01 0.01 0.06 0.01 0.01 0.01 0.01 0.01
d 0.14 0.01 0.01 0.01 0.39 0.01 0.01 0.01 0.13 0.01 0.01 0.03 0.01 0.01 0.11 0.01 0.01 0.07 0.03 0.01 0.07 0.01 0.01 0.01 0.01 0.01
e 0.04 0.01 0.04 0.05 0.07 0.01 0.01 0.01 0.01 0.04 0.00 0.07 0.05 0.13 0.01 0.04 0.01 0.07 0.15 0.14 0.06 0.00 0.00 0.01 0.01 0.00
f 0.15 0.01 0.01 0.01 0.23 0.01 0.01 0.01 0.08 0.01 0.01 0.08 0.01 0.01 0.23 0.01 0.01 0.15 0.08 0.01 0.01 0.01 0.01 0.01 0.01 0.01
g 0.01 0.01 0.01 0.01 0.27 0.01 0.01 0.01 0.09 0.01 0.01 0.18 0.05 0.09 0.05 0.01 0.01 0.23 0.01 0.01 0.05 0.01 0.01 0.01 0.01 0.01
h 0.43 0.01 0.01 0.07 0.14 0.01 0.01 0.01 0.07 0.01 0.01 0.07 0.01 0.01 0.21 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01
i 0.03 0.02 0.04 0.04 0.16 0.01 0.04 0.01 0.01 0.01 0.01 0.11 0.06 0.09 0.03 0.02 0.01 0.03 0.15 0.14 0.01 0.01 0.01 0.01 0.01 0.01
j 0.24 0.01 0.01 0.01 0.53 0.01 0.01 0.01 0.03 0.01 0.01 0.01 0.01 0.01 0.06 0.01 0.01 0.01 0.01 0.01 0.15 0.01 0.01 0.01 0.01 0.01
k 0.50 0.01 0.01 0.01 0.50 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01
l 0.20 0.01 0.01 0.01 0.46 0.01 0.01 0.01 0.07 0.01 0.01 0.11 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.02 0.06 0.01 0.01 0.01 0.01 0.01
m 0.22 0.16 0.01 0.01 0.26 0.01 0.01 0.01 0.10 0.01 0.01 0.01 0.06 0.01 0.12 0.04 0.01 0.01 0.01 0.01 0.03 0.01 0.01 0.01 0.01 0.01
n 0.06 0.01 0.03 0.13 0.16 0.04 0.01 0.01 0.05 0.03 0.01 0.02 0.01 0.04 0.03 0.01 0.04 0.01 0.08 0.22 0.02 0.01 0.01 0.01 0.01 0.01
o 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.09 0.01 0.01 0.03 0.06 0.24 0.01 0.02 0.01 0.18 0.04 0.01 0.28 0.01 0.02 0.01 0.01 0.01
p 0.25 0.01 0.01 0.02 0.11 0.01 0.01 0.02 0.02 0.01 0.01 0.13 0.01 0.01 0.20 0.05 0.01 0.13 0.05 0.01 0.04 0.01 0.01 0.01 0.01 0.01
q 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 1.00 0.01 0.01 0.01 0.01 0.01
r 0.20 0.01 0.03 0.02 0.30 0.01 0.01 0.01 0.08 0.01 0.01 0.06 0.01 0.01 0.05 0.01 0.01 0.03 0.05 0.12 0.02 0.01 0.01 0.01 0.01 0.01
s 0.07 0.02 0.05 0.04 0.15 0.01 0.01 0.01 0.10 0.03 0.01 0.06 0.01 0.01 0.09 0.06 0.03 0.01 0.05 0.09 0.10 0.03 0.01 0.01 0.01 0.01
t 0.13 0.01 0.01 0.04 0.19 0.01 0.01 0.01 0.05 0.04 0.01 0.08 0.03 0.01 0.13 0.01 0.02 0.08 0.01 0.03 0.12 0.01 0.01 0.01 0.01 0.01
u 0.04 0.01 0.02 0.01 0.10 0.01 0.01 0.01 0.07 0.01 0.01 0.05 0.02 0.20 0.01 0.02 0.01 0.24 0.12 0.05 0.02 0.01 0.01 0.01 0.01 0.01
v 0.26 0.01 0.01 0.01 0.37 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.26 0.01 0.01 0.11 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01
w 0.01 0.01 0.01 0.67 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.33 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01
x 0.01 0.01 0.14 0.01 0.14 0.01 0.01 0.01 0.29 0.01 0.01 0.01 0.01 0.14 0.01 0.14 0.01 0.01 0.01 0.01 0.14 0.01 0.01 0.01 0.01 0.01
y 0.50 0.01 0.25 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.25 0.01 0.01 0.01 0.01 0.01 0.01 0.01
z 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 1.00 0.01 0.01 0.01 0.01

jshell> french.probability("Il y a tout ce que vous voulez aux Champs-Elysees")
$8 ==> 3.966845096265183E-43
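
The probability method and the normalization step in main can be sketched roughly as follows. This is only an illustration under stated assumptions, not the reference solution: the class name ProbabilitySketch and both method names are hypothetical, and the transition matrix is passed in as a parameter instead of being a field of SourceModel. Non-letters in the test string are skipped, mirroring how the corpus is read.

```java
import java.util.Arrays;

// Hypothetical helper class; the assignment itself puts this logic in SourceModel.
public class ProbabilitySketch {

    /** Multiplies the transition probabilities of consecutive letters in test. */
    public static double probability(double[][] matrix, String test) {
        double p = 1.0;
        int prev = -1; // index of the previous letter; -1 until one has been seen
        for (int i = 0; i < test.length(); i++) {
            char ch = Character.toLowerCase(test.charAt(i));
            if (ch < 'a' || ch > 'z') {
                continue; // ignore anything outside the 26 English letters
            }
            int cur = ch - 'a';
            if (prev >= 0) {
                p *= matrix[prev][cur]; // row = previous letter, column = current letter
            }
            prev = cur;
        }
        return p;
    }

    /**
     * Rescales raw model probabilities so they sum to 1 (closed-world assumption).
     * If every raw value has underflowed to 0, the result contains NaN.
     */
    public static double[] normalize(double[] raw) {
        double sum = Arrays.stream(raw).sum();
        double[] out = new double[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = raw[i] / sum;
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed stand-in for a trained model: every transition equally likely.
        double[][] uniform = new double[26][26];
        for (double[] row : uniform) {
            Arrays.fill(row, 1.0 / 26);
        }
        System.out.println(probability(uniform, "Ou va le monde?"));
        System.out.println(Arrays.toString(normalize(new double[] {1e-40, 3e-40})));
    }
}
```

Because each factor is at most 1, the product shrinks quickly with the length of the test string, which is why the raw values are only compared after normalizing across the models.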

Solution Preview


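import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;

/**
 * Intro to Object-Oriented Programming
 * @version 2.1
 */
public class SourceModel {

    // Raw bigram counts: row = previous letter, column = current letter.
    private int[][] matCount;
    // Row-normalized transition probabilities derived from matCount.
    private double[][] matProb;
    // Character code of 'a'; subtracting it maps 'a'..'z' to indices 0..25.
    private final int BASE = 97;
    // Number of letters in the English alphabet.
    private int size = 26;
    private String mName;

    public SourceModel(String mName, String fName) {

        System.out.print("Training " + mName + " model ... ");

        matCount = new int[size][size];
        matProb = new double[size][size];
        this.mName = mName;

        try {
            int row = 0;
            int total = 0;

            File file = new File(fName);
            BufferedReader reader = new BufferedReader(
                    new FileReader(file))...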
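
The preview breaks off before the read loop. Purely as an illustration of the assignment's algorithm sketch, and not a reconstruction of the elided solution, the counting and row-normalization steps might look like the code below. The class name TrainingSketch and its method names are hypothetical, and it accepts any Reader (a FileReader for a corpus file, or a StringReader for a quick check). The sample French matrix shows 0.01 where a count of zero would be expected, which suggests the reference output adjusts zero entries in some way; this sketch does not attempt to reproduce that and leaves them at 0.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper class; the assignment puts this logic in the SourceModel constructor.
public class TrainingSketch {

    static final int SIZE = 26;

    /** Counts letter-to-letter transitions, skipping non-letters and ignoring case. */
    public static int[][] countTransitions(Reader source) throws IOException {
        int[][] counts = new int[SIZE][SIZE];
        int prev = -1; // index of the previous letter; -1 until one has been seen
        try (BufferedReader reader = new BufferedReader(source)) {
            int c;
            while ((c = reader.read()) != -1) { // read one character at a time
                char ch = Character.toLowerCase((char) c);
                if (ch < 'a' || ch > 'z') {
                    continue;
                }
                int cur = ch - 'a';
                if (prev >= 0) {
                    counts[prev][cur]++;
                }
                prev = cur;
            }
        }
        return counts;
    }

    /** Divides each count by its row sum so that every non-empty row sums to 1. */
    public static double[][] toProbabilities(int[][] counts) {
        double[][] probs = new double[SIZE][SIZE];
        for (int r = 0; r < SIZE; r++) {
            int rowSum = 0;
            for (int v : counts[r]) {
                rowSum += v;
            }
            if (rowSum == 0) {
                continue; // letter never seen as a predecessor: row stays all zeros
            }
            for (int c = 0; c < SIZE; c++) {
                probs[r][c] = (double) counts[r][c] / rowSum;
            }
        }
        return probs;
    }

    public static void main(String[] args) throws IOException {
        // Letters read: a b a b a c, so 'a' is followed by 'b' twice and by 'c' once.
        double[][] probs = toProbabilities(countTransitions(new StringReader("Abab, ac!")));
        System.out.println("P(b | a) = " + probs[0][1]);
        System.out.println("P(c | a) = " + probs[0][2]);
    }
}
```

The try-with-resources block closes the reader even if reading fails, which avoids a separate finally clause.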
