Transcribed Text

Machine Learning

1 Probability and Bayes' Rule [25 points]

1. Assume the probability of a certain disease is 0.01. The probability of testing positive given that a person is infected with the disease is 0.95, and the probability of testing positive given that the person is not infected with the disease is 0.05.

(a) Calculate the probability of testing positive. [5pt]
(b) Use Bayes' Rule to calculate the probability of being infected with the disease given that the test is positive. [5pt]

2. A group of students was classified based on whether they are senior or junior and whether they are taking CSE446 or not. The following data was obtained.

                  Junior   Senior
taking CSE446       23       34
no CSE446           41       53

Suppose a student is randomly chosen from the group. Let $J$ be the event that the student is junior, $S$ be the event that the student is senior, $C$ be the event that the student is taking CSE446, and $\bar{C}$ be the event that the student is not taking CSE446. Calculate the following probabilities. Show your work.

(a) (5 points) $P(C, S)$
(b) (5 points) $P(C \mid S)$
(c) (5 points) $P(\bar{C} \mid J)$

2 MLE [20 points]

2.1 The Poisson distribution [12 points]

You're a Seahawks fan, and the team is six weeks into its season. The number of touchdowns scored in each game so far is given below:

[1, 3, 3, 0, 1, 5]

Let's call these scores $x_1, \dots, x_6$. Based on your data, you'd like to build a model to understand how many touchdowns the Seahawks are likely to score in their next game. You decide to model the number of touchdowns scored per game using a Poisson distribution. The Poisson distribution with parameter $\lambda$ assigns every non-negative integer $x = 0, 1, 2, \dots$ a probability given by

$$\mathrm{Poi}(x \mid \lambda) = e^{-\lambda} \frac{\lambda^{x}}{x!}.$$

So, for example, if $\lambda = 1.5$, then the probability that the Seahawks score 2 touchdowns in their next game is $e^{-1.5} \times \frac{1.5^{2}}{2!} \approx 0.25$. To check your understanding of the Poisson, make sure you have a sense of whether raising $\lambda$ will mean more touchdowns in general, or fewer.

1. (8 points) Derive an expression for the maximum-likelihood estimate of the parameter $\lambda$ governing the Poisson distribution, in terms of your touchdown counts $x_1, \dots, x_6$. (Hint: remember that the log of the likelihood has the same maximum as the likelihood function itself.)
2. (4 points) Given the touchdown counts, what is your numerical estimate of $\lambda$?

2.2 The Uniform Distribution [8 points]

Given a set of i.i.d. samples $X_1, X_2, \dots, X_n$ drawn from the uniform distribution $\mathrm{Uniform}(0, \theta)$, find the maximum likelihood estimator of $\theta$.

(a) (4 points) Write down the likelihood function.
(b) (4 points) Find the maximum likelihood estimator.

3 Programming Question [55 points]

In this assignment you will be implementing the C4.5 decision tree algorithm and running it on real emails to train a spam filter.
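
The Poisson example in Section 2.1 can be checked numerically. The snippet below is a minimal sketch, not part of the assignment: the function name `poisson_pmf` is a hypothetical helper introduced here for illustration, and it simply evaluates the formula given in the problem statement.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """Evaluate e^(-lam) * lam^x / x! for a non-negative integer x.

    Hypothetical helper name; the formula is the one from Section 2.1.
    """
    return exp(-lam) * lam ** x / factorial(x)


# Worked example from the text: lambda = 1.5, two touchdowns.
p_two = poisson_pmf(2, 1.5)
print(round(p_two, 2))  # 0.25

# Sanity check: the probabilities over x = 0, 1, 2, ... sum to 1
# (the tail beyond x = 49 is negligible for lambda = 1.5).
total_mass = sum(poisson_pmf(x, 1.5) for x in range(50))
```

Evaluating `poisson_pmf` for a few values of `lam` at a fixed `x` is one way to build the intuition the problem asks for before starting the derivation.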

Solution Preview


from math import log

class Tree:
    leaf = True
    prediction = None
    feature = None
    threshold = None
    left = None
    right = None

def predict(tree, point):
    # Leaves hold the prediction; internal nodes route the point by threshold.
    if tree.leaf:
        return tree.prediction
    i = tree.feature
    if point.values[i] < tree.threshold:
        return predict(tree.left, point)
    else:
        return predict(tree.right, point)

def most_likely_class(prediction):
    labels = list(prediction.keys())
    probs = list(prediction.values())
    return labels[probs.index(max(probs))]

def accuracy(data, predictions):
    total = 0
    correct = 0
    for i in range(len(data)):
        point = data[i]
        pred = predictions[i]
        # Count every evaluated point exactly once.
        total += 1
        guess = most_likely_class(pred)
        if guess == point.label:
            correct += 1
    return float(correct) / total

def split_data(data, feature, threshold):
    left = []
    right = []
    # TODO: split data into left and right by given feature.
    # left should contain points whose values are less than threshold
    # right should contain points with values greater than or equal to threshold
    for d in data:
        if d.values[feature] < threshold:
            left.append(d)
        else:
            right.append(d)
    return (left, right)

def count_labels(data):
    counts = {}
    # TODO: counts should count the labels in data
    # e.g. counts = {'spam': 10, 'ham': 4}
    for d in data:
        if d.label not in counts:
            counts[d.label] = 1
        else:
            counts[d.label] += 1
    return counts

def counts_to_entropy(counts):
    entropy = 0.0
    # TODO: should convert a dictionary of counts into entropy
    total = sum(counts.values())
    for event in counts:
        # float() keeps this a true division under Python 2 as well.
        probability = float(counts[event]) / total
        entropy -= probability * log(probability, 2)
    return entropy

def get_entropy(data):
    counts = count_labels(data)
    entropy = counts_to_entropy(counts)
    return entropy

# This is an inefficient way to find the best threshold to maximize
# information gain.
def find_best_threshold(data, feature):
    entropy = get_entropy(data)
    best_gain = 0
    best_threshold = None
    # TODO: Write a method to find the best threshold.
    total = len(data)
    for d in data:
       threshold = d.values...
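
The preview of `find_best_threshold` is cut off at this point, so the original body is not reproduced here. As a separate, self-contained illustration of the idea named in the comment (choosing the threshold that maximizes information gain), the following is a minimal sketch under stated assumptions: `Point`, `entropy`, `information_gain`, and `best_threshold` are hypothetical names introduced for this example, not the solution's own code. It scores plain information gain, as the comment describes, rather than the gain ratio often associated with C4.5.

```python
from collections import Counter, namedtuple
from math import log2

# Hypothetical stand-in for the assignment's data points.
Point = namedtuple("Point", ["label", "values"])


def entropy(points):
    """Shannon entropy (in bits) of the label distribution."""
    total = len(points)
    if total == 0:
        return 0.0
    counts = Counter(p.label for p in points)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(points, feature, threshold):
    """Parent entropy minus the size-weighted entropy of the two children."""
    left = [p for p in points if p.values[feature] < threshold]
    right = [p for p in points if p.values[feature] >= threshold]
    n = len(points)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(points) - remainder


def best_threshold(points, feature):
    """Try each distinct feature value as a threshold; keep the best gain."""
    best_gain, best = 0.0, None
    for candidate in sorted({p.values[feature] for p in points}):
        gain = information_gain(points, feature, candidate)
        if gain > best_gain:
            best_gain, best = gain, candidate
    return best_gain, best


toy = [
    Point("ham", [1.0]),
    Point("ham", [2.0]),
    Point("spam", [3.0]),
    Point("spam", [4.0]),
]
gain, threshold = best_threshold(toy, 0)
```

On the toy data, a threshold of 3.0 separates the two labels perfectly, so the gain equals the full parent entropy of one bit. Like the approach flagged as inefficient in the listing's comment, this sketch rescans the data for every candidate threshold.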
