1. (10 points) Derive the update equations when the hidden units use the hyperbolic tangent function tanh(x) in a neural network with one hidden layer, instead of the sigmoid function. Use the fact that tanh'(x) = 1 - tanh²(x).
2. (30 points) Consider the Multilayer Perceptron (MLP) for binary...
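
The role of the identity tanh'(x) = 1 - tanh²(x) in item 1 can be illustrated with a small sketch. The excerpt does not state the output unit or the loss, so the code below assumes a single linear output and squared error; the names `forward`, `gradients`, `loss`, `W`, and `v` are hypothetical and chosen only for this illustration. The one place where tanh changes the backpropagation step is the hidden-unit delta, where the sigmoid factor z(1 - z) becomes 1 - z².

```python
import math


def tanh_prime(x):
    """Derivative of tanh, using tanh'(x) = 1 - tanh(x)**2."""
    t = math.tanh(x)
    return 1.0 - t * t


def forward(x, W, v):
    """One hidden layer: a_j = sum_i W[j][i] * x[i], z_j = tanh(a_j), y = sum_j v[j] * z_j."""
    a = [sum(w_ji * x_i for w_ji, x_i in zip(row, x)) for row in W]
    z = [math.tanh(a_j) for a_j in a]
    y = sum(v_j * z_j for v_j, z_j in zip(v, z))
    return a, z, y


def loss(x, target, W, v):
    """Squared error E = 0.5 * (y - target)**2 (an assumption, not from the problem)."""
    return 0.5 * (forward(x, W, v)[2] - target) ** 2


def gradients(x, target, W, v):
    """Gradients of E with respect to the output weights v and hidden weights W."""
    _, z, y = forward(x, W, v)
    err = y - target
    grad_v = [err * z_j for z_j in z]
    # Hidden-unit delta: the tanh derivative is 1 - z_j**2.
    delta = [err * v_j * (1.0 - z_j * z_j) for v_j, z_j in zip(v, z)]
    grad_W = [[d_j * x_i for x_i in x] for d_j in delta]
    return grad_v, grad_W
```

A gradient-descent update would then subtract a learning rate times `grad_v` and `grad_W` from the corresponding weights; comparing `gradients` against finite differences of `loss` is a quick way to confirm the derivation.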

Q5. According to the method obtained in Q2, draw a block diagram at SVM level to show the structure of the multi-class classifier constructed by linear SVMs. Explain the design (e.g., number of inputs, number of outputs, number of SVMs used, class label assignment, etc.) and describe how this mult...

Chapter 4 - Exercise #3: This problem relates to the QDA model, in which the observations within each class are drawn from a normal distribution with a class-specific mean vector and a class-specific covariance matrix. We consider the simple case where p = 1; i.e., there is only one feature. ...

Problem 1 - Linear Discriminant Analysis: Consider the categorical learning problem consisting of a data set with two labels:

Label 1:
X1   3.81   0.23   3.05   0.68   2.67
X2  -0.55   3.37   3.53   1.84   2.74

Label 2:
X1  -2.04  -0.72  -2.46  -3.51  -2.05
X2  -1.25  -3.35  -1.31   0.13  -2.82

a) For each ...
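
As a rough illustration of the quantities LDA works with, the sketch below computes the class means, a pooled covariance estimate, and the discriminant direction for the ten points listed in Problem 1. It assumes each label contributes five observations read column by column as (X1, X2) pairs, and that the pooled covariance uses n1 + n2 - 2 degrees of freedom; part a) is truncated in the excerpt, so this is not necessarily what the problem asks for. The helper names `mean` and `scatter` are hypothetical.

```python
# Observations as (X1, X2) pairs, assuming column-wise pairing of the two rows.
label1 = [(3.81, -0.55), (0.23, 3.37), (3.05, 3.53), (0.68, 1.84), (2.67, 2.74)]
label2 = [(-2.04, -1.25), (-0.72, -3.35), (-2.46, -1.31), (-3.51, 0.13), (-2.05, -2.82)]


def mean(points):
    """Component-wise sample mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points, mu):
    """Sum of outer products of the centered points, as a 2x2 nested list."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = (p[0] - mu[0], p[1] - mu[1])
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


mu1, mu2 = mean(label1), mean(label2)
s1, s2 = scatter(label1, mu1), scatter(label2, mu2)

# Pooled covariance shared by both classes (the LDA assumption).
dof = len(label1) + len(label2) - 2
pooled = [[(s1[i][j] + s2[i][j]) / dof for j in range(2)] for i in range(2)]

# Discriminant direction w = pooled^{-1} (mu1 - mu2), via the 2x2 inverse formula.
det = pooled[0][0] * pooled[1][1] - pooled[0][1] * pooled[1][0]
diff = (mu1[0] - mu2[0], mu1[1] - mu2[1])
w = (
    (pooled[1][1] * diff[0] - pooled[0][1] * diff[1]) / det,
    (-pooled[1][0] * diff[0] + pooled[0][0] * diff[1]) / det,
)

print("mean of label 1:", mu1)
print("mean of label 2:", mu2)
print("discriminant direction:", w)
```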

Problem 2 - Cubic Splines: (Problem 5.1 in ESLII) Let's consider data of the shape (X, y) with X, y ∈ R. Consider fitting piecewise-cubic polynomial splines to the data with continuous first and second derivatives at the two points ξ1 and ξ2. a) As in Lecture 7, derive the equations relatin...
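
For orientation, the sketch below evaluates the truncated power basis for a cubic spline with two knots, which is a standard way to represent such splines; whether it matches the Lecture 7 derivation is not known from the excerpt. The function names and the knot value 1.0 used in the check are hypothetical. The finite-difference helper illustrates why the term (x - ξ)³₊ keeps the second derivative continuous: its second derivative is 0 to the left of the knot and 6(x - ξ) to the right, so both sides approach 0 at the knot.

```python
def cube_plus(u):
    """Truncated cubic (u)_+^3: u**3 for u > 0, otherwise 0."""
    return u ** 3 if u > 0 else 0.0


def truncated_power_basis(x, knot1, knot2):
    """Six basis functions of a cubic spline with two knots, evaluated at x."""
    return [1.0, x, x ** 2, x ** 3, cube_plus(x - knot1), cube_plus(x - knot2)]


def second_difference(f, x, h=1e-3):
    """Central finite-difference estimate of the second derivative of f at x."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def knot_term(x):
    """The basis function (x - 1)_+^3, using a hypothetical knot at 1.0."""
    return cube_plus(x - 1.0)


# Second-derivative estimates just left and just right of the knot.
print(second_difference(knot_term, 0.99))
print(second_difference(knot_term, 1.01))
```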

1. [Analytical question] Consider two Normally distributed random variables Y1 and Y2 with expected values μ1 and μ2, variances σ1² and σ2², and correlation ρ. (a) State the joint probability distribution of these random variables. State it twice: once in a non-matrix and the se...
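
The two ways of writing the bivariate Normal density mentioned in part (a) can be compared numerically. The sketch below is only a sanity check under the usual definitions: the non-matrix form written out in terms of μ1, μ2, σ1, σ2, and ρ, and the matrix form with covariance matrix Σ = [[σ1², ρσ1σ2], [ρσ1σ2, σ2²]]. The function names and the evaluation point are hypothetical.

```python
import math


def pdf_scalar(y1, y2, mu1, mu2, s1, s2, rho):
    """Bivariate Normal density written without matrices (s1, s2 are standard deviations)."""
    z1 = (y1 - mu1) / s1
    z2 = (y2 - mu2) / s2
    q = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / (1.0 - rho * rho)
    return math.exp(-0.5 * q) / (2.0 * math.pi * s1 * s2 * math.sqrt(1.0 - rho * rho))


def pdf_matrix(y1, y2, mu1, mu2, s1, s2, rho):
    """Same density via the covariance matrix, its determinant, and its inverse."""
    cov = [[s1 * s1, rho * s1 * s2], [rho * s1 * s2, s2 * s2]]
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    inv = [[cov[1][1] / det, -cov[0][1] / det], [-cov[1][0] / det, cov[0][0] / det]]
    d = (y1 - mu1, y2 - mu2)
    # Quadratic form d^T Sigma^{-1} d.
    q = sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


# The two forms should agree up to floating-point rounding.
args = (0.3, -1.2, 1.0, 0.5, 2.0, 0.7, 0.4)
print(pdf_scalar(*args), pdf_matrix(*args))
```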

2. (10 points) In class, we discussed how to represent XOR-like functions using quadratic features, since standard linear classifiers (such as perceptrons) are insufficient for this task. However, here we show that XOR-like functions can indeed be simulated using multi-layer networks of perceptro...
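
One common construction, shown here only as a sketch since the rest of the problem is truncated, builds XOR from three threshold perceptrons: an OR unit and a NAND unit in the hidden layer, followed by an AND unit at the output. The weights and biases below are one choice among many, and the function names are hypothetical.

```python
def perceptron(inputs, weights, bias):
    """Threshold unit: outputs 1 if the weighted sum plus bias is positive, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def xor_net(x1, x2):
    """XOR as AND(OR(x1, x2), NAND(x1, x2)) using three perceptrons."""
    h_or = perceptron((x1, x2), (1.0, 1.0), -0.5)      # fires unless both inputs are 0
    h_nand = perceptron((x1, x2), (-1.0, -1.0), 1.5)   # fires unless both inputs are 1
    return perceptron((h_or, h_nand), (1.0, 1.0), -1.5)  # fires only if both hidden units fire


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

No single perceptron can reproduce this truth table, because the two inputs that map to 1 cannot be separated from the two that map to 0 by one line; the hidden layer supplies the two half-planes whose intersection isolates them.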

1. (10 points) Suppose we are given real-valued scalar data (i.e., d = 1) belonging to one of two classes. We are given a set of three data samples with negative labels, X- = {0, 1, -1}, and a set of three data samples with positive labels, X+ = {-3, 3, -2}. Our goal is to build a classifier ...
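
The six samples can be inspected with a short sketch. It reads the negative set as {0, 1, -1}, and it assumes a quadratic feature x² as one possible way to separate the classes; the excerpt is truncated, so the classifier the problem actually asks for may differ. The threshold 2.5 and the function names are hypothetical.

```python
neg = [0, 1, -1]   # samples with negative labels
pos = [-3, 3, -2]  # samples with positive labels


def separable_by_threshold(neg_vals, pos_vals):
    """True if a single threshold puts each class entirely on its own side."""
    return max(neg_vals) < min(pos_vals) or max(pos_vals) < min(neg_vals)


def classify(x, threshold=2.5):
    """Label +1 when x**2 exceeds the threshold, otherwise -1."""
    return 1 if x * x > threshold else -1


# On the raw values the positive samples lie on both sides of the negative ones.
print(separable_by_threshold(neg, pos))
# After squaring, the negatives fall in [0, 1] and the positives in [4, 9].
print(separable_by_threshold([x * x for x in neg], [x * x for x in pos]))
```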

