 # Questions Problem 1. Implement the delta training rule for a two-...

## Question


## Transcribed Text

### Problem 1

Implement the delta training rule for a two-input linear unit. Train it to fit the target concept $x_1 + 2x_2 - 2 > 0$. You must generate your own examples as follows: generate random pairs $(x_1, x_2)$ and assign them to the positive class if $x_1 + 2x_2 - 2 > 0$; otherwise, if $x_1 + 2x_2 - 2 < 0$, assign them to the negative class.

(a) Plot the error $E$ as a function of the number of training iterations/epochs.

(b) Plot the decision surface after 5, 10, 50, and 100 iterations.

(c) Use different learning rates, analyze which works better, and explain why.

(d) Now implement the delta rule in an incremental fashion. In the batch approach, all the data are presented before the weights are updated; the incremental approach updates the network after each example. For the same choice of other parameters (learning rate, etc.), compare the two approaches in terms of total execution time and number of weight updates (use the MATLAB tic-toc combination).

### Problem 2

Consider exactly the same problem as above and implement variable learning rates as follows.

(a) Decaying rates. Start with a rate $\eta$ and decrease it after each iteration by multiplying it by a number in (0, 1), for example 0.8, so that after one iteration the new learning rate is $0.8\eta$, after two iterations it is $0.8^2\eta$, and after $k$ iterations it is $0.8^k\eta$. Note the tradeoff between the magnitude of the learning rate and the speed of the algorithm: large rates tend to yield unstable algorithms, while small rates result in a slow algorithm.

(b) Adaptive rates. Here the idea is to implement a procedure in which the learning rate can increase or decrease as needed:

1. Start with an initial learning rate. Calculate the initial network output and error.
2. For each epoch, calculate new weights, biases, and the corresponding network output and error using the current learning rate.
3. If the new error exceeds the previous error by a threshold $t$ (decided in advance), discard the calculated new weights and bias, and decrease the learning rate by multiplying it by a value $d$ in (0, 1), preferably close to 1.
4. Otherwise, if the new error is smaller than the previous error, keep the weights and biases, and increase the learning rate by multiplying it by a value $D$ slightly larger than 1 (for example, 1.09).

Note: For example, starting with learning rate 0.5, $t = 1.03$, $d = 0.9$, and $D = 1.02$, a possible sequence of learning rates could be 0.45, 0.4590, 0.4131, 0.4214, 0.4298, 0.4384 (note that I generated these values without actually taking into account the errors and the threshold $t$).

### Problem 3

(a) (Pencil-and-paper problem) Derive a gradient descent training rule for a single unit with output $o$, where

$$o = w_0 + w_1(x_1 + x_1^2) + w_2(x_2 + x_2^2) + \cdots + w_n(x_n + x_n^2)$$

(b) Implement this gradient descent algorithm.
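For Problem 1, the batch and incremental forms of the delta rule might be sketched as follows. This is a minimal illustration in Python rather than MATLAB, not a reference solution. The sampling range $[-3, 3]$, the $\pm 1$ target encoding, the zero initial weights, the learning rate 0.001, and all function names are assumptions made for the example; `time.perf_counter` stands in for tic-toc.

```python
import random
import time

def make_examples(n, seed=0, low=-3.0, high=3.0):
    """Random (x1, x2) pairs with target +1 if x1 + 2*x2 - 2 > 0, else -1.
    The sampling range and the +/-1 encoding are assumptions."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1, x2 = rng.uniform(low, high), rng.uniform(low, high)
        data.append((x1, x2, 1.0 if x1 + 2 * x2 - 2 > 0 else -1.0))
    return data

def output(w, x1, x2):
    """Linear unit: o = w0 + w1*x1 + w2*x2 (no threshold during training)."""
    return w[0] + w[1] * x1 + w[2] * x2

def error(w, data):
    """E(w) = 1/2 * sum over examples of (t - o)^2."""
    return 0.5 * sum((t - output(w, x1, x2)) ** 2 for x1, x2, t in data)

def train_batch(data, eta, epochs):
    """Batch delta rule: accumulate the step over all examples, update once per epoch."""
    w, errors, updates = [0.0, 0.0, 0.0], [], 0
    for _ in range(epochs):
        step = [0.0, 0.0, 0.0]
        for x1, x2, t in data:
            delta = t - output(w, x1, x2)
            step[0] += eta * delta
            step[1] += eta * delta * x1
            step[2] += eta * delta * x2
        w = [wi + si for wi, si in zip(w, step)]
        updates += 1
        errors.append(error(w, data))
    return w, errors, updates

def train_incremental(data, eta, epochs):
    """Incremental delta rule: update the weights after every example."""
    w, errors, updates = [0.0, 0.0, 0.0], [], 0
    for _ in range(epochs):
        for x1, x2, t in data:
            delta = t - output(w, x1, x2)
            w = [w[0] + eta * delta,
                 w[1] + eta * delta * x1,
                 w[2] + eta * delta * x2]
            updates += 1
        errors.append(error(w, data))
    return w, errors, updates

data = make_examples(200)
for name, train in (("batch", train_batch), ("incremental", train_incremental)):
    start = time.perf_counter()            # plays the role of tic
    w, errors, updates = train(data, eta=0.001, epochs=100)
    elapsed = time.perf_counter() - start  # plays the role of toc
    print(f"{name}: E={errors[-1]:.3f}, updates={updates}, time={elapsed:.4f}s")
```

The `errors` list holds one value of $E$ per epoch, so it can be passed directly to a plotting routine for part (a). The learned decision surface for part (b) is the line $w_0 + w_1 x_1 + w_2 x_2 = 0$.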
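For Problem 2, both learning-rate schedules can be sketched in a few lines. The problem statement leaves two points open, so the code below makes two assumptions that should be checked against the course's intent: the threshold $t$ is read as a ratio (a step is rejected when the new error exceeds $t$ times the previous error), and a step whose error rises by less than that is kept with the rate unchanged. The error is also averaged over the examples rather than summed, so that the starting rate of 0.5 from the note is usable. All function names are hypothetical.

```python
import random

def decayed_rate(eta0, factor, k):
    """Learning rate after k iterations of multiplicative decay: eta0 * factor**k."""
    return eta0 * factor ** k

def make_examples(n, seed=0):
    """Random points in [-3, 3]^2 (assumed range), labelled by x1 + 2*x2 - 2 > 0."""
    rng = random.Random(seed)
    pts = [(rng.uniform(-3, 3), rng.uniform(-3, 3)) for _ in range(n)]
    return [(x1, x2, 1.0 if x1 + 2 * x2 - 2 > 0 else -1.0) for x1, x2 in pts]

def mean_error(w, data):
    """Mean (not summed) squared error; an assumption of this sketch."""
    return sum((t - (w[0] + w[1] * x1 + w[2] * x2)) ** 2
               for x1, x2, t in data) / (2 * len(data))

def batch_step(w, data, eta):
    """One batch delta-rule update using the mean gradient."""
    g = [0.0, 0.0, 0.0]
    for x1, x2, t in data:
        delta = t - (w[0] + w[1] * x1 + w[2] * x2)
        g[0] += delta
        g[1] += delta * x1
        g[2] += delta * x2
    return [wi + eta * gi / len(data) for wi, gi in zip(w, g)]

def train_adaptive(data, eta=0.5, t=1.03, d=0.9, D=1.02, epochs=50):
    """Adaptive-rate batch training following steps 1-4 of Problem 2(b)."""
    w = [0.0, 0.0, 0.0]
    err = mean_error(w, data)            # step 1: initial error
    rates = []
    for _ in range(epochs):
        w_new = batch_step(w, data, eta)  # step 2: candidate weights and error
        err_new = mean_error(w_new, data)
        if err_new > t * err:
            eta *= d                      # step 3: discard the step, shrink the rate
        else:
            if err_new < err:
                eta *= D                  # step 4: error improved, grow the rate
            w, err = w_new, err_new       # keep the step in either case (assumption)
        rates.append(eta)
    return w, err, rates

data = make_examples(200)
w, err, rates = train_adaptive(data)
print(f"final error {err:.4f}, final rate {rates[-1]:.4f}")
print([round(decayed_rate(0.5, 0.8, k), 4) for k in range(4)])
```

Recording `rates` per epoch makes it easy to compare the actual sequence against the illustrative one in the note, which was produced without reference to the errors.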
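For Problem 3, one way the derivation could go is outlined below. It assumes the usual squared-error criterion over a training set $D$ with targets $t_d$, which the problem statement does not name explicitly:

$$E(\vec{w}) = \frac{1}{2}\sum_{d \in D}(t_d - o_d)^2$$

Since $\partial o_d / \partial w_0 = 1$ and $\partial o_d / \partial w_i = x_{id} + x_{id}^2$ for $i \ge 1$, the chain rule gives

$$\frac{\partial E}{\partial w_i} = -\sum_{d \in D}(t_d - o_d)\,(x_{id} + x_{id}^2), \qquad \frac{\partial E}{\partial w_0} = -\sum_{d \in D}(t_d - o_d)$$

and stepping against the gradient with learning rate $\eta$ yields

$$w_i \leftarrow w_i + \eta\sum_{d \in D}(t_d - o_d)\,(x_{id} + x_{id}^2), \qquad w_0 \leftarrow w_0 + \eta\sum_{d \in D}(t_d - o_d)$$

A minimal Python sketch of part (b) under the same assumption follows. The function names, the synthetic noise-free data, and the division of the gradient by the number of examples (which only rescales $\eta$) are choices made for illustration.

```python
import random

def features(x):
    """Map each input x_i to z_i = x_i + x_i**2, the term that multiplies w_i."""
    return [xi + xi ** 2 for xi in x]

def output(w, x):
    """o = w0 + sum_i w_i * (x_i + x_i**2)."""
    return w[0] + sum(wi * zi for wi, zi in zip(w[1:], features(x)))

def train(data, n_inputs, eta=0.1, epochs=2000):
    """Batch gradient descent on the squared error, starting from zero weights."""
    w = [0.0] * (n_inputs + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_inputs + 1)
        for x, t in data:
            delta = t - output(w, x)
            grad[0] += delta                      # dE/dw0 term
            for i, zi in enumerate(features(x), start=1):
                grad[i] += delta * zi             # dE/dwi term
        # Averaging over the examples is equivalent to rescaling eta.
        w = [wi + eta * gi / len(data) for wi, gi in zip(w, grad)]
    return w

# Synthetic check: targets come from known (hypothetical) weights, without noise.
rng = random.Random(0)
true_w = [1.0, 2.0, -1.0]
xs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(50)]
data = [(x, output(true_w, x)) for x in xs]
print([round(wi, 3) for wi in train(data, n_inputs=2)])
```

Because the unit is linear in the transformed inputs $x_i + x_i^2$, this is the ordinary delta rule applied to those transformed inputs.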

