Questions
Problem 1.
Implement the delta training rule for a two-input linear unit. Train it to fit the target concept $x_1 + 2x_2 - 2 > 0$. You must generate your own examples as follows: generate random pairs $(x_1, x_2)$ and assign them to
the positive class if $x_1 + 2x_2 - 2 > 0$; otherwise, if $x_1 + 2x_2 - 2 < 0$, assign them to the negative class.
(a) Plot the error E as a function of the number of training iterations/epochs.
(b) Plot the decision surface after 5, 10, 50, 100 iterations.
(c) Use different learning rates; analyze which works best and explain why.
(d) Now implement the delta rule in an incremental fashion (as opposed to the batch fashion, in which all the data are
presented before each update, the incremental approach updates the weights after each example). For the same
choice of the other parameters (learning rate, etc.), compare the two approaches in terms of total execution time
and number of weight updates (use the MATLAB tic-toc combination).
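The assignment calls for MATLAB (tic/toc), but as a language-neutral sketch, the batch and incremental variants of the delta rule could look like the following NumPy version. The sampling range, learning rate, epoch count, and random seed are arbitrary illustrative choices, and `time.perf_counter` plays the role of tic/toc:

```python
import time
import numpy as np

rng = np.random.default_rng(0)

# Generate random (x1, x2) pairs and label them with the target concept
# x1 + 2*x2 - 2 > 0 (+1) vs. < 0 (-1). The range [-2, 2] is an arbitrary choice.
X = rng.uniform(-2, 2, size=(200, 2))
t = np.where(X[:, 0] + 2 * X[:, 1] - 2 > 0, 1.0, -1.0)
Xb = np.hstack([np.ones((len(X), 1)), X])           # prepend bias input x0 = 1

def delta_rule_batch(Xb, t, eta=0.001, epochs=100):
    """Batch delta rule: accumulate the gradient over all examples,
    then apply one weight update per epoch."""
    w = np.zeros(Xb.shape[1])
    errors = []
    for _ in range(epochs):
        o = Xb @ w                                   # linear unit output
        w += eta * Xb.T @ (t - o)                    # 1 update per epoch
        errors.append(0.5 * np.sum((t - o) ** 2))    # E = 1/2 * sum (t - o)^2
    return w, errors

def delta_rule_incremental(Xb, t, eta=0.001, epochs=100):
    """Incremental delta rule: update the weights after every example,
    so len(X) updates per epoch instead of one."""
    w = np.zeros(Xb.shape[1])
    errors = []
    for _ in range(epochs):
        for x_d, t_d in zip(Xb, t):
            w += eta * (t_d - x_d @ w) * x_d         # 1 update per example
        o = Xb @ w
        errors.append(0.5 * np.sum((t - o) ** 2))
    return w, errors

# tic/toc analogue for the timing comparison in part (d)
t0 = time.perf_counter()
w_b, err_b = delta_rule_batch(Xb, t)
batch_seconds = time.perf_counter() - t0

t0 = time.perf_counter()
w_i, err_i = delta_rule_incremental(Xb, t)
incremental_seconds = time.perf_counter() - t0
```

Plotting `err_b` against the epoch index gives the curve for part (a), and the learned weights after 5, 10, 50, 100 epochs define the decision lines for part (b).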
Problem 2.
Consider now exactly the same problem as above and implement variable learning rates as follows:
(a) Decaying rates. Start with a rate $\eta$ and decrease it after each iteration by multiplying it by a number in (0, 1),
for example 0.8, so that after one iteration your new learning rate will be $0.8\eta$, after two iterations it will
be $0.8^2\eta$, and after k iterations it will be $0.8^k\eta$. Note the tradeoff between the magnitude of the learning rate and
the speed of the algorithm: large rates tend to yield algorithms which are unstable, while small rates will
result in a slow algorithm.
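The decay schedule described above can be sketched in a few lines (the concrete values 0.5 and 0.8 are just examples, not prescribed by the problem):

```python
def decayed_rate(eta0, decay, k):
    """Learning rate after k iterations: decay**k * eta0, for decay in (0, 1)."""
    return (decay ** k) * eta0

# With eta0 = 0.5 and decay = 0.8 the schedule is 0.5, 0.4, 0.32, 0.256, ...
schedule = [decayed_rate(0.5, 0.8, k) for k in range(4)]
```

Inside a training loop, one would simply call `decayed_rate(eta0, decay, epoch)` (or multiply the current rate by `decay`) at the top of each epoch before applying the weight updates.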
(b) Adaptive rates. Here the idea is to implement a procedure where the learning rate can increase or
decrease as it may be needed.
The idea is as follows:
1. Start with an initial learning rate. Calculate the initial network output and error.
2. Calculate new weights, biases, and the corresponding network output and error for each epoch using the
current learning rate.
3. If the new error exceeds the previous error by a threshold t (which we decide in advance), then discard
the calculated new weights and biases, and decrease the learning rate by multiplying it by a value d in (0, 1),
preferably close to 1;
4. Otherwise, if the new error is smaller than the previous error, then the weights and biases are kept, and
the learning rate is increased by multiplying it by a value D (slightly) larger than 1 (for example, 1.09).
Note: For example, starting with learning rate $\eta = 0.5$, t = 1.03, d = 0.9, and D = 1.02, a possible sequence of
learning rates could be 0.45, 0.4590, 0.4131, 0.4214, 0.4298, 0.4384 (note that I just generated
these values w/o actually taking into account the errors and threshold t).
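One way to sketch the rate-adaptation step in steps 3-4 is the helper below. Treating t as a multiplicative threshold on the error ratio (reject when `new_error > t * old_error`) is an assumption about the intended meaning of "exceeds by a threshold t", chosen because it is consistent with the example values t = 1.03 and d, D close to 1:

```python
def adapt_rate(eta, new_error, old_error, t=1.03, d=0.9, D=1.02):
    """One adaptive-rate decision, following steps 3-4 above.

    Assumes t is a ratio threshold: the step is rejected when the new error
    exceeds t times the previous error. Returns (new_eta, accept_step); the
    caller discards the candidate weights when accept_step is False.
    """
    if new_error > t * old_error:
        return eta * d, False    # discard new weights, decrease the rate
    return eta * D, True         # keep new weights, increase the rate
```

In the training loop, the candidate weights from step 2 are kept or thrown away according to the returned flag, and the returned rate is used for the next epoch.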
Problem 3.
(a) (pencil and paper problem) Derive a gradient descent training rule for a single unit with output o, where

$$o = w_0 + w_1(x_1 + x_1^2) + w_2(x_2 + x_2^2) + \cdots + w_n(x_n + x_n^2)$$
(b) Implement this gradient descent algorithm.
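For part (a), writing $z_i = x_i + x_i^2$ makes the unit linear in its weights, so with $E = \frac{1}{2}\sum_d (t_d - o_d)^2$ the standard derivation gives $\partial E/\partial w_i = -\sum_d (t_d - o_d)\, z_{i,d}$ and hence the update $\Delta w_i = \eta \sum_d (t_d - o_d)(x_{i,d} + x_{i,d}^2)$, with $\Delta w_0 = \eta \sum_d (t_d - o_d)$. One possible NumPy implementation of this rule for part (b) is sketched below; the learning rate and epoch count are illustrative choices:

```python
import numpy as np

def transform(X):
    """Map each input x_i to z_i = x_i + x_i**2, with a leading 1 for w0."""
    Z = X + X ** 2
    return np.hstack([np.ones((len(Z), 1)), Z])

def train(X, t, eta=0.01, epochs=200):
    """Batch gradient descent with the derived rule
    Delta w_i = eta * sum_d (t_d - o_d) * z_{i,d},
    obtained by differentiating E = 1/2 * sum_d (t_d - o_d)**2."""
    Z = transform(X)
    w = np.zeros(Z.shape[1])
    for _ in range(epochs):
        o = Z @ w                  # unit output o = w0 + sum_i w_i * z_i
        w += eta * Z.T @ (t - o)   # gradient descent step on E
    return w
```

Because the unit is linear in the transformed inputs, training on noiseless targets generated from known weights should recover those weights, which gives a simple sanity check for the implementation.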
