## Question

Simulate a Function:

Describe the models you use (at least two), including their number of parameters, and the function you use.

In one chart, plot the training loss of all models.

In one graph, plot the predicted function curve of all models and the ground-truth function curve.

Comment on your results.

Use more than two models in all previous questions. (bonus)

Use more than one function. (bonus)
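As a minimal sketch of the setup this question asks for, one might sample a ground-truth curve to regress against. The `target` function below is only an illustrative choice (any smooth 1-D function works); the name and sampling range are assumptions, not part of the assignment.

```python
import numpy as np

# Hypothetical target: a sinc-like curve; any smooth 1-D function works here
def target(x):
    return np.sin(5 * np.pi * x) / (5 * np.pi * x)

# Sample the ground-truth curve on (0, 1]; x = 0 is excluded to avoid 0/0
x = np.linspace(0.01, 1.0, 1000).reshape(-1, 1)
y = target(x)
```

Each model is then trained on `(x, y)` pairs, and its predictions over the same grid are plotted against the ground-truth curve.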

Train on Actual Tasks:

Describe the models you use and the task you chose.

In one chart, plot the training loss of all models.

In one chart, plot the training accuracy.

Comment on your results.

Use more than two models in all previous questions. (bonus)

Train on more than one task. (bonus)

HW1-2 Report Questions

Visualize the optimization process.

Describe your experiment settings. (e.g. how often you record the model parameters, the optimizer, the dimension-reduction method, etc.)

Train the model 8 times; select the parameters of any one layer and of the whole model, and plot them in separate figures.

Comment on your result.
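One common way to plot recorded parameter snapshots, sketched here under the assumption that PCA is the chosen dimension-reduction method (the function name is illustrative):

```python
import numpy as np

def pca_2d(snapshots):
    """Project flattened parameter snapshots to 2-D with PCA via SVD.

    Each row of `snapshots` is one recorded copy of the (layer or
    whole-model) weights, flattened to a vector; the result is one 2-D
    point per snapshot, so each training run traces a trajectory in the plane.
    """
    X = np.asarray(snapshots, dtype=float)
    X = X - X.mean(axis=0)                       # center before PCA
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:2].T                          # coordinates on the top-2 components
```

The eight runs can then be overlaid in one figure, one color per run.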

Observe gradient norm during training.

Plot one figure containing both the gradient norm versus iterations and the loss versus iterations.

Comment on your result.
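The gradient norm to record at each iteration is typically the L2 norm over all parameters' gradients taken together; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def grad_norm(grads):
    # Overall gradient norm: L2 norm of the concatenation of every
    # parameter's gradient (weights and biases of all layers)
    return float(np.sqrt(sum(np.sum(g ** 2) for g in grads)))
```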

What happens when gradient is almost zero?

State how you find weights whose gradient norm is (almost) zero and how you define the minimal ratio.

Train the model 100 times. Plot a figure of the minimal ratio versus the loss.

Comment on your result.
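The assignment leaves the definition of the minimal ratio to you; one common choice (an assumption here, not the required answer) is the fraction of positive eigenvalues of the loss Hessian at a point where the gradient norm is near zero:

```python
import numpy as np

def minimal_ratio(hessian):
    # One common definition: the fraction of positive eigenvalues of the
    # Hessian at a near-zero-gradient point. A ratio near 1 suggests a
    # local minimum, near 0 a local maximum, and in between a saddle point.
    eigvals = np.linalg.eigvalsh(hessian)
    return float(np.mean(eigvals > 0))
```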

Bonus

○ Use any method to visualize the error surface.

○ Concretely describe your method and comment on your result.

HW1-3 Report Questions

Can a network fit random labels?

Describe the settings of your experiments. (e.g. which task, learning rate, optimizer)

Plot a figure showing both training and testing loss versus epochs.
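The random-label setup can be sketched as follows: each training label is replaced with a uniformly random class before training (the function name is illustrative).

```python
import numpy as np

def randomize_labels(y, num_classes, seed=0):
    # Replace every label with a uniformly random class, destroying any
    # relationship between inputs and targets; a network that still reaches
    # low training loss on these labels is memorizing, not generalizing
    rng = np.random.default_rng(seed)
    return rng.integers(0, num_classes, size=len(y))
```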

Number of parameters v.s. Generalization

Describe the settings of your experiments. (e.g. which task, the 10 or more structures you choose)

Plot figures of both training and testing loss and accuracy versus the number of parameters.

Comment on your result.

Flatness v.s. Generalization

○ Part 1:

■ Describe the settings of the experiments (e.g. which task, what training approaches)

■ Plot figures of both training and testing loss and accuracy versus the interpolation ratio.

■ Comment on your result.

○ Part 2:

■ Describe the settings of the experiments (e.g. which task, what training approaches)

■ Plot figures of training and testing loss, accuracy, and sensitivity versus your chosen variable.

■ Comment on your result.

○ Bonus: Use other metrics or methods to evaluate a model's ability to generalize; concretely describe the method and comment on your results.
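For Part 1, the interpolation ratio usually refers to a linear blend between the parameters of two trained models; a minimal sketch under that assumption (names are illustrative):

```python
import numpy as np

def interpolate(theta_a, theta_b, alpha):
    # Linear interpolation between two trained parameter sets, layer by layer:
    # alpha = 0 recovers model A, alpha = 1 recovers model B. Evaluating loss
    # and accuracy at many alpha values probes how flat the region between
    # the two solutions is.
    return [(1 - alpha) * a + alpha * b for a, b in zip(theta_a, theta_b)]
```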

## Solution Preview


Two different types of models are implemented:

1. Multilayer Perceptron - A multilayer perceptron (MLP) is a feedforward artificial neural network that generates a set of outputs from a set of inputs. It consists of multiple layers of linked nodes, arranged as a directed graph between the input and output layers, and is trained using backpropagation.

Inputs:

a. Units - a list defining the number of neurons in each hidden layer.

b. Learning Rate - a hyperparameter that determines how much the model's weights are adjusted in response to the estimated error each time they are updated.

c. Activation Function - Rectified Linear Unit (ReLU) is used. It is a piecewise linear function that outputs the input directly if it is positive and zero otherwise.
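The description above can be sketched as a minimal forward pass, assuming a `units` list for the hidden-layer widths and ReLU activations; the class and parameter names are illustrative, not the actual implementation.

```python
import numpy as np

def relu(x):
    # ReLU: passes positive inputs through unchanged, zeroes out the rest
    return np.maximum(0.0, x)

class MLP:
    """Minimal MLP sketch (forward pass only; names are illustrative)."""

    def __init__(self, in_dim, units, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        dims = [in_dim] + list(units) + [out_dim]
        # One weight matrix and one bias vector per layer
        self.weights = [rng.normal(0.0, 0.1, size=(a, b))
                        for a, b in zip(dims[:-1], dims[1:])]
        self.biases = [np.zeros(b) for b in dims[1:]]

    def forward(self, x):
        for W, b in zip(self.weights[:-1], self.biases[:-1]):
            x = relu(x @ W + b)                        # hidden layers use ReLU
        return x @ self.weights[-1] + self.biases[-1]  # linear output layer
```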