 # Simulating A Function And Optimization

## Transcribed Text

### Overview

HW1 has three parts:

1. **(1-1) Deep vs. Shallow:** simulate a function, and train on an actual task using shallow and deep models.
2. **(1-2) Optimization**
3. **(1-3) Generalization**

### HW1-1: Deep vs. Shallow

**Simulate a function.** The function must be single-input, single-output, and non-linear.

**Train on an actual task:** MNIST or CIFAR-10.

#### Simulate a Function

Requirements:

- Train at least two different DNN models with the same number of parameters until convergence.
- Compare the training processes by plotting each model's loss per epoch on one chart.
- Visualize the ground truth and the models' predictions on one graph.

Tips:

- Constrain the input domain.
- Hyper-parameters are important (i.e. tune all models to their best).

Example models (each trained for 20,000 epochs):

- model0: 571 parameters
- model1: 572 parameters
- model2: 571 parameters

#### Train on Actual Tasks

Requirements:

- Use MNIST or CIFAR-10.
- Use a CNN or a DNN.
- Visualize the training process by plotting both loss and accuracy on two charts.

Tip: with a CNN the difference is easier to see.

Example runs: MNIST with a CNN and with a DNN; CIFAR-10 with a CNN and with a DNN.

### HW1-2: Optimization

Three subtasks:

1. Visualize the optimization process.
2. Observe the gradient norm during training.
3. What happens when the gradient is almost zero?

Train on a designed function, MNIST, or CIFAR-10.

#### Visualize the Optimization Process

Requirements:

- Collect the weights of the model every n epochs.
- Also collect the weights from several independent training runs (events).
- Record the accuracy (or loss) corresponding to each collected set of parameters.
- Plot the above results on one figure.

(Figure: for each snapshot, i.e. 1st event at epochs 0, 3, 6, ... through the i-th event at epochs 0, 3, 6, ..., the weight matrices of sizes m1×n1, m2×n2, ..., mk×nk are flattened and concatenated into a single vector of dimension m1·n1 + m2·n2 + ... + mk·nk, which is then passed through dimension reduction.)
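The collect-flatten-reduce procedure above can be sketched as follows. This is a minimal sketch assuming scikit-learn's `PCA`; `flatten_weights` is our illustrative helper, and the snapshot data here is random placeholder data rather than weights from a trained model.

```python
import numpy as np
from sklearn.decomposition import PCA

def flatten_weights(weight_list):
    """Concatenate every weight matrix (m1 x n1, ..., mk x nk) into one
    vector of dimension m1*n1 + m2*n2 + ... + mk*nk."""
    return np.concatenate([np.ravel(w) for w in weight_list])

# Placeholder for the real snapshots: 8 training runs, 10 snapshots each
# (in the assignment these would be the weights saved every 3 epochs).
rng = np.random.default_rng(0)
snapshots = np.stack(
    [flatten_weights([rng.normal(size=(4, 8)), rng.normal(size=(8, 2))])
     for _ in range(8 * 10)]
)

# Reduce every flattened snapshot to 2 dimensions; each row can then be
# plotted as a point and annotated with its recorded accuracy or loss.
points = PCA(n_components=2).fit_transform(snapshots)
print(points.shape)  # (80, 2)
```

To reproduce the slides' "layer 1" vs. "whole model" comparison, run PCA once on the flattened layer-1 weights alone and once on the full concatenated vector.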
Example: a DNN trained on MNIST. Collect the weights every 3 epochs across 8 training runs, and reduce the weight vectors to 2 dimensions by PCA, once for layer 1 alone and once for the whole model.

#### Observe the Gradient Norm During Training

Requirements:

- Record the gradient norm and the loss during training.
- Plot them on one figure.

The gradient norm is a p-norm of the gradient vector g, i.e. ||g||_p = (Σᵢ |gᵢ|ᵖ)^(1/p), typically with p = 2. PyTorch can compute this directly from each parameter's gradient; similar code can be written with other packages.

Example figure: gradient norm and loss during training on MNIST.

#### What Happens When the Gradient Is Almost Zero?

Requirements:

- Try to find weights of the model where the gradient norm is zero (or as small as possible).
- Compute the "minimal ratio" of those weights: how likely the weights are to be a minimum.
- Plot the relation between the minimal ratio and the loss at the points where the gradient is almost zero.

Tip: train on a small network.

How to reach a point where the gradient norm is zero?

- First train the network with the original loss function, then change the objective function to the gradient norm itself and keep training.
- Or use a second-order optimization method, such as Newton's method or the Levenberg-Marquardt algorithm (more stable).

How to compute the minimal ratio?

- Compute the Hessian matrix of the loss with respect to the weights and find its eigenvalues. The minimal ratio is the proportion of eigenvalues that are greater than zero.
- Alternatively, sample many weights θ' around the found weights θ and compute the loss L(θ') at each; the minimal ratio is the proportion of samples with L(θ') > L(θ).

Example setup:

- Train 100 times.
- Reach zero gradient norm by changing the objective function.
- Minimal ratio is defined as the proportion of eigenvalues greater than zero.

### HW1-3: Generalization

Three subtasks:

1. Can a network fit random labels?
2. Number of parameters vs. generalization.
3. Flatness vs. generalization.

Train on MNIST or CIFAR-10.

#### Can a Network Fit Random Labels?

Requirements:

- Train on MNIST or CIFAR-10.
- Randomly shuffle the labels before training.
- Try to fit the network to these random labels.
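The slides' PyTorch snippet for the gradient norm did not survive transcription. The following is a minimal sketch of both the gradient-norm recording and the Hessian-eigenvalue minimal ratio, assuming PyTorch; the helper names `grad_norm` and `loss_of` and the tiny linear model are ours, chosen so the full Hessian is cheap to form (per the slides' small-network tip).

```python
import torch

# A deliberately tiny model so the full Hessian of the loss
# w.r.t. the parameters is cheap to form.
torch.manual_seed(0)
model = torch.nn.Linear(4, 1)          # 4 weights + 1 bias = 5 parameters
x, y = torch.randn(16, 4), torch.randn(16, 1)
loss_fn = torch.nn.MSELoss()

def grad_norm(model, p=2):
    """p-norm of all parameter gradients concatenated into one vector."""
    grads = [prm.grad.detach().reshape(-1) for prm in model.parameters()]
    return torch.cat(grads).norm(p=p).item()

loss = loss_fn(model(x), y)
loss.backward()
gn = grad_norm(model)          # record this (together with the loss) each step

# Minimal ratio via the Hessian: flatten the parameters, express the loss
# as a function of the flat vector, and count the positive eigenvalues.
theta = torch.cat([prm.detach().reshape(-1) for prm in model.parameters()])

def loss_of(flat):
    w, b = flat[:4].reshape(1, 4), flat[4:]
    return loss_fn(torch.nn.functional.linear(x, w, b), y)

H = torch.autograd.functional.hessian(loss_of, theta)
eigvals = torch.linalg.eigvalsh(H)
minimal_ratio = (eigvals > 0).float().mean().item()
```

For the assignment itself, the gradient-norm objective trick would replace `loss` with `grad_norm` computed via `torch.autograd.grad(..., create_graph=True)` so it can be minimized further.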
Example: MNIST with a network of 3 hidden layers of 256 nodes each.

#### Number of Parameters vs. Generalization

Requirements:

- Train on MNIST or CIFAR-10.
- Train at least 10 similarly structured models with different numbers of parameters.
- Record both training and testing loss and accuracy.

Example: CIFAR-10.

#### Flatness vs. Generalization, Part 1

Visualize the line between two trained models.

Requirements:

- Train two models, m1 and m2, with different training approaches (e.g. batch size 64 vs. batch size 1024).
- Record the loss and accuracy of the model obtained by linear interpolation between m1 and m2: θ_α = (1 − α)·θ_m1 + α·θ_m2, where α is the interpolation ratio and θ denotes the model parameters.

Example: MNIST (the cross-entropy axis is log scale), comparing batch size 64 vs. 1024 and learning rate 1e-3 vs. 1e-2.

#### Flatness vs. Generalization, Part 2

Requirements:

- Train at least 5 models with different training approaches.
- Record the loss and accuracy of all models.
- Record the sensitivity of all models.

What is sensitivity? (Reference: https://arxiv.org/pdf/1802.08760.pdf)

- Original definition: the Frobenius norm of the Jacobian matrix of the model output (class probabilities) with respect to the input. This is computationally expensive for us.
- Our definition: the Frobenius norm of the gradient of the loss with respect to the input.

Example figures: MNIST and CIFAR-10.
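The interpolation formula and the sensitivity definition above can be sketched as follows, assuming PyTorch. The names `interpolate` and `sensitivity` are our illustrative choices, and the untrained `Linear` models stand in for the two trained models m1 and m2.

```python
import torch

torch.manual_seed(0)
loss_fn = torch.nn.CrossEntropyLoss()
x, y = torch.randn(8, 10), torch.randint(0, 3, (8,))

def make_model():
    return torch.nn.Linear(10, 3)      # stand-in for the trained models

m1, m2 = make_model(), make_model()

def interpolate(m1, m2, alpha):
    """Model with parameters theta = (1 - alpha)*theta_m1 + alpha*theta_m2."""
    m = make_model()
    with torch.no_grad():
        for p, p1, p2 in zip(m.parameters(), m1.parameters(), m2.parameters()):
            p.copy_((1 - alpha) * p1 + alpha * p2)
    return m

# Sweep the interpolation ratio and record the loss at each point;
# plotting losses against alphas gives the "line between two models".
alphas = torch.linspace(0, 1, 5)
losses = [loss_fn(interpolate(m1, m2, a)(x), y).item() for a in alphas]

def sensitivity(model, x, y):
    """Frobenius norm of the gradient of the loss w.r.t. the input."""
    x = x.clone().requires_grad_(True)
    loss_fn(model(x), y).backward()
    return x.grad.norm(p="fro").item()
```

At α = 0 the interpolated model reproduces m1 exactly, and at α = 1 it reproduces m2, so the endpoints of the plotted line are the two models' own losses.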

## Solution Preview


{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# HW1-1 Simulate a Function"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import tensorflow as tf\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from sklearn import datasets\n",
"from sklearn.preprocessing import MinMaxScaler\n",
"from sklearn.decomposition import PCA\n",
"\n",
"class DeepNeuralNetwork:\n",
"    \"\"\"\n",
"    A class that models multilayer Perceptron - Deep neural network.\n",
"    Input parameters:\n",
"    units: a list that defines the number of neurons for each hidden layer\n",
"    \"\"\"\n",
"    def __init__(self, units: list, learning_rate: float):\n",
"\n",
"       self.units = units\n",
"       self.learning_rate = learning_rate\n",
"       \n",
"       # initiate a graph \n",
"       self.g = tf.Graph()\n",
"       with self.g.as_default():\n",
"            # make a placeholders for input values and targets\n",
"            self.tf_x = tf.compat.v1.placeholder(\n",
"                tf.float32, shape=(None, 1), name=\"tf_x\"\n",
"            )\n",
"            self.tf_y = tf.compat.v1.placeholder(tf.float32, (None), name=\"tf_y\")\n",
"            self.build_model() # build a model\n",
"\n",
"    def make_fc_layer(\n",
"       self, input_tensor, n_output_units: int, activation_fn=None, name=\"\"\n",
"    ):\n",
"       \"A function that creates fully connected layer\"\n",
"       \n",
"       input_shape = input_tensor.get_shape().as_list()[1:] # get input shape of the input tensor\n",
"       n_input_units = np.prod(input_shape)\n",
"\n",
"       weights_shape = [n_input_units, n_output_units]\n",
"       \n",
"       # initialize weights for a layer, use uniform initialization\n",
"       weights = tf.Variable(\n",
"            tf.random.uniform(weights_shape, minval=-0.5, maxval=0.5),\n",
"            name=f\"{name}_fc_weights\",\n",
"            shape=weights_shape,\n",
"            trainable=True,\n",
"       )\n",
"       # initialze bias\n",
"       bias = tf.Variable(\n",
"            tf.zeros(shape=[n_output_units], name=f\"{name}_fc_bias\"), trainable=True\n",
"       )\n",
"       \n",
"       # make output tensor by multiplying input tensor and added weights and add bias\n",
"       layer = tf.matmul(input_tensor, weights)\n",
"       layer = tf.nn.bias_add(layer, bias, name=\"net_pre_activation\")\n",
"       \n",
"       if activation_fn is not None:\n",
"            # function activation ( add non-linearity to the model )\n",
"            layer = activation_fn(layer, name=\"activation\")\n",
"\n",
"       return layer\n",
"\n",
"    def build_model(self):\n",
"       \"\"\"\n",
"       A function that builds fully connected deep neural network\n",
"       \"\"\"\n",
"\n",
"       self.layers = [self.tf_x] # list of all layers\n",
"       input_tensor = self.tf_x\n",
"       \n",
"       for i, unit in enumerate(self.units):\n",
"            if i + 1 == len(self.units):\n",
"                fc_layer = self.make_fc_layer(input_tensor, 1, name=\"output_layer\")\n",
"                self.layers.append(fc_layer)\n",
"                continue\n",
"\n",
"            fc_layer = self.make_fc_layer(\n",
"                input_tensor, unit...