Question

Transcribed Text

Workshop Task 1: Deep Neural Network

1. Recall Lecture 6. Run the following 3 networks, observe their behaviours, and compare them in terms of (i) architecture, (ii) accuracy (on both the training and testing datasets), (iii) training time, and (iv) testing time.
• net = network2.Network([784, 30, 10])
• net = network2.Network([784, 30, 30, 10])
• net = network2.Network([784, 30, 30, 30, 10])
• net.SGD(training_data, 30, 10, 0.1, lmbda=5.0, evaluation_data=validation_data, monitor_evaluation_accuracy=True)

Why are deep neural networks hard to train?
• The top six neurons in each of the two hidden layers of the [784, 30, 30, 10] network are shown; a little bar represents how quickly each individual neuron changes as the network learns.
• Different layers in our deep network learn at vastly different speeds.
• The later layers may be learning well, but what about the early layers?
• With the network [784, 30, 30, 30, 10], the respective speeds of learning turn out to be 0.012, 0.060, and 0.283. How?
• With another layer added, the respective speeds of learning are 0.003, 0.017, 0.070, and 0.285. How?
• Vanishing gradient problem
• Exploding gradient problem

Why does the vanishing gradient problem occur?
• Consider the simplest DNN, with just a single neuron in each layer (a chain of activations a1 → a2 → a3 → a4, with weights w2, w3, w4, feeding the cost C).
• Let's write the gradient ∂C/∂a1 associated with the first hidden neuron:

∂C/∂a1 = σ′(z1) × w2 × σ′(z2) × w3 × σ′(z3) × w4 × σ′(z4) × ∂C/∂a4    ... eq (1)

• Note that this expression is a product of terms of the form wj σ′(zj).
• To understand why the vanishing gradient problem occurs, look at a plot of the function σ′: it attains its maximum value of 1/4 at z = 0.
• If the weights in the network are initialized using a Gaussian with mean 0 and standard deviation 1, they usually satisfy |wj| < 1, and therefore |wj σ′(zj)| < 1/4. How?
• Now let's compare ∂C/∂a1 with ∂C/∂a3: ∂C/∂a1 contains two extra factors of the form wj σ′(zj), each typically smaller than 1/4 in magnitude, so the gradient in the first hidden layer is usually at least 16 times smaller than in the third.

Why does the vanishing gradient problem occur?
• Other issues include:
  • Choice of activation function
  • Weights initialization
  • Gradient descent implementation
  • And, of course, the choice of network architecture and the other hyper-parameters we discussed in previous lectures.

Deep Convolutional Neural Networks
• Convolutional neural networks use three basic concepts:
  1. Local receptive fields (the convolution/filtering operation)
  2. Shared weights
  3. Pooling

Deep Convolutional Neural Networks
• Filter sliding (or convolving) around the image.
• Convolution layer: what is a convolution operation?
• Each neuron in the hidden layer detects a particular feature, e.g. a vertical edge (the figure shows a 28×28 input image).
• Convolution can be seen as a1 = σ(b + w∗a0), where a1 denotes the set of output activations from one feature map, a0 is the set of input activations, and ∗ is called a convolution operation. Remember this? → a1 = σ(b + w∗a0)

a. Receptive field:
• Make connections in small, localized regions of the input image.
• Each neuron connects to a small region of the input neurons, e.g. a 5×5 region.
• This region is called the local receptive field (LRF) for the hidden neuron.
• Each connection learns a weight.
• Slide the LRF over by one pixel to the right, and so on across the image.
• With a 28×28 input and 5×5 LRFs, we have 24×24 neurons in the hidden layer.
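The convolution a1 = σ(b + w∗a0) and the 28×28 → 24×24 size reduction can be checked with a minimal NumPy sketch. This is not part of the workshop code; the function name feature_map and the random inputs are purely illustrative.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feature_map(a0, w, b):
    # Slide one shared 5x5 weight kernel w (with shared bias b) over the 2-D
    # input a0. Every hidden neuron reuses the same weights (weight sharing)
    # and looks at one local receptive field of the input.
    H, W = a0.shape                              # e.g. a 28x28 input image
    k = w.shape[0]                               # e.g. a 5x5 local receptive field
    out = np.zeros((H - k + 1, W - k + 1))       # 24x24 hidden neurons for a 28x28 input
    for j in range(H - k + 1):
        for i in range(W - k + 1):
            out[j, i] = np.sum(w * a0[j:j + k, i:i + k]) + b
    return sigmoid(out)                          # a1 = sigma(b + w * a0)

a0 = np.random.rand(28, 28)                      # stand-in for one MNIST image
w = np.random.randn(5, 5)                        # one shared 5x5 filter / kernel
a1 = feature_map(a0, w, b=0.0)
print(a1.shape)                                  # -> (24, 24)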
• Convolution layer: think of these filters as feature detectors applied to the original input image, or as a membrane that allows only the desired qualities of the input to pass through.

Deep Convolutional Neural Networks
• Neurons in the hidden layer detect a particular feature, e.g. a vertical edge.
• 3 feature maps: each feature map is defined by a set of 5×5 shared weights and a single shared bias, so the network can detect 3 different kinds of features.
• A CNN with 20 feature maps, each with 5×5 weights in its local receptive field.
• Note the learnt spatial structures: learned features from a Convolutional Deep Belief Network (Lee, Honglak, et al., "Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations", Proceedings of the 26th Annual International Conference on Machine Learning, 2009).

b. Shared weights and biases:
• We use the same weights and bias for each of the 24×24 hidden neurons, i.e. a 5×5 array of shared weights. For example, the (j, k)-th hidden neuron's output is given as σ(b + Σl Σm wl,m aj+l,k+m), with the sums running over l, m = 0…4.
• All the neurons in the first hidden layer try to detect exactly the same feature, e.g. a vertical edge.
• This map from the input layer to the hidden layer is called a feature map.
• The shared weights and bias are often called a kernel or filter.
• Expand this expression for the (1, 1) hidden neuron.

A big advantage of sharing weights and biases:
• It greatly reduces the number of parameters involved in the network.
• With 20 feature maps, the convolutional layer is defined by a total of 20×26 = 520 parameters (each map has 5×5 = 25 shared weights plus one shared bias).
• In contrast, a fully connected first layer with 784 inputs and 30 hidden neurons is defined by 784×30 = 23,520 weights plus an extra 30 biases, i.e. 23,550 parameters.
• That is more than 40 times as many parameters as the convolutional layer.

c. Pooling layers:
• A pooling layer takes each feature map and outputs a summarised feature map.
• E.g. max-pooling summarises each 2×2 region of neurons in the previous layer, so a 24×24 feature map becomes a 12×12 feature map.

Deep Convolutional Neural Networks
• Putting it together: 28×28 input neurons encode the pixel intensities, followed by a convolutional layer using a 5×5 local receptive field and 3 feature maps, followed by max-pooling.
• Convolutional Neural Network trained on the MNIST database of handwritten digits.

Visualizing Convolutional Neural Networks
- 32×32 image
- Convolution Layer 1: six unique 5×5 (stride 1) filters. Note that six different filters produce a feature map of depth six.
- 2×2 max pooling
- Convolution Layer 2: sixteen 5×5 (stride 1) convolutional filters
- 2×2 max pooling
- Three FC layers (120, 100, 10)
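The architecture listed above can be reproduced in a few lines of Keras to inspect the output shapes and parameter counts. This is only a sketch, not the workshop's given code: the ReLU activations and the single-channel 32×32 input are assumptions.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
# Convolution Layer 1: six 5x5 (stride 1) filters -> 28x28x6 feature maps
model.add(Conv2D(6, kernel_size=(5, 5), activation='relu', input_shape=(32, 32, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))        # 2x2 max pooling -> 14x14x6
# Convolution Layer 2: sixteen 5x5 (stride 1) filters -> 10x10x16
model.add(Conv2D(16, kernel_size=(5, 5), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))        # 2x2 max pooling -> 5x5x16
model.add(Flatten())
# Three fully connected layers (120, 100, 10)
model.add(Dense(120, activation='relu'))
model.add(Dense(100, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.summary()                                  # prints output shapes and parameter counts

Calling model.summary() is a quick way to verify the per-layer parameter counts discussed above (shared weights keep the convolutional layers small; the fully connected layers dominate the total).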
Workshop Task 2: Deep Convolutional Neural Network
• You are given a DCNN code (in Keras), applied to the MNIST dataset.
2.1. Explain:
  • Pre-processing
  • The 2D CNN architecture, including the convolutional, pooling, dropout, and flatten layers
  • The comparison with Task 1 in terms of architecture, objective function, and learning/training procedure
2.2. Apply the DCNN code to classify the different objects in the CIFAR-10 dataset.
2.3. Bonus question: recall the BP equations from Lecture 6. How are the equations of backpropagation modified for a CNN?
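For Task 2.2, typically only the data loading and the input shape need to change relative to the MNIST script. A minimal sketch of those differences is shown below; it assumes the same Keras model-building code is reused and follows the variable names of the MNIST preview.

import keras
from keras.datasets import cifar10

num_classes = 10

# CIFAR-10 images are 32x32 RGB, so the input has 3 channels instead of MNIST's 1
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print(x_train.shape)                             # (50000, 32, 32, 3)

# same pre-processing idea as for MNIST: scale pixels, one-hot encode the labels
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

# reuse the MNIST model-building code, changing only the input shape
input_shape = (32, 32, 3)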

Solution Preview

These solutions may offer step-by-step problem-solving explanations or good writing examples that include modern styles of formatting and the construction of bibliographies, citations, and references. Students may use these solutions for personal skill-building and practice. Unethical use is strictly forbidden.

'''Trains a simple convnet on the MNIST dataset.

Gets to 99.25% test accuracy after 12 epochs
(there is still a lot of margin for parameter tuning).
16 seconds per epoch on a GRID K520 GPU.
'''

from __future__ import print_function
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K

batch_size = 128
num_classes = 10
epochs = 12

# input image dimensions
img_rows, img_cols = 28, 28

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
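# NOTE: the preview is truncated at this point. The code above matches the standard
# Keras mnist_cnn example, so a plausible continuation is sketched below as a guide
# to the pre-processing and layer structure asked about in Task 2.1; the actual
# solution files may differ.
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

# scale pixel intensities from [0, 255] to [0, 1]
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

# convert integer class labels to one-hot vectors
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

# convolution -> convolution -> max-pooling -> dropout -> flatten -> dense -> dropout -> softmax
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])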
