## Transcribed Text

Workshop Task 1:
Deep Neural Network
1. Recall Lecture 6. Run the following 3 networks, observe their
behaviours, and compare in terms of (i) architecture, (ii) accuracy (both on
the training and testing datasets), (iii) training time, and (iv) testing time.
• net = network2.Network([784, 30, 10])
• net = network2.Network([784, 30, 30, 10])
• net = network2.Network([784, 30, 30, 30, 10])
• net.SGD(training_data, 30, 10, 0.1, lmbda=5.0, evaluation_data=validation_data, monitor_evaluation_accuracy=True)
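Before running the three networks, it can help to see how little each extra 30-neuron hidden layer adds in parameters; a minimal sketch in plain Python (the layer sizes come from the task above; the helper name is ours, not part of network2):

```python
# Count weights and biases for a fully connected network,
# given its layer sizes (as passed to network2.Network).
def count_parameters(sizes):
    # Each pair of adjacent layers contributes n_in * n_out weights
    # plus n_out biases.
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(sizes[:-1], sizes[1:]))

for sizes in ([784, 30, 10], [784, 30, 30, 10], [784, 30, 30, 30, 10]):
    print(sizes, count_parameters(sizes))
```

Each extra 30-neuron hidden layer adds only 30×30 + 30 = 930 parameters, so the differences you observe in training time and accuracy come mainly from backpropagating through more layers, not from model size.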
Why are deep neural networks hard to train?
• The top six neurons in each of the two hidden layers of the [784, 30, 30, 10] network are shown.
• A small bar represents how quickly each individual neuron is changing as the network learns.
• Different layers in our deep network are learning at vastly different speeds.
• The later layers may be learning well, but what about the early layers?
• With the network [784, 30, 30, 30, 10], the respective speeds of learning turn out to be 0.012, 0.060, and 0.283. How?
• With another layer added, the respective speeds of learning are 0.003, 0.017, 0.070, and 0.285. How?
• Vanishing gradient problem
• Exploding gradient problem
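The shrinking per-layer speeds can be reproduced qualitatively with a toy chain of sigmoid neurons: each earlier layer's gradient picks up one more factor of the form w·σ′(z), and each such factor is below 1/4 in magnitude. A sketch with illustrative, made-up weights and pre-activations (the specific numbers are not from the lecture):

```python
import math

def sigmoid_prime(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

# One factor w * sigma'(z) per layer; with |w| < 1 and sigma'(z) <= 1/4,
# every factor has magnitude below 0.25.
weights = [0.8, 0.9, 0.7, 0.85]   # illustrative draws, |w| < 1
zs      = [0.5, -0.3, 0.1, 0.2]   # illustrative pre-activations

grad = 1.0        # stands in for dC/da at the output
speeds = []
for w, z in zip(reversed(weights), reversed(zs)):
    grad *= w * sigmoid_prime(z)
    speeds.append(abs(grad))

# speeds[0] belongs to the last hidden layer, speeds[-1] to the first:
# the gradient shrinks geometrically as we move toward the input.
print(list(reversed(speeds)))
```

This mirrors the measured speeds above (0.012, 0.060, 0.283): each step back multiplies by another small factor, so the earliest layers learn slowest.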
Why does the vanishing gradient problem occur?
• Consider the simplest DNN with just a single neuron in each layer.
• Let’s write the gradient ∂C/∂a1 associated with the first hidden neuron.

[Figure: a chain of neurons a1 → a2 → a3 → a4 feeding the cost C, connected by weights w2, w3, w4]

Applying the chain rule along this chain gives

∂C/∂a1 = w2 σ′(z2) × w3 σ′(z3) × w4 σ′(z4) × ∂C/∂a4 …… eq (1)

Note that this expression is a product of terms of the form wj σ′(zj).
• To understand why the vanishing gradient problem occurs, let's look at a plot of
the function σ′ (shown below)
• If the weights in the network are initialized using a Gaussian with mean 0 and standard deviation 1, then typically |wj| < 1; since σ′(z) ≤ 1/4, this gives |wj σ′(zj)| < 1/4. How?
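The 1/4 bound itself is easy to check numerically: σ′(z) = σ(z)(1 − σ(z)) peaks at z = 0, where it equals exactly 1/4, and decays on both sides. A quick check (the helper name is ours):

```python
import math

def sigmoid_prime(z):
    # sigma'(z) = sigma(z) * (1 - sigma(z))
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

# The peak is at z = 0: sigma(0) = 1/2, so sigma'(0) = 1/2 * 1/2 = 1/4.
print(sigmoid_prime(0.0))

# Everywhere on a grid of z values the derivative stays at or below 1/4,
# so |w_j| < 1 implies |w_j * sigma'(z_j)| < 1/4.
print(max(sigmoid_prime(z / 10) for z in range(-100, 101)))
```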
• Now let’s compare ∂C/∂a1 with ∂C/∂a3:

∂C/∂a1 = w2 σ′(z2) × w3 σ′(z3) × w4 σ′(z4) × ∂C/∂a4
∂C/∂a3 = w4 σ′(z4) × ∂C/∂a4

∂C/∂a1 contains two extra factors of the form wj σ′(zj), each with magnitude below 1/4, so it is typically at least 16 times smaller than ∂C/∂a3: the earlier layer learns far more slowly.
Why does the vanishing gradient problem occur?
• Other issues include:
• Choice of activation function
• Weight initialization
• Gradient descent implementation
• And, of course, choice of network architecture and other hyper-parameters
we discussed in previous lectures.
Deep Convolutional Neural Networks
• Convolutional neural networks use three basic concepts:
1. Local receptive fields (convolution/filtering operation)
2. Shared weights
3. Pooling
Deep Convolutional Neural Networks
Filter sliding (or convolving) around the image
• Convolution layer:
What is a convolution operation?
Deep Convolutional Neural Networks
[Figure: 28×28 input image. A neuron in the hidden layer is detecting a particular feature, e.g. a vertical edge.]
Convolution can be seen as a1=σ(b+w∗a0), where a1 denotes
the set of output activations from one feature map, a0 is the
set of input activations, and ∗ is called a convolution operation.
Remember this?
→ a1=σ(b+w∗a0)
Receptive field:
• Make connections in small, localized regions of the
input image.
• Each neuron connects to a small region of the input
neurons, e.g. a 5×5 region
• This region is called the local receptive field (LRF) for
the hidden neuron
• Each connection learns a weight
• Slide the LRF over by one pixel to the right
• With a 28×28 input and 5×5 LRFs, we have 24×24
neurons in the hidden layer.
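The 28×28 → 24×24 arithmetic can be verified with a bare-bones stride-1 "valid" convolution; a NumPy sketch (the helper is ours, not the workshop's code):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` one pixel at a time (stride 1, no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for j in range(out_h):
        for k in range(out_w):
            # Each output neuron sees one kh x kw local receptive field.
            out[j, k] = np.sum(image[j:j + kh, k:k + kw] * kernel)
    return out

image  = np.random.rand(28, 28)   # an MNIST-sized input
kernel = np.random.rand(5, 5)     # one 5x5 filter (the shared weights)
print(conv2d_valid(image, kernel).shape)   # (24, 24)
```

A 28×28 input with a 5×5 receptive field leaves 28 − 5 + 1 = 24 positions in each direction, hence the 24×24 hidden layer.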
• Convolution layer:
Think of these filters as feature detectors on the original input image, or as a membrane that allows only the desired qualities of the input to pass through.
Deep Convolutional Neural Networks
Neurons in the hidden
layer are detecting a
particular feature e.g. a
vertical edge
3 feature maps: Each feature map is
defined by a set of 5×5 shared weights,
and a single shared bias. The network can
detect 3 different kinds of features.
Deep Convolutional Neural Networks
CNN with 20 feature maps.
5×5 weights in the local receptive field.
Deep Convolutional Neural Networks
Note the learnt spatial structures.
Learned features from a Convolutional Deep Belief Network.
Lee, Honglak, et al. "Convolutional deep belief networks for scalable
unsupervised learning of hierarchical representations." Proceedings of the
26th annual international conference on machine learning. 2009.
Deep Convolutional Neural Networks
b. Shared weights and biases:
• We use the same weights and bias for each of the 24×24 hidden neurons, i.e. a 5×5 array of shared weights. For example, the (j, k)-th hidden neuron’s output is given as:

σ( b + Σl Σm w(l,m) a(j+l, k+m) ), with l and m running from 0 to 4

• All the neurons in the first hidden layer try to detect exactly the same feature, e.g. a vertical edge.
• This map from the input layer to the hidden layer is called a feature map.
• The shared weights and bias are often called a kernel or filter.
• Expand this expression for neuron (1, 1).
Deep Convolutional Neural Networks
A big advantage of sharing weights and biases
• Greatly reduces the number of parameters involved in a network.
• With 20 feature maps, the convolutional layer is easily defined using a total of
20×26=520 parameters
• In contrast, a fully connected first layer, with 784 inputs and 30 hidden neurons is
defined using a total of 784×30 weights, plus an extra 30 biases i.e. 23,550
parameters.
• More than 40 times as many parameters as the convolutional layer.
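The two counts can be reproduced in a few lines (a sketch of the arithmetic above):

```python
# Convolutional layer: 20 feature maps, each defined by
# 5x5 shared weights plus 1 shared bias.
conv_params = 20 * (5 * 5 + 1)

# Fully connected layer: 784 inputs, 30 hidden neurons,
# one bias per hidden neuron.
fc_params = 784 * 30 + 30

print(conv_params, fc_params, fc_params / conv_params)
# 520 vs 23550 parameters -- roughly a 45x reduction.
```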
Deep Convolutional Neural Networks
c. Pooling layers:
• A pooling layer takes each feature map and outputs a summarised
feature map.
• E.g. summarize each 2×2 region of neurons in the previous layer: 2×2 max-pooling maps the 24×24 feature map to 12×12 neurons.
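A 2×2 max-pooling of such a feature map can be sketched with a reshape trick in NumPy (an illustration, not the workshop's implementation):

```python
import numpy as np

def max_pool_2x2(fmap):
    """Replace each non-overlapping 2x2 block with its maximum."""
    h, w = fmap.shape
    # Split the map into (h/2, 2, w/2, 2) blocks, then take the max
    # inside each 2x2 block.
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.random.rand(24, 24)     # one feature map from the 5x5 convolution
print(max_pool_2x2(fmap).shape)   # (12, 12)
```

Max-pooling keeps the strongest response in each region, so the layer reports roughly *where* a feature occurred while discarding its exact position.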
Deep Convolutional Neural Networks
[Figure: 28×28 input neurons encoding the pixel intensities → convolutional layer using a 5×5 local receptive field and 3 feature maps → max-pooling]
Deep Convolutional Neural Networks
Convolutional Neural Network trained on the MNIST Database of handwritten digits
Visualizing Convolutional Neural Networks
- 32×32 input image
- Convolution layer 1: six unique 5×5 (stride 1) filters. Note that six different filters produce a feature map of depth six.
- 2×2 max pooling
- Convolution layer 2: sixteen 5×5 (stride 1) convolutional filters
- 2×2 max pooling
- Three FC layers (120, 100, 10)
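The spatial sizes in this architecture follow from two rules: a stride-1 "valid" f×f convolution maps n×n to (n − f + 1)×(n − f + 1), and non-overlapping 2×2 pooling halves each dimension. A sketch tracing the sizes (the helper names are ours):

```python
def after_conv(n, f, stride=1):
    # 'valid' convolution: output size = (n - f) // stride + 1
    return (n - f) // stride + 1

def after_pool(n, p=2):
    # non-overlapping p x p pooling divides each dimension by p
    return n // p

n = 32                 # input image
n = after_conv(n, 5)   # conv1, 5x5 -> 28
n = after_pool(n)      # 2x2 pool  -> 14
n = after_conv(n, 5)   # conv2, 5x5 -> 10
n = after_pool(n)      # 2x2 pool  -> 5
print(n)               # 5: the flatten feeds 16 * 5 * 5 = 400 units to the FC layers
```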
Workshop Task 2:
Deep Convolutional Neural Network
• You are given a DCNN code (in Keras), applied to the MNIST dataset.
2.1. Explain:
• Pre-processing
• The 2D CNN architecture, including convolutional, pooling, dropout, and flatten layers
• The comparison with Task 1 in terms of architecture, objective function, and learning/training procedure
2.2. Apply the DCNN code to classify different objects from the CIFAR-10 dataset.
2.3. Bonus question: Recall the BP equations from Lecture 6. How are the equations of backpropagation modified for CNNs?


```python
'''Trains a simple convnet on the MNIST dataset.

Gets to 99.25% test accuracy after 12 epochs
(there is still a lot of margin for parameter tuning).
16 seconds per epoch on a GRID K520 GPU.
'''
from __future__ import print_function

import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K

batch_size = 128
num_classes = 10
epochs = 12

# input image dimensions
img_rows, img_cols = 28, 28

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
```