Question

Transcribed Text

Deep Learning Homework 2 (HW2)
● Video caption generation
○ Sequence-to-sequence model
○ Training tips

Video caption generation
● Introduction
● Sequence-to-sequence model
● Training tips
○ Attention
○ Schedule sampling
○ Beam search
● How to reach the baseline?

Introduction
● Video caption generation
a. Input: a short video
b. Output: the corresponding caption that depicts the video
● There are several difficulties, including:
a. The different attributes of a video (objects, actions)
b. The variable length of the input and output
[figure: an example clip with the output caption "a man is playing a song on the piano"]

Sequence-to-sequence 1/5
● Two recurrent neural networks (RNNs): an encoder that processes the input, and a decoder that generates the output
[figure: encoder-decoder diagram; decoding starts from <BOS> and stops at <EOS>]

Sequence-to-sequence 2/5
● Data preprocessing (a dictionary-building sketch follows this transcription):
○ Dictionary: keep the most frequent words, or words above a minimum count
○ Other tokens: <PAD>, <BOS>, <EOS>, <UNK>
- <PAD>: pads the sentences to the same length
- <BOS>: beginning of sentence; the signal to start generating the output sentence
- <EOS>: end of sentence; the signal that the output sentence has ended
- <UNK>: used when a word isn't in the dictionary (or the unknown word is simply ignored)

Sequence-to-sequence 3/5
● Text input: one-hot vector encoding (1-of-N coding, where N is the vocabulary size of the dictionary)
○ e.g.
- neural = [0, 0, 0, …, 1, 0, 0, …, 0, 0, 0]
- network = [0, 0, 0, …, 0, 0, 1, …, 0, 0, 0]
● LSTM unit: the cell output is then projected to a vocabulary-size vector

Sequence-to-sequence - S2VT 4/5
● Sequence-to-sequence based model: S2VT

Sequence-to-sequence - S2VT 5/5
● Sequence-to-sequence based model: S2VT, a two-layer LSTM structure (a generic encoder-decoder sketch follows this transcription)

Training tips - Attention 1/3
● Attention over the encoder hidden states:
○ Allows the model to peek at different sections of the input at each decoding time step

Training tips - Schedule sampling 2/3
● Schedule sampling:
○ Addresses the "exposure bias" problem: during training, at each step the decoder is randomly fed either the ground truth or its own output from the previous time step (see the decoder sketch after this transcription)

Training tips - Beam search 3/3
● Beam search:
○ Keep a fixed number of candidate paths at each decoding step (a sketch follows this transcription)

How to reach the baseline? 1/2
● Evaluation: BLEU@1
○ Precision = correct words / candidate length
○ BP = 1 if c > r, else e^(1 − r/c), where c = candidate length and r = reference length
○ BLEU@1 = BP × Precision
○ e.g.
Ground truth: "a man is mowing a lawn"
Prediction: "a man is riding a man on a woman is riding a motorcycle"
BLEU@1 = 1 × 4/13 = 0.308 (a re-computation in Python follows this transcription)

How to reach the baseline? 2/2
● Baseline: BLEU@1 = 0.6 (averaged over captions)
● Baseline model:
- Training epochs = 200
- Adam optimizer
- LSTM dimension = 256
- Training time = 72 min on a GTX 960
- Learning rate = 0.001
- Vocabulary: words with count > 3

Data & format
● Dataset: MSVD
- 1450 videos for training
- 100 videos for testing
● Format:
○ Download MLDS_hw2_1_data.tar.gz
○ Google will warn that "Google Drive can't scan this file for viruses"
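
As a sketch of the dictionary step on the preprocessing slide (keep frequent words, reserve the four special tokens), here is one way it could look in Python. The min_count of 3 mirrors the baseline slide; the function name and layout are assumptions, not the graded code.

from collections import Counter

def build_vocab(captions, min_count=3):
    # count every lower-cased word across all training captions
    counts = Counter(w for cap in captions for w in cap.lower().split())
    # reserve the special tokens, then keep words seen more than min_count times
    vocab = ['<PAD>', '<BOS>', '<EOS>', '<UNK>']
    vocab += sorted(w for w, n in counts.items() if n > min_count)
    return {w: i for i, w in enumerate(vocab)}

word2id = build_vocab(["a man is playing a song on the piano"])
# any word missing from word2id is mapped to word2id['<UNK>'] (or skipped)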
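
To make the encoder-decoder and schedule-sampling slides concrete, below is a minimal PyTorch sketch, assuming a 4096-d video feature and the 256-d LSTM from the baseline slide. It is a generic single-layer encoder-decoder, not the S2VT two-layer layout, and every name in it is illustrative rather than taken from the graded solution.

import random
import torch
import torch.nn as nn

class CaptionModel(nn.Module):
    def __init__(self, vocab_size, feat_dim=4096, hidden=256):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.embed = nn.Embedding(vocab_size, hidden)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.project = nn.Linear(hidden, vocab_size)  # cell output -> vocabulary-size vector

    def forward(self, feats, captions, teacher_forcing_p=1.0):
        # feats: (B, T, feat_dim) video features; captions: (B, L) word ids, captions[:, 0] = <BOS>
        _, state = self.encoder(feats)        # summarize the video into the LSTM state
        token = captions[:, 0]
        step_logits = []
        for t in range(1, captions.size(1)):
            emb = self.embed(token).unsqueeze(1)          # (B, 1, hidden)
            out, state = self.decoder(emb, state)
            logits = self.project(out.squeeze(1))         # (B, vocab_size)
            step_logits.append(logits)
            # schedule sampling: feed the ground truth with probability p,
            # otherwise feed the model's own previous prediction
            if random.random() < teacher_forcing_p:
                token = captions[:, t]
            else:
                token = logits.argmax(dim=-1)
        return torch.stack(step_logits, dim=1)            # (B, L-1, vocab_size)

Schedule sampling then amounts to decaying teacher_forcing_p from 1.0 toward 0 as training progresses, so the decoder gradually learns to consume its own (possibly wrong) outputs, which is what it sees at test time.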
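
Beam search, the third tip, keeps a fixed number of partial caption paths at each step instead of the single greedy best. A plain-Python sketch follows; step_fn, which should return (token id, log-probability) pairs for the next position, is a hypothetical hook standing in for one decoder step.

def beam_search(step_fn, bos_id, eos_id, beam_width=5, max_len=20):
    # each entry: (accumulated log-probability, token sequence so far)
    beams = [(0.0, [bos_id])]
    for _ in range(max_len):
        candidates = []
        for logp, seq in beams:
            if seq[-1] == eos_id:
                candidates.append((logp, seq))   # finished beams carry over
                continue
            for tok, tok_logp in step_fn(seq):
                candidates.append((logp + tok_logp, seq + [tok]))
        # prune: keep only the beam_width highest-scoring paths
        beams = sorted(candidates, key=lambda b: b[0], reverse=True)[:beam_width]
        if all(seq[-1] == eos_id for _, seq in beams):
            break
    return beams[0][1]   # best-scoring caption, including <BOS>/<EOS>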
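
The slide's BLEU@1 example can be re-computed in a few self-contained lines (clipped unigram counts plus the brevity penalty above); this is only a check of the arithmetic, not the course's grading script.

import math
from collections import Counter

def bleu1(candidate, reference):
    cand, ref = candidate.split(), reference.split()
    cand_counts, ref_counts = Counter(cand), Counter(ref)
    # clipped unigram matches: each candidate word counts at most as often
    # as it appears in the reference
    correct = sum(min(n, ref_counts[w]) for w, n in cand_counts.items())
    precision = correct / len(cand)
    # brevity penalty from the slide: 1 if c > r, else e^(1 - r/c)
    c, r = len(cand), len(ref)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * precision

print(bleu1("a man is riding a man on a woman is riding a motorcycle",
            "a man is mowing a lawn"))  # 1 * 4/13 = 0.308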

Solution Preview


import math
import operator
import sys
import json
from functools import reduce
# operator, sys, json and reduce are presumably used later in the full script,
# which this preview truncates


def count_ngram(candidate, references, n):
    clipped_count = 0
    count = 0
    r = 0
    c = 0
    for si in range(len(candidate)):
        # Calculate precision for each sentence
        ref_counts = []
        ref_lengths = []
        # Build a dictionary of n-gram counts for each reference sentence
        for reference in references:
            ref_sentence = reference[si]
            ngram_d = {}
            words = ref_sentence.strip().split()
            ref_lengths.append(len(words))
            # number of n-grams in the sentence (0 if it is shorter than n)
            limits = max(len(words) - n + 1, 0)
            for i in range(limits):
                ngram = ' '.join(words[i:i + n]).lower()
                if ngram in ngram_d:
                    ngram_d[ngram] += 1
                else:
                    ngram_d[ngram] = 1
            ref_counts.append(ngram_d)
        # Count n-grams in the candidate sentence
        cand_sentence = candidate[si]
        cand_dict = {}
        words = cand_sentence.strip().split()
        limits = max(len(words) - n + 1, 0)
        for i in range(limits):
            ngram = ' '.join(words[i:i + n]).lower()
            if ngram in cand_dict:
                cand_dict[ngram] += 1
            else:
                cand_dict[ngram] = 1
        # Clip candidate counts by the reference counts and accumulate totals
        clipped_count += clip_count(cand_dict, ref_counts)
        count += limits
        r += best_length_match(ref_lengths, len(words))
        c += len(words)
    if clipped_count == 0:
        pr = 0
    else:
        pr = float(clipped_count) / count
    bp = brevity_penalty(c, r)
    return pr, bp
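
The preview truncates after return pr, bp, before the three helpers the function calls. The definitions below are standard implementations consistent with the slide's formulas (in particular BP = 1 if c > r, else e^(1 − r/c)) and reuse the import math already at the top of the script; they are reconstructions, so the purchased solution's versions may differ in detail.

def clip_count(cand_d, ref_ds):
    # clip each candidate n-gram count by its maximum count in any reference
    count = 0
    for ngram, cand_count in cand_d.items():
        max_ref = max((ref.get(ngram, 0) for ref in ref_ds), default=0)
        count += min(cand_count, max_ref)
    return count

def best_length_match(ref_lengths, cand_length):
    # reference length closest to the candidate length (for the brevity penalty)
    return min(ref_lengths, key=lambda ref_len: abs(cand_length - ref_len))

def brevity_penalty(c, r):
    # BP = 1 if the candidate is longer than the reference, else e^(1 - r/c)
    return 1.0 if c > r else math.exp(1 - float(r) / c)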
