July 25, 2021

AIM6 - Generating Sequences

Motivation for sequential models

In this essay, we take a closer look at the unit design of recurrent models and at how they can be used for data generation.

Example 1: Music Generation

Here is an example of a possible model with two layers of LSTMs. We prepared our dataset as sequences of 40 frames predicting the next frame (“many to one”); the model learns the transformation x -> y. A sketch of such a model follows below.

Check the generated music sample: https://soundcloud.com/previtus/ml-jazz-meanderings-ml-generated-sounds-1/s-DCZbx
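
A minimal Keras sketch of this two-layer LSTM, many-to-one setup might look as follows (n_features, the layer sizes, and the loss are illustrative assumptions, not the exact model used here):

from tensorflow.keras import layers, models

n_features = 128  # assumed size of one frame; depends on the dataset

# "many to one": 40 frames in, 1 frame out
model = models.Sequential([
    layers.Input(shape=(40, n_features)),
    layers.LSTM(128, return_sequences=True),  # first LSTM layer, passes the full sequence on
    layers.LSTM(128),                         # second LSTM layer, keeps only the last step
    layers.Dense(n_features),                 # predict the next frame
])
model.compile(optimizer="adam", loss="mse")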

Example 2: Stock Market

There are similarities between the “next value prediction” task and a generative task, given the way we use these models (see the sketch at the end of this section).

Remember this when you are using them for your creative projects -> we might want irregularities, model breaking, training on multiple sources, etc.

(Figure: the less common, even worse outcome.)
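
One way “next value prediction” turns into generation is to feed the model’s own predictions back in as input, one step at a time. A rough sketch, assuming the many-to-one model from above (the function and its arguments are illustrative):

import numpy as np

def generate(model, seed, steps):
    # autoregressive generation: predict the next frame, append it,
    # and slide the 40-frame window forward; seed has shape (40, n_features)
    window = seed.copy()
    output = []
    for _ in range(steps):
        next_frame = model.predict(window[np.newaxis, ...])[0]
        output.append(next_frame)
        window = np.concatenate([window[1:], next_frame[np.newaxis, ...]])
    return np.stack(output)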

Vanilla RNN

As math formulas:

$h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1})$

$y_t = W_{hy} h_t$

import numpy as np

class RNN:
    # the weight matrices W_hh, W_xh, W_hy and the hidden state h
    # are assumed to be initialized elsewhere
    def step(self, x):
        # update the hidden state
        self.h = np.tanh(np.dot(self.W_hh, self.h) + np.dot(self.W_xh, x))
        # compute the output vector
        return np.dot(self.W_hy, self.h)
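
To actually run one step, the weights and hidden state could be initialized like this (all sizes are illustrative assumptions):

hidden_size, input_size, output_size = 100, 50, 50

rnn = RNN()
rnn.h = np.zeros(hidden_size)
rnn.W_hh = np.random.randn(hidden_size, hidden_size) * 0.01
rnn.W_xh = np.random.randn(hidden_size, input_size) * 0.01
rnn.W_hy = np.random.randn(output_size, hidden_size) * 0.01

y = rnn.step(np.random.randn(input_size))  # one step: x_t -> y_t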

Forgetting information

This design has one weakness: $h_t$ carries both the information to be stored in memory and the information used for the output…

Long Short-Term Memory Unit (LSTM)

This will be a rough animation of data passing through an LSTM. Keep in mind that we do not really care about every detail at this point - rather, we want to see why it is better at remembering long-term dependencies than the vanilla RNN…

Main idea: independent channel to carry long-term memory.

Forget gate influences what we delete from memory.

Input gate selects what to read from input.

Then we add it to the memory.

Output gate combines everything together.

These operations are influenced by learned parameters (W, U, b).

In addition to the basic RNN, we have a special channel for long-term memory; a rough sketch follows below.
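
As a sketch of one LSTM step in NumPy (the dictionary layout of W, U, b keyed by gate name is a hypothetical convention chosen for this example):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    f = sigmoid(W["f"] @ x + U["f"] @ h_prev + b["f"])  # forget gate: what to delete from memory
    i = sigmoid(W["i"] @ x + U["i"] @ h_prev + b["i"])  # input gate: what to read from the input
    g = np.tanh(W["g"] @ x + U["g"] @ h_prev + b["g"])  # candidate values to add to memory
    c = f * c_prev + i * g                              # the independent long-term memory channel
    o = sigmoid(W["o"] @ x + U["o"] @ h_prev + b["o"])  # output gate: combine everything together
    h = o * np.tanh(c)                                  # new hidden state / output
    return h, c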

In-depth look: RNNs and LSTMs

Recurrent Neural Networks (RNN): Simpler unit design

Long Short-Term Memory (LSTM): More complex unit design, made to remember longer dependencies inside the data.

In code we set up the dimension of the data flowing through these models - that’s the size of the vectors h and c.

from tensorflow.keras import layers

# either a simple RNN unit...
model.add(layers.SimpleRNN(128))
# ...or an LSTM unit (128 is the size of the vectors h and c)
model.add(layers.LSTM(128))

Loss functions

We want to measure the distance between two vectors (typically between the predictions and the labels):

Cross-Entropy loss
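
A minimal NumPy sketch of cross-entropy, assuming y_true is a one-hot label vector and y_pred a vector of predicted probabilities:

import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    # distance between the predicted distribution and the label
    return -np.sum(y_true * np.log(y_pred + eps))

# example: the true class is index 1
y_true = np.array([0.0, 1.0, 0.0])
y_pred = np.array([0.1, 0.8, 0.1])
print(cross_entropy(y_true, y_pred))  # ~0.223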

About this Post

This post is written by Siqi Shu, licensed under CC BY-NC 4.0.