
Exercise 10 (solution)#

In this exercise we learn about simple RNNs and encoder-decoder RNNs. We implement only some components from scratch in NumPy and leave out the training. By now, you should be able to implement the models in PyTorch and add the training yourself.

[ ]:
import numpy as np
from scipy.special import softmax
from dataclasses import dataclass

Task 1: Tokenization and embeddings#

Implement a simple character-level tokenization and embedding algorithm. In contrast to what we did in an earlier lecture, we want to keep the vocabulary as small as possible: it contains only the characters that are present in a given text.

  1. Write a function called get_vocabulary(text) that returns a sorted list of all characters that occur in the text

  2. Write a function called tokenize(text, vocabulary) that takes a text and list of characters and returns a list of ints.

  3. Write a function called embed(tokens, vocab_size) that returns a numpy array of shape (n_tokens, vocab_size) where each row is a one-hot vector corresponding to a token

  4. Call all the functions to create in_embeddings for our text

[ ]:
text = "hello"
[ ]:
def get_vocabulary(text):
    """Get a minimal character level vocabulary to tokenize the text."""
    text = text.lower()
    characters = sorted(set(text))
    return characters


vocabulary = get_vocabulary(text)
vocab_size = len(vocabulary)
vocabulary
[ ]:
def tokenize(text, vocabulary):
    """Tokenize the text, given the vocabulary."""
    text = text.lower()
    token_dict = {character: pos for pos, character in enumerate(vocabulary)}
    out = [token_dict[character] for character in text]
    return out


tokens = tokenize(text, vocabulary)
tokens
[ ]:
def embed(tokens, vocab_size):
    """Create input embeddings for each token."""
    out = np.zeros((len(tokens), vocab_size))
    out[np.arange(len(out)), tokens] = 1
    return out


in_embeddings = embed(tokens, vocab_size)
in_embeddings
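
As a quick sanity check, the one-hot rows can be mapped back to characters by taking the argmax of each row. The sketch below is self-contained and repeats the three steps in compact form; the helper name `decode` is hypothetical and not part of the exercise.

```python
import numpy as np

text = "hello"

# Same three steps as above, written compactly.
vocabulary = sorted(set(text.lower()))
token_dict = {char: pos for pos, char in enumerate(vocabulary)}
tokens = [token_dict[char] for char in text.lower()]
embeddings = np.eye(len(vocabulary))[tokens]  # row i is the one-hot vector of tokens[i]


def decode(embeddings, vocabulary):
    """Hypothetical helper: map one-hot rows back to a string."""
    return "".join(vocabulary[i] for i in embeddings.argmax(axis=1))


print(vocabulary)                      # ['e', 'h', 'l', 'o']
print(tokens)                          # [1, 0, 2, 2, 3]
print(embeddings.shape)                # (5, 4)
print(decode(embeddings, vocabulary))  # hello
```

Indexing an identity matrix with the token list is an equivalent alternative to filling a zero array by index.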

Task 2: A Params class#

  1. Define a dataclass called Params that has the three attributes w_xh, w_hh, w_hy

  2. Create an instance of Params with weight matrices that have the correct shapes and are filled with uniform random values between -1 and 1.

[ ]:
n_in = vocab_size
n_out = vocab_size
n_hidden = 3
[ ]:
@dataclass
class Params:
    w_xh: np.ndarray
    w_hh: np.ndarray
    w_hy: np.ndarray
[ ]:
np.random.seed(12345)

p = Params(
    w_xh=np.random.uniform(-1, 1, size=(n_hidden, n_in)),
    w_hh=np.random.uniform(-1, 1, size=(n_hidden, n_hidden)),
    w_hy=np.random.uniform(-1, 1, size=(n_out, n_hidden)),
)
p
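
The shapes follow from the matrix-vector products in the RNN: w_xh maps an input vector of length n_in to the hidden state, w_hh maps the hidden state to itself, and w_hy maps the hidden state to n_out logits. A minimal sketch of this bookkeeping, assuming the sizes used above (vocab_size is 4 for "hello") and using NumPy's Generator API with an arbitrary seed instead of the global seed:

```python
import numpy as np

n_in, n_hidden, n_out = 4, 3, 4  # sizes from above: vocab_size = 4 for "hello"

rng = np.random.default_rng(0)  # arbitrary seed, chosen only for reproducibility
w_xh = rng.uniform(-1, 1, size=(n_hidden, n_in))
w_hh = rng.uniform(-1, 1, size=(n_hidden, n_hidden))
w_hy = rng.uniform(-1, 1, size=(n_out, n_hidden))

x = np.zeros(n_in)
x[1] = 1  # a one-hot input vector
h = np.zeros(n_hidden)

print((w_xh @ x).shape)  # (3,)
print((w_hh @ h).shape)  # (3,)
print((w_hy @ h).shape)  # (4,)
```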

Task 3: Implement a Vanilla RNN (for Language Modelling)#

  1. Implement a function called model_step(x, h, p) where x is a one-hot vector, h is a vector that holds the internal state of the RNN and p is an instance of Params

  2. Implement a function called model(embeddings, p) that calls model_step internally and produces an array of logits. The output array has shape (len(embeddings) - 1, vocab_size), because the last token has no next token to predict. The function does roughly the following steps:

    • Initialize h to a vector of zeros

    • Call model_step in a loop over all embeddings except the last one

    • Collect all y in a list

[ ]:
def model_step(x, h, p):
    """Do one RNN step: update the hidden state and compute the logits."""
    h = np.tanh(p.w_xh @ x + p.w_hh @ h)
    y = p.w_hy @ h
    return h, y
[ ]:
def model(embeddings, p):
    """Model that takes input_embeddings and produces logits."""
    h = np.zeros(len(p.w_hh))
    out = []
    for x in embeddings[:-1]:
        h, y = model_step(x, h, p)
        out.append(y)
    return np.array(out)
[ ]:
logits = model(in_embeddings, p)
logits.shape
[ ]:
softmax(logits, axis=1).round(1)
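
The hidden state is what allows the RNN to treat the two l's in "hello" differently: the input vector is identical, but h has changed in between. A minimal sketch with random, untrained weights; the seed and the helper name `step` are arbitrary choices for illustration, and for simplicity the first call starts from a zero state.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 4, 3, 4
w_xh = rng.uniform(-1, 1, size=(n_hidden, n_in))
w_hh = rng.uniform(-1, 1, size=(n_hidden, n_hidden))
w_hy = rng.uniform(-1, 1, size=(n_out, n_hidden))


def step(x, h):
    """One vanilla RNN step: new hidden state and logits."""
    h = np.tanh(w_xh @ x + w_hh @ h)
    return h, w_hy @ h


x = np.eye(n_in)[2]  # one-hot vector for the token "l"
h0 = np.zeros(n_hidden)

h1, y1 = step(x, h0)  # first "l"
h2, y2 = step(x, h1)  # second "l": same input, different state

print(np.allclose(y1, y2))  # False: the logits depend on the history
```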

Task 4: Implement loss function#

  1. Create a list called targets that contains the target token for each output, i.e. the tokenized version of "ello"

  2. Write a function called cross_entropy_loss(logits, targets). This is basically the same function you wrote in lecture 8. The steps are roughly:

    • Take the softmax over the last axis

    • Use the indexing trick to get likelihoods

    • Return the negative mean of the log likelihoods

We do not use the loss function for training here. I just want to make sure you understand which loss function is used for language modelling.

[ ]:
targets = tokens[1:]
targets
[ ]:
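def cross_entropy_loss(logits, targets):
    """Calculate the mean negative log likelihood of the target tokens."""
    probs = softmax(logits, axis=1)
    likelihoods = probs[np.arange(len(targets)), targets]
    # The small constant guards against log(0) if a likelihood underflows.
    return -np.log(likelihoods + 1e-50).mean()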
[ ]:
cross_entropy_loss(logits, targets)
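
A useful reference point: if all logits are equal, every token gets probability 1 / vocab_size and the loss equals log(vocab_size). The sketch below checks this. It also shows an alternative formulation that computes the log-probabilities directly with the log-sum-exp trick instead of adding a small constant inside the log. The function name `cross_entropy_loss_stable` is hypothetical; this is not the solution from lecture 8.

```python
import numpy as np


def cross_entropy_loss_stable(logits, targets):
    """Cross-entropy from logits, computed via log-softmax (log-sum-exp trick)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # avoid overflow in exp
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()


targets = [0, 2, 2, 3]  # tokens of "ello" in the vocabulary ['e', 'h', 'l', 'o']

# Equal logits -> uniform prediction -> loss equals log(vocab_size).
uniform_logits = np.zeros((4, 4))
uniform_loss = cross_entropy_loss_stable(uniform_logits, targets)
print(np.isclose(uniform_loss, np.log(4)))  # True

# Very confident and correct predictions -> loss is numerically zero.
confident_logits = 1000.0 * np.eye(4)[targets]
confident_loss = cross_entropy_loss_stable(confident_logits, targets)
print(np.isclose(confident_loss, 0.0))  # True
```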

Task 5: Implement a text-to-text model and use optimal weights#

In this task I give you trained weights for the model. These weights should enable the model to correctly return "ello" when prompted with "hello".

The only thing you need to do is:

  1. Write a function called s2s_model(text, p, vocabulary) that takes text and returns text. Inside, you have to do the following steps:

    • tokenize the text

    • embed the text

    • use the model to get logits

    • Get predicted tokens from the logits

    • Translate the tokens into text

[ ]:
w_xh_opt = np.array(
    [
        [-13.8, 0.6, 2.7, 0.1],
        [4.7, -20.9, 1.6, 0.1],
        [1.6, 6.9, 10.9, 0.0],
    ]
)

w_hh_opt = np.array(
    [
        [-2.1, -5.9, 7.2],
        [-5.9, -4.2, 0.8],
        [6.0, 7.5, 2.8],
    ]
)

w_hy_opt = np.array(
    [[-0.6, -24.2, -0.7], [3.4, 8.8, -12.0], [-12.5, 12.2, 9.0], [10.0, 3.2, 3.7]]
)

p_opt = Params(
    w_xh=w_xh_opt,
    w_hh=w_hh_opt,
    w_hy=w_hy_opt,
)
[ ]:
def s2s_model(text, p, vocabulary):
    """Model that takes text and returns text."""
    vocab_size = len(vocabulary)
    tokens = tokenize(text, vocabulary)
    input_embeddings = embed(tokens, vocab_size)
    logits = model(input_embeddings, p)
    predictions = np.argmax(logits, axis=1)
    return "".join(vocabulary[pred] for pred in predictions)


s2s_model(text, p_opt, vocabulary)
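
In s2s_model the RNN sees the true characters of "hello" at every step. The same weights can also be run generatively, where only the first character is given and each prediction is fed back as the next input. The sketch below illustrates this idea under the assumption of greedy (argmax) decoding; the helper name `generate` is hypothetical and not part of the exercise.

```python
import numpy as np

vocabulary = ["e", "h", "l", "o"]

# Trained weights from above.
w_xh = np.array([[-13.8, 0.6, 2.7, 0.1], [4.7, -20.9, 1.6, 0.1], [1.6, 6.9, 10.9, 0.0]])
w_hh = np.array([[-2.1, -5.9, 7.2], [-5.9, -4.2, 0.8], [6.0, 7.5, 2.8]])
w_hy = np.array(
    [[-0.6, -24.2, -0.7], [3.4, 8.8, -12.0], [-12.5, 12.2, 9.0], [10.0, 3.2, 3.7]]
)


def generate(first_char, n_steps):
    """Hypothetical helper: greedy generation, feeding each prediction back in."""
    token = vocabulary.index(first_char)
    h = np.zeros(len(w_hh))
    out = []
    for _ in range(n_steps):
        x = np.eye(len(vocabulary))[token]  # one-hot embedding of the current token
        h = np.tanh(w_xh @ x + w_hh @ h)
        token = int(np.argmax(w_hy @ h))  # greedy choice becomes the next input
        out.append(vocabulary[token])
    return "".join(out)


print(generate("h", 4))  # ello
```

Because every prediction is correct, the generated sequence follows the same hidden-state trajectory as the run with the true inputs.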

Switching to word level embedding for machine translation#

To learn about encoder-decoder RNNs we switch from character-level tokenization to word-level tokenization. Moreover, we add a start and end token.

Since you already know how to write tokenizers, here is the code:

[ ]:
in_text = "Hello World"
out_text = "Hallo Welt"


def get_vocabulary(text):
    """Get a minimal vocabulary to tokenize the text."""
    text = text.lower().split()
    words = sorted(set(text)) + ["<SOS>", "<EOS>"]
    return words


def tokenize(text, vocabulary):
    """Tokenize the text, given the vocabulary."""
    text = ["<SOS>"] + text.lower().split() + ["<EOS>"]
    token_dict = {word: pos for pos, word in enumerate(vocabulary)}
    out = [token_dict[word] for word in text]
    return out


def embed(tokens, vocab_size):
    """Create input embeddings for each token."""
    out = np.zeros((len(tokens), vocab_size))
    out[np.arange(len(out)), tokens] = 1
    return out


in_vocabulary = get_vocabulary(in_text)
print("Input vocabulary:", in_vocabulary)
in_vocab_size = len(in_vocabulary)
in_tokens = tokenize(in_text, in_vocabulary)
print("Input tokens:", in_tokens)
in_embeddings = embed(in_tokens, in_vocab_size)

out_vocabulary = get_vocabulary(out_text)
print("Output vocabulary:", out_vocabulary)
out_vocab_size = len(out_vocabulary)
out_tokens = tokenize(out_text, out_vocabulary)
print("Output tokens:", out_tokens)
target_size = len(out_tokens)
print("Target size:", target_size)


n_in = in_vocab_size
n_out = out_vocab_size
n_hidden = 4
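
Because the special tokens are appended after sorting, they always occupy the last two positions of the vocabulary. The encoder-decoder model further down relies on this when it sets y_prev[-2] = 1 to select <SOS>. A compact, self-contained sketch of what the word-level tokenizer produces for the input text:

```python
in_text = "Hello World"

words = in_text.lower().split()
vocabulary = sorted(set(words)) + ["<SOS>", "<EOS>"]
token_dict = {word: pos for pos, word in enumerate(vocabulary)}
tokens = [token_dict[word] for word in ["<SOS>"] + words + ["<EOS>"]]

print(vocabulary)                      # ['hello', 'world', '<SOS>', '<EOS>']
print(tokens)                          # [2, 0, 1, 3]
print(vocabulary[-2], vocabulary[-1])  # <SOS> <EOS>
```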

Moreover, you get code for two parameter classes that you can use in your model:

[ ]:
@dataclass
class EncoderParams:
    w_xh: np.ndarray
    w_hh: np.ndarray


@dataclass
class DecoderParams:
    w_ss: np.ndarray
    w_ys: np.ndarray
    w_sy: np.ndarray


np.random.seed(1234)

p_enc = EncoderParams(
    w_xh=np.random.uniform(size=(n_hidden, n_in)),
    w_hh=np.random.uniform(size=(n_hidden, n_hidden)),
)

p_dec = DecoderParams(
    w_ss=np.random.uniform(size=(n_hidden, n_hidden)),
    w_ys=np.random.uniform(size=(n_hidden, n_out)),
    w_sy=np.random.uniform(size=(n_out, n_hidden)),
)

p_enc

Task 6: Implement encode and decode steps (for Machine Translation)#

  1. Write a function called encode_step(x, h, p_enc) that returns the updated encoder state h

  2. Write a function called decode_step(s, y_prev, p_dec) that returns the updated decoder state s and the logits y

The two functions together play the same role as model_step in the simple RNN.

[ ]:
def encode_step(x, h, p_enc):
    h = np.tanh(p_enc.w_xh @ x + p_enc.w_hh @ h)
    return h
[ ]:
def decode_step(s, y_prev, p_dec):
    s = np.tanh(p_dec.w_ss @ s + p_dec.w_ys @ y_prev)
    y = p_dec.w_sy @ s
    return s, y

Task 7: Implement the encoder-decoder model#

  1. Write a function called model(in_embeddings, target_size, p_enc, p_dec). The function has the following steps:

    • Initialize h as a vector of zeros

    • call the encode step in a loop to produce a final encoder state (h)

    • Rename h to s

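    • Initialize y_prev to the embedding of the <SOS> token in the output vocabulary

    • Call the decode step in a loop (target_size times), using each output y as the next y_prev

    • Collect the ys in a list and return them as an array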

[ ]:
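def model(in_embeddings, target_size, p_enc, p_dec):
    """Encoder-decoder model that takes input embeddings and produces logits."""
    # Encoder: read the whole input sequence into the final state h.
    h = np.zeros(len(p_enc.w_hh))
    for x in in_embeddings:
        h = encode_step(x, h, p_enc)

    # Decoder: start from the final encoder state and the <SOS> embedding.
    # <SOS> is the second-to-last entry of the output vocabulary.
    s = h
    y_prev = np.zeros(p_dec.w_ys.shape[1])
    y_prev[-2] = 1
    out = []
    for _ in range(target_size):
        s, y = decode_step(s, y_prev, p_dec)
        out.append(y)
        # The raw logits (not a one-hot vector) are fed back as the next input.
        y_prev = y

    return np.array(out)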
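
The model above always decodes exactly target_size steps. When the target length is not known in advance, a common alternative is to stop once <EOS> is predicted. The sketch below shows this variant with random, untrained weights; the function name `model_until_eos`, the max_len parameter and the seed are assumptions made for illustration and are not part of the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)  # random, untrained weights for illustration only
n_in, n_hidden, n_out = 4, 4, 4
w_xh = rng.uniform(-1, 1, size=(n_hidden, n_in))
w_hh = rng.uniform(-1, 1, size=(n_hidden, n_hidden))
w_ss = rng.uniform(-1, 1, size=(n_hidden, n_hidden))
w_ys = rng.uniform(-1, 1, size=(n_hidden, n_out))
w_sy = rng.uniform(-1, 1, size=(n_out, n_hidden))

SOS, EOS = n_out - 2, n_out - 1  # positions of <SOS> and <EOS> in the vocabulary


def model_until_eos(in_embeddings, max_len):
    """Hypothetical variant: decode until <EOS> is predicted or max_len is reached."""
    h = np.zeros(n_hidden)
    for x in in_embeddings:
        h = np.tanh(w_xh @ x + w_hh @ h)

    s = h
    y_prev = np.eye(n_out)[SOS]
    out = []
    for _ in range(max_len):
        s = np.tanh(w_ss @ s + w_ys @ y_prev)
        y = w_sy @ s
        out.append(y)
        if np.argmax(y) == EOS:
            break
        y_prev = y  # as in the model above, the raw logits are fed back
    return np.array(out)


in_embeddings = np.eye(n_in)[[2, 0, 1, 3]]  # <SOS> hello world <EOS>
logits = model_until_eos(in_embeddings, max_len=10)
print(logits.shape)  # (n_steps, 4) with 1 <= n_steps <= 10
```

The max_len cap is needed because an untrained or poorly trained model may never predict <EOS>.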

Task 8: Implement the encoder-decoder text-to-text model#

  1. Implement a function called s2s_model(in_text, p_enc, p_dec, in_vocabulary, out_vocabulary, target_size). This is similar to the function you wrote above, but this time the input vocabulary and output vocabulary differ.

[ ]:
def s2s_model(
    in_text,
    p_enc,
    p_dec,
    in_vocabulary=in_vocabulary,
    out_vocabulary=out_vocabulary,
    target_size=target_size,
):
    """Model that takes text and returns text."""
    in_vocab_size = len(in_vocabulary)
    in_tokens = tokenize(in_text, in_vocabulary)
    in_embeddings = embed(in_tokens, in_vocab_size)

    logits = model(in_embeddings, target_size, p_enc, p_dec)

    predictions = np.argmax(logits, axis=1)
    return " ".join(out_vocabulary[pred] for pred in predictions)
[ ]:
w_xh_opt = np.array(
    [
        [1.1, 0.2, -0.4, 0.5],
        [0.7, -0.5, -0.6, -7.9],
        [0.4, 3.9, 0.3, -0.2],
        [1.2, 0.6, 0.1, 2.7],
    ]
)

w_hh_opt = np.array(
    [
        [-0.6, -0.8, 0.4, -0.1],
        [-1.6, 2.7, -3.7, -3.8],
        [-1.2, 1.3, 1.0, 0.2],
        [-0.0, -0.6, 0.6, 0.9],
    ]
)

w_ss_opt = np.array(
    [
        [10.6, -0.5, -5.2, 0.7],
        [-6.7, 1.9, -5.0, 1.8],
        [10.9, -7.5, -6.2, 4.5],
        [1.5, 0.4, 4.1, -4.4],
    ]
)

w_sy_opt = np.array(
    [
        [3.0, -10.6, -22.2, 4.2],
        [4.4, 15.4, 9.6, -7.0],
        [4.8, -4.0, 12.2, -6.0],
        [-16.4, -10.1, -18.8, 14.1],
    ]
)


w_ys_opt = np.array(
    [
        [0.5, -3.0, 11.9, 4.2],
        [3.0, 0.8, 2.0, 1.0],
        [3.8, -1.1, 16.1, 8.3],
        [-3.2, -0.1, -7.4, -0.7],
    ]
)

p_enc_opt = EncoderParams(
    w_xh=w_xh_opt,
    w_hh=w_hh_opt,
)

p_dec_opt = DecoderParams(
    w_ss=w_ss_opt,
    w_ys=w_ys_opt,
    w_sy=w_sy_opt,
)
[ ]:
s2s_model(in_text, p_enc_opt, p_dec_opt)
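
The string returned by s2s_model is built from all predicted tokens, so it can contain special tokens. A small post-processing step can remove them. The sketch below is one possible way to do this; the helper name `strip_special_tokens` is hypothetical, and the example strings are illustrative inputs rather than actual model output.

```python
def strip_special_tokens(text, sos="<SOS>", eos="<EOS>"):
    """Hypothetical helper: drop <SOS> and cut the sentence at the first <EOS>."""
    words = []
    for word in text.split():
        if word == eos:
            break
        if word != sos:
            words.append(word)
    return " ".join(words)


print(strip_special_tokens("<SOS> hallo welt <EOS>"))  # hallo welt
print(strip_special_tokens("hallo welt <EOS> <EOS>"))  # hallo welt
```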