Exercise 8

[ ]:

from datasets import load_dataset
import torch
import matplotlib.pyplot as plt

Data Preparation

The data preparation only uses concepts you already know from previous lectures. We therefore start with clean datasets for training and validation.

[ ]:

data = load_dataset("mnist")
data.set_format("torch")

[ ]:

example = data["test"][0]
print(f"True label: {int(example['label'])}")
fig = plt.imshow(example["image"])

[ ]:

img_size = example["image"].numel()
img_size

[ ]:

# Dividing by 255 maps pixel values to the interval [0, 1]
X_train = data["train"]["image"][:].reshape(-1, img_size).to(torch.float) / 255
X_test = data["test"]["image"][:].reshape(-1, img_size).to(torch.float) / 255
y_train = data["train"]["label"]
y_test = data["test"]["label"]
X_train.shape

Dimensions of our Neural Network

[ ]:

# the input dimension
n_in = img_size
# the dimension of our 2 hidden layers
n_hidden = 16
# the dimension of our output layer
n_out = 10

Task 1: How many Parameters?

The number of trainable parameters is entirely determined by the number of layers and their dimensions.

Write a function called count_params(n_in, n_hidden, n_out) that counts how many parameters will be in our model. Assume that there are 2 hidden layers.
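One possible way to approach the count is sketched here. It assumes that every layer is fully connected and has one bias per output unit, which the task does not state explicitly. Under that assumption, a layer that maps n_a inputs to n_b outputs contributes n_a * n_b weights plus n_b biases.

```python
def count_params(n_in, n_hidden, n_out):
    """Count weights and biases of a dense network with 2 hidden layers.

    Assumes each layer is fully connected and has one bias per output unit.
    """
    # layer sizes in order: input -> hidden -> hidden -> output
    sizes = [n_in, n_hidden, n_hidden, n_out]
    total = 0
    for n_a, n_b in zip(sizes[:-1], sizes[1:]):
        total += n_a * n_b  # entries of the weight matrix
        total += n_b        # entries of the bias vector
    return total


# 784 = 28 * 28 is the flattened size of an MNIST image
print(count_params(784, 16, 10))  # 13002
```

Looping over consecutive pairs of layer sizes keeps the function correct if the number of hidden layers is changed later.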

[ ]:

Task 2: Set up random start parameters

We want to draw random start parameters that are distributed uniformly between -0.5 and 0.5.

Since we are going to modify the parameters in-place while training the model, we need a way to freshly generate the start parameters multiple times. We therefore create a function that draws start parameters.

The function takes the following arguments:
- n_in
- n_hidden
- n_out
- seed (give it a default value of 1995 so we all get the same results)

The function returns:
- a list of weight matrices with the correct shapes
- a list of biases with the correct shapes
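A minimal sketch of such a function could look as follows. The name draw_start_params is hypothetical, and the weight matrices are assumed to have shape (inputs, outputs). The drawn numbers only match a reference solution if that solution uses the same generator and the same order of draws.

```python
import torch


def draw_start_params(n_in, n_hidden, n_out, seed=1995):
    """Draw weights and biases uniformly from [-0.5, 0.5)."""
    # a local generator keeps the draws reproducible without touching global state
    gen = torch.Generator().manual_seed(seed)
    sizes = [n_in, n_hidden, n_hidden, n_out]
    weights, biases = [], []
    for n_a, n_b in zip(sizes[:-1], sizes[1:]):
        # torch.rand samples from [0, 1); subtracting 0.5 centers it on zero
        weights.append(torch.rand((n_a, n_b), generator=gen) - 0.5)
        biases.append(torch.rand(n_b, generator=gen) - 0.5)
    return weights, biases


weights, biases = draw_start_params(784, 16, 10)
print([tuple(w.shape) for w in weights])
print([tuple(b.shape) for b in biases])
```

Calling the function again with the same seed returns identical tensors, which is what makes repeated training runs start from the same position.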

[ ]:

Task 3: Implement relu and softmax

Implement a relu function that takes a 1d tensor and applies the relu nonlinearity elementwise.
Implement a softmax function that takes a 1d tensor of logits and returns a 1d tensor of probabilities.
Test your functions on a small tensor.
If you have time, implement other nonlinearities such as sigmoid, tanh, …
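The two functions can be sketched as below. Subtracting the maximum logit before exponentiating is an addition that the task does not ask for. It leaves the result unchanged and only guards against overflow.

```python
import torch


def relu(x):
    """Elementwise max(x, 0)."""
    return torch.clamp(x, min=0)


def softmax(x):
    """Map a 1d tensor of logits to probabilities that sum to 1."""
    # shifting by the maximum does not change the result but avoids overflow in exp
    exps = torch.exp(x - x.max())
    return exps / exps.sum()


t = torch.tensor([-1.0, 0.0, 2.0])
print(relu(t))
print(softmax(t))
print(softmax(t).sum())
```

A quick sanity check is that the softmax output is non-negative, sums to one, and preserves the ordering of the logits.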

[ ]:

[ ]:

[ ]:

Task 4: Implement the model

The model should take the following arguments:
- x: A 1d tensor with a flattened image
- weights: The list of weights from task 2
- biases: The list of biases from task 2

It should return a 1d tensor of length n_out that contains probabilities for each category.

Implement a model function
Try it out on the first element of the training data
Try out the batch_model function (given below) on the first few rows of the training data
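As a rough sketch, the forward pass could be written like this. It assumes weight matrices of shape (inputs, outputs), so that a layer computes h @ W + b, and it uses torch.relu and torch.softmax as stand-ins for the functions from task 3. The random parameters and input are hypothetical and only serve to check shapes.

```python
import torch


def model(x, weights, biases):
    """Forward pass: ReLU after each hidden layer, softmax at the output."""
    h = x
    # every layer except the last one is a hidden layer
    for W, b in zip(weights[:-1], biases[:-1]):
        h = torch.relu(h @ W + b)
    logits = h @ weights[-1] + biases[-1]
    return torch.softmax(logits, dim=0)


# hypothetical random parameters and input, only used to check shapes
gen = torch.Generator().manual_seed(0)
sizes = [784, 16, 16, 10]
weights = [torch.rand((a, b), generator=gen) - 0.5 for a, b in zip(sizes[:-1], sizes[1:])]
biases = [torch.rand(b, generator=gen) - 0.5 for b in sizes[1:]]
x = torch.rand(784, generator=gen)

probs = model(x, weights, biases)
print(probs.shape)
print(probs.sum())
```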

[ ]:

[ ]:

In the training process we need a batch_model function that evaluates the model on a batch of data. Writing it is not very instructive, so I give you the function right away.

[ ]:

def batch_model(batch, weights, biases):
    n_out = len(biases[-1])
    out = torch.zeros((len(batch), n_out))
    for i, x in enumerate(batch):
        out[i] = model(x, weights, biases)
    return out

Task 5: Implement loss functions

Write a function called nnl_loss that takes the result of batch_model and the true labels and returns the average negative log likelihood.
Try it out on the first 100 rows of the training data
Implement an accuracy function that takes the same arguments as the loss function
Try it out on the first 100 rows of the training data
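Both functions can be sketched under the assumption that they receive a 2d tensor of predicted probabilities (one row per image) and a 1d tensor of integer labels. The small hand-written tensors below are hypothetical test inputs, not MNIST data.

```python
import torch


def nnl_loss(probs, labels):
    """Average negative log likelihood of the true labels."""
    rows = torch.arange(len(labels))
    # pick, for every row, the probability assigned to the true class
    p_true = probs[rows, labels]
    return -torch.log(p_true).mean()


def accuracy(probs, labels):
    """Fraction of rows whose most probable class equals the true label."""
    return (probs.argmax(dim=1) == labels).float().mean()


probs = torch.tensor([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
labels = torch.tensor([0, 1])
print(nnl_loss(probs, labels))
print(accuracy(probs, labels))
```

The loss is small when the model puts high probability on the true class, while the accuracy only looks at which class is ranked first.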

[ ]:

[ ]:

[ ]:

[ ]:

Task 6: The training loop

Create fresh weights and biases
Set requires_grad to True for all tensors in the weights and biases list.
Write a training loop to train your model with SGD and the following hyperparameters:
- n_epochs: 2
- batch_size: 100
- learning_rate: 0.001
If you have time, try the model out on a few images

Important: Do the entire training in just one cell and re-create the start parameters at the beginning of that cell, so each training run starts from the same position.
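The mechanics of a manual SGD loop are sketched below on a toy linear regression problem rather than on the MNIST model. The data, the toy_loss function, and the hyperparameter values are hypothetical and chosen only so that the example runs quickly. The structure (backward pass, in-place update under torch.no_grad, zeroing the gradients) is what carries over.

```python
import torch

gen = torch.Generator().manual_seed(0)

# toy regression data, a hypothetical stand-in for the MNIST tensors
X = torch.randn((200, 3), generator=gen)
y = X @ torch.tensor([1.0, -2.0, 0.5])

# fresh start parameters, created in the same cell as the loop
params = [torch.zeros(3), torch.zeros(1)]
for p in params:
    p.requires_grad_(True)

# toy values, not the hyperparameters from the task
n_epochs, batch_size, learning_rate = 2, 20, 0.1


def toy_loss(X, y, params):
    """Mean squared error of a linear model."""
    w, b = params
    return ((X @ w + b - y) ** 2).mean()


initial_loss = toy_loss(X, y, params).item()

for epoch in range(n_epochs):
    for start in range(0, len(X), batch_size):
        X_batch = X[start:start + batch_size]
        y_batch = y[start:start + batch_size]

        loss = toy_loss(X_batch, y_batch, params)
        loss.backward()

        # update in-place without recording the update in the autograd graph
        with torch.no_grad():
            for p in params:
                p -= learning_rate * p.grad

        # zero the gradients so they do not accumulate across batches
        for p in params:
            p.grad.zero_()

final_loss = toy_loss(X, y, params).item()
print(initial_loss, final_loss)
```

Because backward adds to existing gradients, forgetting the zeroing step would make every update use the sum of all previous batch gradients.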

[ ]:

# create fresh random weights and biases

# set requires_grad to True for training

# define the hyperparameters

# loop over epochs

    # loop over batches
        # evaluate model
        # evaluate loss
        # backward pass

        # loop over the parameter lists
            # SGD updates for each parameter tensor

        # zero the gradients for the next iteration

[ ]:

[ ]:

[ ]:

Task 7: Diagnostics

Copy-paste the training loop from the previous task or work in the same cell as before.
After each epoch, evaluate the batch_model on the test data with the current parameters. Use torch.no_grad to disable gradient tracking.
Calculate the accuracy score on the result and print it.
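The evaluation step can be sketched in isolation as follows. The single weight matrix W and the tensors X_eval and y_eval are hypothetical stand-ins for the trained parameters and the test data. Inside the torch.no_grad block no computation graph is built, so the evaluation is cheaper and cannot affect the gradients used for training.

```python
import torch

# hypothetical stand-ins for trained parameters and test data
gen = torch.Generator().manual_seed(0)
W = torch.rand((4, 3), generator=gen)
W.requires_grad_(True)
X_eval = torch.rand((8, 4), generator=gen)
y_eval = torch.randint(0, 3, (8,), generator=gen)

with torch.no_grad():
    # predictions computed here are not tracked by autograd
    probs = torch.softmax(X_eval @ W, dim=1)
    acc = (probs.argmax(dim=1) == y_eval).float().mean()

print(f"accuracy: {acc.item():.3f}")
print(probs.requires_grad)
```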

[ ]:

Task 8: Training the model

Tweak the number of epochs, the batch size and the learning rate until you get an accuracy of at least 90 %.

Copy-paste the code from the previous task or work in the same cell.

[ ]: