Exercise 8#
[ ]:
from datasets import load_dataset
import torch
import matplotlib.pyplot as plt
Data Preparation#
The data preparation only uses concepts you already know from previous lectures. We therefore start with clean datasets for training and validation.
[ ]:
data = load_dataset("mnist")
data.set_format("torch")
[ ]:
example = data["test"][0]
print(f"True label: {int(example['label'])}")
fig = plt.imshow(example["image"])
[ ]:
img_size = example["image"].numel()
img_size
[ ]:
# Dividing by 255 maps pixel values to the interval [0, 1]
X_train = data["train"]["image"][:].reshape(-1, img_size).to(torch.float) / 255
X_test = data["test"]["image"][:].reshape(-1, img_size).to(torch.float) / 255
y_train = data["train"]["label"]
y_test = data["test"]["label"]
X_train.shape
Dimensions of our Neural Network#
[ ]:
# the input dimension
n_in = img_size
# the dimension of our 2 hidden layers
n_hidden = 16
# the dimension of our output layer
n_out = 10
Task 1: How many Parameters?#
The number of trainable parameters is entirely determined by the number of layers and their dimensions.
Write a function called count_params(n_in, n_hidden, n_out) that counts how many parameters will be in our model. Assume that there are 2 hidden layers.
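One possible sketch of such a function is shown here. It assumes fully connected layers where each layer has one weight matrix and one bias vector; the value 784 in the usage line assumes 28 x 28 MNIST images, which is what img_size evaluates to above.

```python
def count_params(n_in, n_hidden, n_out):
    """Count weights and biases of a fully connected net with 2 hidden layers."""
    # Layer sizes: input -> hidden -> hidden -> output
    sizes = [n_in, n_hidden, n_hidden, n_out]
    total = 0
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        total += fan_in * fan_out  # entries of the weight matrix
        total += fan_out           # entries of the bias vector
    return total


print(count_params(784, 16, 10))  # 13002
```

The total splits into 12560 parameters for the first layer, 272 for the second and 170 for the output layer.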
[ ]:
Task 2: Set up random start parameters#
We want to draw random start parameters that are distributed uniformly between -0.5 and 0.5.
Since we are going to modify the parameters in-place while training the model, we need a way to freshly generate the start parameters multiple times. We therefore create a function that draws start parameters.
The function takes the following arguments:
- n_in
- n_hidden
- n_out
- seed (give it a default value of 1995 so we all get the same results)
The function returns:
- a list of weight matrices with the correct shapes
- a list of biases with the correct shapes
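A minimal sketch of such a function could look as follows. The name draw_start_params is hypothetical, and storing each weight matrix with shape (outputs, inputs) is an assumption so that `w @ x` works on a 1d input. Because the drawing order and generator are my own choices, the numbers will probably differ from a reference solution even with the same seed.

```python
import torch


def draw_start_params(n_in, n_hidden, n_out, seed=1995):
    """Draw weights and biases uniformly from [-0.5, 0.5)."""
    # A dedicated generator makes every call with the same seed reproducible
    gen = torch.Generator().manual_seed(seed)
    sizes = [n_in, n_hidden, n_hidden, n_out]
    weights, biases = [], []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        # torch.rand samples from [0, 1); subtracting 0.5 shifts to [-0.5, 0.5)
        weights.append(torch.rand((fan_out, fan_in), generator=gen) - 0.5)
        biases.append(torch.rand(fan_out, generator=gen) - 0.5)
    return weights, biases


weights, biases = draw_start_params(784, 16, 10)
print([tuple(w.shape) for w in weights])
print([tuple(b.shape) for b in biases])
```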
[ ]:
Task 3: Implement relu and softmax#
Implement a relu function that takes a 1d tensor and applies the relu nonlinearity elementwise
Implement a softmax function that takes a 1d tensor of logits and returns a 1d tensor of probabilities
Test your functions on a small tensor
If you have time implement other nonlinearities such as sigmoid, tanh, …
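As a rough sketch, the two functions could be written like this. Subtracting the maximum before exponentiating is an addition of mine for numerical stability; it does not change the resulting probabilities.

```python
import torch


def relu(x):
    """Elementwise max(x, 0)."""
    return torch.clamp(x, min=0)


def softmax(logits):
    """Map a 1d tensor of logits to probabilities that sum to 1."""
    # Shifting by the maximum avoids overflow in exp without changing the result
    shifted = logits - logits.max()
    exps = torch.exp(shifted)
    return exps / exps.sum()


x = torch.tensor([-1.0, 0.0, 2.0])
print(relu(x))     # negative entries become 0
print(softmax(x))  # all entries are positive and sum to 1
```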
[ ]:
[ ]:
[ ]:
Task 4: Implement the model#
The model should take the following arguments:
- x: A 1d tensor with a flattened image
- weights: The list of weights from task 2
- biases: The list of biases from task 2
It should return a 1d tensor of length n_out that contains probabilities for each category.
Implement a model function
Try it out on the first element of the training data
Try out the batch_model function on the first few rows of the training data
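The forward pass could be sketched as below. To keep the block self-contained it uses the built-in torch.relu and torch.softmax as stand-ins for the functions from task 3, and it assumes each weight matrix has shape (outputs, inputs). The tiny parameters at the bottom are made up for illustration only.

```python
import torch


def model(x, weights, biases):
    """Forward pass: relu on the hidden layers, softmax on the output layer."""
    h = x
    # All layers except the last one apply the relu nonlinearity
    for w, b in zip(weights[:-1], biases[:-1]):
        h = torch.relu(w @ h + b)
    logits = weights[-1] @ h + biases[-1]
    return torch.softmax(logits, dim=0)


# Made-up parameters: 4 inputs, two hidden layers with 3 units, 2 outputs
torch.manual_seed(0)
weights = [torch.rand(3, 4) - 0.5, torch.rand(3, 3) - 0.5, torch.rand(2, 3) - 0.5]
biases = [torch.rand(3) - 0.5, torch.rand(3) - 0.5, torch.rand(2) - 0.5]
probs = model(torch.rand(4), weights, biases)
print(probs, probs.sum())
```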
[ ]:
[ ]:
In the training process we need a batch_model function that evaluates the model on a batch of data. Writing it is not very instructive, so I give you the function right away.
[ ]:
def batch_model(batch, weights, biases):
    n_out = len(biases[-1])
    out = torch.zeros((len(batch), n_out))
    for i, x in enumerate(batch):
        out[i] = model(x, weights, biases)
    return out
Task 5: Implement loss functions#
Write a function called nnl_loss that takes the result of the batch_model and returns the average negative log likelihood.
Try it out on the first 100 rows of the training data
Implement an accuracy function that takes the same arguments as the loss function
Try it out on the first 100 rows of the training data
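A minimal sketch of both functions follows. It assumes that, in addition to the probabilities returned by batch_model, each function also receives the true labels as a 1d integer tensor; the exact signature is not fixed by the task. The small probs tensor is made up for illustration.

```python
import torch


def nnl_loss(probs, labels):
    """Average negative log likelihood of the true labels."""
    # For each row, pick the probability the model assigns to the true class
    picked = probs[torch.arange(len(labels)), labels]
    return -torch.log(picked).mean()


def accuracy(probs, labels):
    """Share of rows whose most probable class equals the true label."""
    return (probs.argmax(dim=1) == labels).float().mean()


probs = torch.tensor([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]])
labels = torch.tensor([0, 1, 0])
print(nnl_loss(probs, labels))
print(accuracy(probs, labels))  # two of the three rows are classified correctly
```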
[ ]:
[ ]:
[ ]:
[ ]:
Task 6: The training loop#
Create fresh weights and biases
Set requires_grad to True for all tensors in the weights and biases lists.
Write a training loop to train your model with SGD and the following hyperparameters:
n_epochs: 2
batch_size: 100
learning_rate: 0.001
If you have time, try the model out on a few images
Important: Do the entire training in just one cell and re-create the start parameters at the beginning of that cell, so each training run starts from the same position.
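The mechanics of a manual SGD loop can be sketched on a small stand-in problem. Everything about the data and the one-layer model below is made up so that the block runs on its own; in the exercise, the start parameters, batch_model and the loss from the earlier tasks would take their place.

```python
import torch

torch.manual_seed(0)
# Made-up stand-in data: 200 samples, 5 features, 3 classes
X = torch.rand(200, 5)
y = torch.randint(0, 3, (200,))

# Fresh start parameters of a one-layer stand-in model
weights = [torch.rand(3, 5) - 0.5]
biases = [torch.rand(3) - 0.5]
for p in weights + biases:
    p.requires_grad_(True)

n_epochs, batch_size, learning_rate = 2, 100, 0.001

for epoch in range(n_epochs):
    for start in range(0, len(X), batch_size):
        xb = X[start:start + batch_size]
        yb = y[start:start + batch_size]
        # Evaluate the stand-in model and the average negative log likelihood
        probs = torch.softmax(xb @ weights[0].T + biases[0], dim=1)
        loss = -torch.log(probs[torch.arange(len(yb)), yb]).mean()
        loss.backward()
        # SGD step; no_grad keeps the update itself out of the autograd graph
        with torch.no_grad():
            for p in weights + biases:
                p -= learning_rate * p.grad
                p.grad.zero_()  # reset for the next batch
    print(f"epoch {epoch}: loss on last batch {loss.item():.4f}")
```

Updating the tensors in-place is what makes it necessary to re-create the start parameters at the top of the cell before each run.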
[ ]:
# create fresh random weights and biases
# set requires_grad to True for training
# define the hyperparameters
# loop over epochs
    # loop over batches
        # evaluate model
        # evaluate loss
        # backwards
        # loop over the parameter lists
            # SGD updates for each parameter tensor
            # Zero the gradients for the next iteration
[ ]:
[ ]:
[ ]:
Task 7: Diagnostics#
Copy-paste the training loop from the previous task or work in the same cell as before.
After each epoch, evaluate the batch_model on the test data with the current parameters. Use torch.no_grad to disable gradients.
Calculate the accuracy score on the result and print it.
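The evaluation step might look roughly like the sketch below. The demo data and the one-layer stand-in model are made up so the block is self-contained; in the exercise you would call batch_model on X_test and compare with y_test instead.

```python
import torch

torch.manual_seed(0)
# Made-up stand-in for test data and trained parameters
X_test_demo = torch.rand(50, 5)
y_test_demo = torch.randint(0, 3, (50,))
w = (torch.rand(3, 5) - 0.5).requires_grad_(True)
b = (torch.rand(3) - 0.5).requires_grad_(True)

# Inside no_grad, no computation graph is built, which saves time and memory
with torch.no_grad():
    probs = torch.softmax(X_test_demo @ w.T + b, dim=1)
    acc = (probs.argmax(dim=1) == y_test_demo).float().mean()

print(f"test accuracy: {acc.item():.2%}")
print(probs.requires_grad)  # False, although w and b require gradients
```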
[ ]:
Task 8: Training the model#
Tweak the number of epochs, batch size and learning rate until you get an accuracy of at least 90 %
Copy-paste the code from the previous task or work in the same cell.
[ ]: