Exercise 8 (solution)#
[ ]:
from datasets import load_dataset
import torch
import matplotlib.pyplot as plt
Data Preparation#
The data preparation only uses concepts you already know from previous lectures. We therefore start with clean datasets for training and validation.
[ ]:
data = load_dataset("mnist")
data.set_format("torch")
[ ]:
example = data["test"][0]
print(f"True label: {int(example['label'])}")
fig = plt.imshow(example["image"])
[ ]:
img_size = example["image"].numel()
img_size
[ ]:
# Dividing by 255 maps pixel values to the interval [0, 1]
X_train = data["train"]["image"][:].reshape(-1, img_size).to(torch.float) / 255
X_test = data["test"]["image"][:].reshape(-1, img_size).to(torch.float) / 255
y_train = data["train"]["label"]
y_test = data["test"]["label"]
X_train.shape
Dimensions of our Neural Network#
[ ]:
# the input dimension
n_in = img_size
# the dimension of our 2 hidden layers
n_hidden = 16
# the dimension of our output layer
n_out = 10
Task 1: How many Parameters?#
The number of trainable parameters is entirely determined by the number of layers and their dimensions.
Write a function called count_params(n_in, n_hidden, n_out) that counts how many parameters will be in our model. Assume that there are 2 hidden layers.
[ ]:
def count_params(n_in, n_hidden, n_out):
    n_weights = n_hidden * (n_in + n_hidden + n_out)
    n_biases = 2 * n_hidden + n_out
    return n_weights + n_biases
count_params(n_in, n_hidden, n_out)
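As a sanity check, the closed-form count can be compared with a direct count over the layer shapes. The sketch below uses plain Python and assumes the MNIST image size of 28 x 28 = 784 pixels; the helper name `count_params_from_shapes` is made up for this illustration.

```python
def count_params_from_shapes(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the width of every layer, input layer first.
    """
    total = 0
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += fan_out * fan_in  # weight matrix of shape (fan_out, fan_in)
        total += fan_out  # one bias per output unit
    return total


# 784 inputs (assumed 28 x 28 images), two hidden layers of 16, 10 outputs
n_params = count_params_from_shapes([784, 16, 16, 10])
print(n_params)  # 13002
```

Layer by layer this is 12560 + 272 + 170 parameters, which agrees with 16 * (784 + 16 + 10) + 2 * 16 + 10.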
Task 2: Set up random start parameters#
We want to draw random start parameters that are distributed uniformly between -0.5 and 0.5.
Since we are going to modify the parameters in-place while training the model, we need a way to freshly generate the start parameters multiple times. We therefore create a function that draws start parameters.
The function takes the following arguments:
- n_in
- n_hidden
- n_out
- seed (give it a default value of 1995 so we all get the same results)
The function returns:
- a list of weight matrices with the correct shapes
- a list of biases with the correct shapes
[ ]:
def create_params(n_in, n_hidden, n_out, seed=1995):
    # seed the generator so that every call returns the same start parameters
    torch.manual_seed(seed)
    # torch.rand draws from [0, 1), so subtracting 0.5 gives values in [-0.5, 0.5)
    weights = [
        torch.rand((n_hidden, n_in)) - 0.5,
        torch.rand((n_hidden, n_hidden)) - 0.5,
        torch.rand((n_out, n_hidden)) - 0.5,
    ]
    biases = [
        torch.rand(n_hidden) - 0.5,
        torch.rand(n_hidden) - 0.5,
        torch.rand(n_out) - 0.5,
    ]
    return weights, biases
weights, biases = create_params(n_in, n_hidden, n_out)
biases
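The reason for seeding inside the function is that two calls with the same seed then return identical tensors. A small standalone sketch of that behaviour follows; the tensor length of 5 is arbitrary and chosen only for illustration.

```python
import torch

torch.manual_seed(1995)
first_draw = torch.rand(5) - 0.5

torch.manual_seed(1995)
second_draw = torch.rand(5) - 0.5

# Re-seeding with the same value reproduces the same draws
print(torch.equal(first_draw, second_draw))

# torch.rand samples from [0, 1), so the shifted values lie in [-0.5, 0.5)
print(bool(first_draw.min() >= -0.5), bool(first_draw.max() < 0.5))
```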
Task 3: Implement relu and softmax#
Implement a relu function that takes a 1d tensor and applies the relu nonlinearity elementwise
Implement a softmax function that takes a 1d tensor of logits and returns a 1d tensor of probabilities
Test your function on a small tensor
If you have time implement other nonlinearities such as sigmoid, tanh, …
[ ]:
def relu(x):
    """Calculate the elementwise relu nonlinearity on x."""
    return torch.clip(x, 0)


def softmax(x):
    """Compute softmax values over x.

    Subtracting the max is optional but improves numerical stability.
    """
    e_x = torch.exp(x - torch.max(x))
    return e_x / e_x.sum()
[ ]:
relu(torch.linspace(-1, 1, 5))
[ ]:
softmax(torch.tensor([-1, 3, -2]))
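To see why subtracting the maximum matters, here is a dependency-free sketch in plain Python. The function names `softmax_naive` and `softmax_shifted` are made up for this comparison. With Python floats the naive version raises an `OverflowError` for large logits; with float32 tensors the same computation would typically produce `inf` and then `nan` instead.

```python
import math


def softmax_naive(logits):
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_shifted(logits):
    # subtracting the maximum leaves the result unchanged
    # but keeps every exponent at or below zero
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


small = [-1.0, 3.0, -2.0]
# For moderate logits both versions agree up to rounding error
agree = all(
    math.isclose(a, b)
    for a, b in zip(softmax_naive(small), softmax_shifted(small))
)

large = [1000.0, 1001.0, 1002.0]
try:
    softmax_naive(large)
    naive_overflowed = False
except OverflowError:
    # math.exp(1000.0) exceeds the range of a 64-bit float
    naive_overflowed = True

stable = softmax_shifted(large)
print(agree, naive_overflowed, stable)
```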
Task 4: Implement the model#
The model should take the following arguments:
- x: A 1d tensor with a flattened image
- weights: The list of weights from task 2
- biases: The list of biases from task 2
It should return a 1d tensor of length n_out that contains probabilities for each category.
Implement a model function
Try it out on the first element of the training data
Try out the batch_model function on the first few rows of the training data
[ ]:
def model(x, weights, biases):
    h1 = relu(weights[0] @ x + biases[0])
    h2 = relu(weights[1] @ h1 + biases[1])
    return softmax(weights[2] @ h2 + biases[2])
[ ]:
model(X_train[0], weights, biases)
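Written out as equations, the forward pass of the model function looks as follows. The symbols W_0, W_1, W_2 and b_0, b_1, b_2 are shorthand introduced here for the entries of weights and biases, and the shapes assume n_in = 784 (28 x 28 images), n_hidden = 16 and n_out = 10.

```latex
\begin{aligned}
h_1 &= \operatorname{relu}(W_0 x + b_0),
  & W_0 &\in \mathbb{R}^{16 \times 784}, & b_0 &\in \mathbb{R}^{16} \\
h_2 &= \operatorname{relu}(W_1 h_1 + b_1),
  & W_1 &\in \mathbb{R}^{16 \times 16}, & b_1 &\in \mathbb{R}^{16} \\
p &= \operatorname{softmax}(W_2 h_2 + b_2),
  & W_2 &\in \mathbb{R}^{10 \times 16}, & b_2 &\in \mathbb{R}^{10}
\end{aligned}
```

The output p has one entry per digit class, and its entries sum to 1.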
In the training process we need a batch_model function that evaluates the model on a batch of data. Writing it yourself is not very instructive, so I give you the function right away.
[ ]:
def batch_model(batch, weights, biases):
    n_out = len(biases[-1])
    out = torch.zeros((len(batch), n_out))
    for i, x in enumerate(batch):
        out[i] = model(x, weights, biases)
    return out
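The loop in batch_model calls the model once per image. A common alternative, sketched here as an assumption rather than as part of the exercise, is to push the whole batch through each layer with one matrix product. The name `batch_model_vectorized` and the toy shapes are made up for this illustration; `torch.relu` and `torch.softmax` are PyTorch's built-in counterparts of the functions from task 3.

```python
import torch


def batch_model_vectorized(batch, weights, biases):
    """Evaluate the network on all rows of batch at once.

    Rows of batch are flattened images, so batch @ W.T computes W @ x
    for every row x.
    """
    h1 = torch.relu(batch @ weights[0].T + biases[0])
    h2 = torch.relu(h1 @ weights[1].T + biases[1])
    logits = h2 @ weights[2].T + biases[2]
    # softmax over the class dimension: one distribution per row
    return torch.softmax(logits, dim=1)


# Tiny random network (6 inputs, 4 hidden units, 3 classes), illustration only
torch.manual_seed(0)
toy_shapes = [(4, 6), (4, 4), (3, 4)]
toy_weights = [torch.rand(shape) - 0.5 for shape in toy_shapes]
toy_biases = [torch.rand(shape[0]) - 0.5 for shape in toy_shapes]
toy_batch = torch.rand(5, 6)

toy_probs = batch_model_vectorized(toy_batch, toy_weights, toy_biases)
print(toy_probs.shape)
print(toy_probs.sum(dim=1))
```

Each row of the result should match what the per-image model returns for that row, up to floating point rounding.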
Task 5: Implement loss functions#
Write a function called nll_loss that takes the result of the batch_model and the true labels and returns the average negative log likelihood.
Try it out on the first 100 rows of the training data
Implement an accuracy function that takes the same arguments as the loss function
Try it out on the first 100 rows of the training data
[ ]:
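def nll_loss(probs, labels):
    # pick the predicted probability of the true class in each row;
    # the small constant guards against log(0) and must be large enough
    # to be representable in float32
    likelihoods = probs[torch.arange(len(probs)), labels] + 1e-12
    loglikes = torch.log(likelihoods)
    return -loglikes.mean()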
[ ]:
probs = batch_model(X_train[:100], weights, biases)
labels = y_train[:100]
nll_loss(probs, labels)
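A small worked example makes the loss easier to interpret. The sketch below uses plain Python and made-up probabilities for two examples with three classes. It also computes the loss of a model that predicts all ten digits with equal probability, which is a useful reference point for an untrained network.

```python
import math

# Made-up predictions for two examples with three classes each
toy_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
toy_labels = [0, 1]

# Probability the model assigned to the true class of each example
picked = [row[label] for row, label in zip(toy_probs, toy_labels)]
toy_nll = -sum(math.log(p) for p in picked) / len(picked)
print(round(toy_nll, 4))  # 0.2899

# A 10-class model that predicts uniformly has loss ln(10)
uniform_nll = -math.log(1 / 10)
print(round(uniform_nll, 4))  # 2.3026
```

A loss well above 2.3 on MNIST therefore means the model is doing worse than uniform guessing on the true classes.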
[ ]:
def accuracy(probs, labels):
    y_pred = probs.argmax(axis=1)
    return (y_pred == labels).to(torch.float).mean()
[ ]:
accuracy(probs, labels)
Task 6: The training loop#
Create fresh weights and biases
Set requires_grad to True for all tensors in the weights and biases list.
Write a training loop to train your model with SGD and the following hyper-parameters
n_epochs: 2
batch_size: 100
learning_rate: 0.001
If you have time, try the model out on a few images
Important: Do the entire training in just one cell and re-create the start parameters at the beginning of that cell, so each training run starts from the same position.
[ ]:
# create fresh random weights and biases
# set requires_grad to True for training
# define the hyperparameters
# loop over epochs
    # loop over batches
        # evaluate model
        # evaluate loss
        # backward pass
        # loop over the parameter lists
            # SGD updates for each parameter tensor
            # Zero the gradients for the next iteration
[ ]:
# create fresh random weights and biases
weights, biases = create_params(n_in, n_hidden, n_out)

# set requires_grad to True for training
for i in range(3):
    weights[i].requires_grad = True
    biases[i].requires_grad = True

# define the hyperparameters
n_epochs = 2
batch_size = 100
learning_rate = 0.01

# loop over epochs
for _epoch in range(n_epochs):
    batch_indices = torch.randperm(len(X_train)).reshape(-1, batch_size)
    # loop over batches
    for idxs in batch_indices:
        probs = batch_model(X_train[idxs], weights, biases)
        loss = nll_loss(probs, y_train[idxs])
        loss.backward()
        for i in range(3):
            # SGD updates for each parameter
            weights[i].data = weights[i].data - learning_rate * weights[i].grad.data
            biases[i].data = biases[i].data - learning_rate * biases[i].grad.data
            # Zero the gradients for the next iteration
            weights[i].grad.data.zero_()
            biases[i].grad.data.zero_()
[ ]:
example_idx = 0
with torch.no_grad():
    probs = model(X_test[example_idx], weights, biases)
probs
[ ]:
probs[y_test[example_idx]]
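The SGD update in the training loop writes to the parameters through `.data`. A variant often seen in current PyTorch code is an in-place update inside `torch.no_grad()`. Below is a minimal sketch of that pattern on a toy problem, minimizing (w - 3)^2, and not on the model from this exercise. The names `w_toy` and `toy_lr` are made up for the illustration.

```python
import torch

w_toy = torch.tensor([0.0], requires_grad=True)
toy_lr = 0.1

for _step in range(100):
    toy_loss = ((w_toy - 3.0) ** 2).sum()
    toy_loss.backward()
    with torch.no_grad():
        # in-place update that autograd does not record
        w_toy -= toy_lr * w_toy.grad
    # reset the accumulated gradient before the next backward pass
    w_toy.grad.zero_()

print(w_toy)
```

After 100 steps `w_toy` should be very close to 3, and it still has `requires_grad` set to True, so training could simply continue.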
Task 7: Diagnostics#
Copy-paste the training loop from the previous task or work in the same cell as before.
After each epoch, evaluate the batch_model on test data with the current best parameters. Use torch.no_grad to disable gradients.
Calculate the accuracy score on the result and print it.
[ ]:
# create fresh random weights and biases
weights, biases = create_params(n_in, n_hidden, n_out)

# set requires_grad to True for training
for i in range(3):
    weights[i].requires_grad = True
    biases[i].requires_grad = True

# define the hyperparameters
n_epochs = 2
batch_size = 100
learning_rate = 0.01

# loop over epochs
for epoch in range(n_epochs):
    batch_indices = torch.randperm(len(X_train)).reshape(-1, batch_size)
    # loop over batches
    for idxs in batch_indices:
        probs = batch_model(X_train[idxs], weights, biases)
        loss = nll_loss(probs, y_train[idxs])
        loss.backward()
        for i in range(3):
            # SGD updates for each parameter
            weights[i].data = weights[i].data - learning_rate * weights[i].grad.data
            biases[i].data = biases[i].data - learning_rate * biases[i].grad.data
            # Zero the gradients for the next iteration
            weights[i].grad.data.zero_()
            biases[i].grad.data.zero_()

    # evaluate on the test data once per epoch, without tracking gradients
    with torch.no_grad():
        probs = batch_model(X_test, weights, biases)
        acc = accuracy(probs, y_test)
        print(f"Accuracy after epoch {epoch}: {acc}")
Task 8: Training the model#
Tweak the number of epochs, batch size and learning rate until you get an accuracy of at least 90 %
Copy paste the code from the previous task or work in the same cell.
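When changing the batch size, keep in mind that the training loop from the previous tasks builds its batches with torch.randperm(len(X_train)).reshape(-1, batch_size), which only works if the batch size divides the number of training images. The plain-Python sketch below checks this and counts the resulting SGD updates. It assumes the 60,000 images of the MNIST training split, and the helper name `n_updates` is made up for this illustration.

```python
def n_updates(n_samples, batch_size, n_epochs):
    """Number of SGD updates when every batch has exactly batch_size rows."""
    if n_samples % batch_size != 0:
        raise ValueError(
            f"batch_size={batch_size} does not divide n_samples={n_samples}"
        )
    return n_epochs * (n_samples // batch_size)


n_train = 60_000  # assumed size of the MNIST training split

print(n_updates(n_train, batch_size=100, n_epochs=2))  # 1200
print(n_updates(n_train, batch_size=25, n_epochs=5))  # 12000
```

A smaller batch size and more epochs both increase the number of updates, which is one reason why they can improve the accuracy at a fixed learning rate. A batch size such as 64 would raise the error here because it does not divide 60,000.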
[ ]:
# create fresh random weights and biases
weights, biases = create_params(n_in, n_hidden, n_out)

# set requires_grad to True for training
for i in range(3):
    weights[i].requires_grad = True
    biases[i].requires_grad = True

# define the hyperparameters
n_epochs = 5
batch_size = 25
learning_rate = 0.1

# loop over epochs
for epoch in range(n_epochs):
    batch_indices = torch.randperm(len(X_train)).reshape(-1, batch_size)
    # loop over batches
    for idxs in batch_indices:
        probs = batch_model(X_train[idxs], weights, biases)
        loss = nll_loss(probs, y_train[idxs])
        loss.backward()
        for i in range(3):
            # SGD updates for each parameter
            weights[i].data = weights[i].data - learning_rate * weights[i].grad.data
            biases[i].data = biases[i].data - learning_rate * biases[i].grad.data
            # Zero the gradients for the next iteration
            weights[i].grad.data.zero_()
            biases[i].grad.data.zero_()

    # evaluate on the test data once per epoch, without tracking gradients
    with torch.no_grad():
        probs = batch_model(X_test, weights, biases)
        acc = accuracy(probs, y_test)
        print(f"Accuracy after epoch {epoch}: {acc}")