{ "cells": [ { "cell_type": "markdown", "id": "38614803", "metadata": {}, "source": [ "# Exercise 8 (solution)" ] }, { "cell_type": "code", "execution_count": null, "id": "a193d777", "metadata": {}, "outputs": [], "source": [ "from datasets import load_dataset\n", "import torch\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "id": "4c311ff0", "metadata": {}, "source": [ "## Data Preparation\n", "\n", "The data preparation only uses concepts you already know from previous lectures. We therefore start directly with clean datasets for training and testing." ] }, { "cell_type": "code", "execution_count": null, "id": "9944ef98", "metadata": {}, "outputs": [], "source": [ "data = load_dataset(\"mnist\")\n", "data.set_format(\"torch\")" ] }, { "cell_type": "code", "execution_count": null, "id": "de2a80fa", "metadata": {}, "outputs": [], "source": [ "example = data[\"test\"][0]\n", "print(f\"True label: {int(example['label'])}\")\n", "fig = plt.imshow(example[\"image\"])" ] }, { "cell_type": "code", "execution_count": null, "id": "e50d26dd", "metadata": {}, "outputs": [], "source": [ "img_size = example[\"image\"].numel()\n", "img_size" ] }, { "cell_type": "code", "execution_count": null, "id": "4b435875", "metadata": {}, "outputs": [], "source": [ "# Dividing by 255 maps pixel values to the interval [0, 1]\n", "X_train = data[\"train\"][\"image\"][:].reshape(-1, img_size).to(torch.float) / 255\n", "X_test = data[\"test\"][\"image\"][:].reshape(-1, img_size).to(torch.float) / 255\n", "y_train = data[\"train\"][\"label\"]\n", "y_test = data[\"test\"][\"label\"]\n", "X_train.shape" ] }, { "cell_type": "markdown", "id": "0342ee61", "metadata": {}, "source": [ "## Dimensions of our Neural Network" ] }, { "cell_type": "code", "execution_count": null, "id": "2d8a15bd", "metadata": {}, "outputs": [], "source": [ "# the input dimension\n", "n_in = img_size\n", "# the dimension of each of our 2 hidden layers\n", "n_hidden = 16\n", "# the dimension of our output layer\n", "n_out = 10" ] }, 
{ "cell_type": "markdown", "id": "5db45eb0", "metadata": {}, "source": [ "## Task 1: How many Parameters?\n", "\n", "The number of trainable parameters is entirely determined by the number of layers and their dimensions. \n", "\n", "Write a function called `count_params(n_in, n_hidden, n_out)` that counts how many parameters will be in our model. Assume that there are 2 hidden layers. " ] }, { "cell_type": "code", "execution_count": null, "id": "2a2400ba", "metadata": {}, "outputs": [], "source": [ "def count_params(n_in, n_hidden, n_out):\n", "    # one weight matrix per layer: input->hidden, hidden->hidden, hidden->output\n", "    n_weights = n_hidden * (n_in + n_hidden + n_out)\n", "    # one bias vector per layer\n", "    n_biases = 2 * n_hidden + n_out\n", "    return n_weights + n_biases\n", "\n", "\n", "count_params(n_in, n_hidden, n_out)" ] }, { "cell_type": "markdown", "id": "ff1346d1", "metadata": {}, "source": [ "## Task 2: Set up random start parameters\n", "\n", "We want to draw random start parameters that are distributed uniformly between -0.5 and 0.5. \n", "\n", "Since we are going to modify the parameters in-place while training the model, we need a way to freshly generate the start parameters multiple times. We therefore create a function that draws start parameters. 
\n", "\n", "The function takes the following arguments:\n", " - n_in\n", " - n_hidden\n", " - n_out\n", " - seed (give it a default value of 1995 so we all get the same results)\n", " \n", "The function returns:\n", " - a list of weight matrices with the correct shapes\n", " - a list of biases with the correct shapes " ] }, { "cell_type": "code", "execution_count": null, "id": "c5d45bea", "metadata": {}, "outputs": [], "source": [ "def create_params(n_in, n_hidden, n_out, seed=1995):\n", "    # seed the generator so repeated calls return identical start parameters\n", "    torch.manual_seed(seed)\n", "    # torch.rand draws from [0, 1); subtracting 0.5 shifts this to [-0.5, 0.5)\n", "    weights = [\n", "        torch.rand((n_hidden, n_in)) - 0.5,\n", "        torch.rand((n_hidden, n_hidden)) - 0.5,\n", "        torch.rand((n_out, n_hidden)) - 0.5,\n", "    ]\n", "\n", "    biases = [\n", "        torch.rand(n_hidden) - 0.5,\n", "        torch.rand(n_hidden) - 0.5,\n", "        torch.rand(n_out) - 0.5,\n", "    ]\n", "    return weights, biases\n", "\n", "\n", "weights, biases = create_params(n_in, n_hidden, n_out)\n", "biases" ] }, { "cell_type": "markdown", "id": "cab6a138", "metadata": {}, "source": [ "## Task 3: Implement relu and softmax\n", "\n", "1. Implement a relu function that takes a 1d tensor and applies the relu nonlinearity elementwise\n", "2. Implement a softmax function that takes a 1d tensor of logits and returns a 1d tensor of probabilities\n", "3. Test your functions on a small tensor \n", "4. If you have time, implement other nonlinearities such as sigmoid, tanh, ..." 
] }, { "cell_type": "code", "execution_count": null, "id": "0f92e85b", "metadata": {}, "outputs": [], "source": [ "def relu(x):\n", "    \"\"\"Calculate the elementwise relu nonlinearity on x.\"\"\"\n", "    return torch.clip(x, min=0)\n", "\n", "\n", "def softmax(x):\n", "    \"\"\"Compute softmax values over x.\n", "\n", "    Subtracting the max is optional but improves numerical stability.\n", "\n", "    \"\"\"\n", "    e_x = torch.exp(x - torch.max(x))\n", "    return e_x / e_x.sum()" ] }, { "cell_type": "code", "execution_count": null, "id": "a62099d6", "metadata": {}, "outputs": [], "source": [ "relu(torch.linspace(-1, 1, 5))" ] }, { "cell_type": "code", "execution_count": null, "id": "eb91ee87", "metadata": {}, "outputs": [], "source": [ "softmax(torch.tensor([-1.0, 3.0, -2.0]))" ] }, { "cell_type": "markdown", "id": "e619d0ac", "metadata": {}, "source": [ "## Task 4: Implement the model\n", "\n", "The model should take the following arguments:\n", "- x: A 1d tensor with a flattened image\n", "- weights: The list of weights from task 2\n", "- biases: The list of biases from task 2\n", "\n", "It should return a 1d tensor of length `n_out` that contains probabilities for each category. \n", "\n", "1. Implement a `model` function\n", "2. Try it out on the first element of the training data\n", "3. Try out the `batch_model` function (given below) on the first few rows of the training data" ] }, { "cell_type": "code", "execution_count": null, "id": "86f98fe4", "metadata": {}, "outputs": [], "source": [ "def model(x, weights, biases):\n", "    h1 = relu(weights[0] @ x + biases[0])\n", "    h2 = relu(weights[1] @ h1 + biases[1])\n", "    return softmax(weights[2] @ h2 + biases[2])" ] }, { "cell_type": "code", "execution_count": null, "id": "af245938", "metadata": {}, "outputs": [], "source": [ "model(X_train[0], weights, biases)" ] }, { "cell_type": "markdown", "id": "55687ef4", "metadata": {}, "source": [ "In the training process, we need a `batch_model` function that evaluates the model on a batch of data. 
This is not very instructive, so I give you the function right away." ] }, { "cell_type": "code", "execution_count": null, "id": "9bc9cb69", "metadata": {}, "outputs": [], "source": [ "def batch_model(batch, weights, biases):\n", "    n_out = len(biases[-1])\n", "    out = torch.zeros((len(batch), n_out))\n", "    for i, x in enumerate(batch):\n", "        out[i] = model(x, weights, biases)\n", "    return out" ] }, { "cell_type": "markdown", "id": "11435f78", "metadata": {}, "source": [ "## Task 5: Implement loss functions\n", "\n", "1. Write a function called `nll_loss` that takes the result of `batch_model` and the true labels and returns the average negative log-likelihood.\n", "2. Try it out on the first 100 rows of the training data\n", "3. Implement an `accuracy` function that takes the same arguments as the loss function\n", "4. Try it out on the first 100 rows of the training data" ] }, { "cell_type": "code", "execution_count": null, "id": "8cb53640", "metadata": {}, "outputs": [], "source": [ "def nll_loss(probs, labels):\n", "    # select the predicted probability of the true class in each row;\n", "    # the small constant avoids log(0) and must be representable in float32\n", "    likelihoods = probs[torch.arange(len(probs)), labels] + 1e-12\n", "    loglikes = torch.log(likelihoods)\n", "    return -loglikes.mean()" ] }, { "cell_type": "code", "execution_count": null, "id": "6a727319", "metadata": {}, "outputs": [], "source": [ "probs = batch_model(X_train[:100], weights, biases)\n", "labels = y_train[:100]\n", "\n", "nll_loss(probs, labels)" ] }, { "cell_type": "code", "execution_count": null, "id": "7e286a37", "metadata": {}, "outputs": [], "source": [ "def accuracy(probs, labels):\n", "    # the predicted class is the one with the highest probability\n", "    y_pred = probs.argmax(dim=1)\n", "    return (y_pred == labels).to(torch.float).mean()" ] }, { "cell_type": "code", "execution_count": null, "id": "db452996", "metadata": {}, "outputs": [], "source": [ "accuracy(probs, labels)" ] }, { "cell_type": "markdown", "id": "cd7dc637", "metadata": {}, "source": [ "## Task 6: The training loop\n", "\n", "0. Create fresh weights and biases\n", "1. Set `requires_grad` to True for all tensors in the weights and biases lists. 
\n", "2. Write a training loop to train your model with SGD and the following hyper-parameters\n", "    - n_epochs: 2\n", "    - batch_size: 100\n", "    - learning_rate: 0.01\n", "3. If you have time, try the model out on a few images\n", "\n", "**Important**: Do the entire training in just one cell and re-create the start parameters at the beginning of that cell, so each training run starts from the same position. " ] }, { "cell_type": "code", "execution_count": null, "id": "4ce11263", "metadata": {}, "outputs": [], "source": [ "# create fresh random weights and biases\n", "\n", "# set requires_grad to True for training\n", "\n", "# define the hyperparameters\n", "\n", "# loop over epochs\n", "\n", "# loop over batches\n", "# evaluate model\n", "# evaluate loss\n", "# backwards\n", "\n", "# loop over the parameter lists\n", "# SGD updates for each parameter tensor\n", "\n", "# Zero the gradients for the next iteration" ] }, { "cell_type": "code", "execution_count": null, "id": "89c1aa2c", "metadata": { "scrolled": false }, "outputs": [], "source": [ "# create fresh random weights and biases\n", "weights, biases = create_params(n_in, n_hidden, n_out)\n", "\n", "# set requires_grad to True for training\n", "for i in range(3):\n", "    weights[i].requires_grad = True\n", "    biases[i].requires_grad = True\n", "\n", "# define the hyperparameters\n", "n_epochs = 2\n", "batch_size = 100\n", "learning_rate = 0.01\n", "\n", "# loop over epochs\n", "for _epoch in range(n_epochs):\n", "    batch_indices = torch.randperm(len(X_train)).reshape(-1, batch_size)\n", "    # loop over batches\n", "    for idxs in batch_indices:\n", "        probs = batch_model(X_train[idxs], weights, biases)\n", "        loss = nll_loss(probs, y_train[idxs])\n", "        loss.backward()\n", "\n", "        for i in range(3):\n", "            # SGD updates for each parameter\n", "            weights[i].data = weights[i].data - learning_rate * weights[i].grad.data\n", "            biases[i].data = biases[i].data - learning_rate * biases[i].grad.data\n", "            # Zero the gradients for 
the next iteration\n", "            weights[i].grad.data.zero_()\n", "            biases[i].grad.data.zero_()" ] }, { "cell_type": "code", "execution_count": null, "id": "9e30650c", "metadata": {}, "outputs": [], "source": [ "example_idx = 0\n", "with torch.no_grad():\n", "    probs = model(X_test[example_idx], weights, biases)\n", "\n", "probs" ] }, { "cell_type": "code", "execution_count": null, "id": "3b8e9161", "metadata": {}, "outputs": [], "source": [ "probs[y_test[example_idx]]" ] }, { "cell_type": "markdown", "id": "909a62bb", "metadata": {}, "source": [ "## Task 7: Diagnostics\n", "\n", "1. Copy-paste the training loop from the previous task or work in the same cell as before.\n", "2. After each epoch, evaluate the `batch_model` on the test data with the current parameters. Use `torch.no_grad` to disable gradient tracking.\n", "3. Calculate the accuracy score on the result and print it." ] }, { "cell_type": "code", "execution_count": null, "id": "758a10c2", "metadata": {}, "outputs": [], "source": [ "# create fresh random weights and biases\n", "weights, biases = create_params(n_in, n_hidden, n_out)\n", "\n", "# set requires_grad to True for training\n", "for i in range(3):\n", "    weights[i].requires_grad = True\n", "    biases[i].requires_grad = True\n", "\n", "# define the hyperparameters\n", "n_epochs = 2\n", "batch_size = 100\n", "learning_rate = 0.01\n", "\n", "# loop over epochs\n", "for epoch in range(n_epochs):\n", "    batch_indices = torch.randperm(len(X_train)).reshape(-1, batch_size)\n", "    # loop over batches\n", "    for idxs in batch_indices:\n", "        probs = batch_model(X_train[idxs], weights, biases)\n", "        loss = nll_loss(probs, y_train[idxs])\n", "        loss.backward()\n", "\n", "        for i in range(3):\n", "            # SGD updates for each parameter\n", "            weights[i].data = weights[i].data - learning_rate * weights[i].grad.data\n", "            biases[i].data = biases[i].data - learning_rate * biases[i].grad.data\n", "            # Zero the gradients for the next iteration\n", "            weights[i].grad.data.zero_()\n", "            
biases[i].grad.data.zero_()\n", "\n", "    # evaluate on the test data after each epoch, without tracking gradients\n", "    with torch.no_grad():\n", "        probs = batch_model(X_test, weights, biases)\n", "        acc = accuracy(probs, y_test)\n", "        print(f\"Accuracy after epoch {epoch}: {acc}\")" ] }, { "cell_type": "markdown", "id": "baf978c0", "metadata": {}, "source": [ "## Task 8: Training the model\n", "\n", "Tweak the number of epochs, the batch size and the learning rate until you get an accuracy of at least 90 %.\n", "\n", "Copy-paste the code from the previous task or work in the same cell. " ] }, { "cell_type": "code", "execution_count": null, "id": "fae05918", "metadata": {}, "outputs": [], "source": [ "# create fresh random weights and biases\n", "weights, biases = create_params(n_in, n_hidden, n_out)\n", "\n", "# set requires_grad to True for training\n", "for i in range(3):\n", "    weights[i].requires_grad = True\n", "    biases[i].requires_grad = True\n", "\n", "# define the hyperparameters\n", "n_epochs = 5\n", "batch_size = 25\n", "learning_rate = 0.1\n", "\n", "# loop over epochs\n", "for epoch in range(n_epochs):\n", "    batch_indices = torch.randperm(len(X_train)).reshape(-1, batch_size)\n", "    # loop over batches\n", "    for idxs in batch_indices:\n", "        probs = batch_model(X_train[idxs], weights, biases)\n", "        loss = nll_loss(probs, y_train[idxs])\n", "        loss.backward()\n", "\n", "        for i in range(3):\n", "            # SGD updates for each parameter\n", "            weights[i].data = weights[i].data - learning_rate * weights[i].grad.data\n", "            biases[i].data = biases[i].data - learning_rate * biases[i].grad.data\n", "            # Zero the gradients for the next iteration\n", "            weights[i].grad.data.zero_()\n", "            biases[i].grad.data.zero_()\n", "\n", "    with torch.no_grad():\n", "        probs = batch_model(X_test, weights, biases)\n", "        acc = accuracy(probs, y_test)\n", "        print(f\"Accuracy after epoch {epoch}: {acc}\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", 
"version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.3" } }, "nbformat": 4, "nbformat_minor": 5 }