{ "cells": [ { "cell_type": "markdown", "id": "38614803", "metadata": {}, "source": [ "# Exercise 8" ] }, { "cell_type": "code", "execution_count": null, "id": "a193d777", "metadata": {}, "outputs": [], "source": [ "from datasets import load_dataset\n", "import torch\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "id": "4c311ff0", "metadata": {}, "source": [ "## Data Preparation\n", "\n", "The data preparation only uses concepts you already know from previous lectures. We therefore start with clean datasets for training and validation." ] }, { "cell_type": "code", "execution_count": null, "id": "9944ef98", "metadata": {}, "outputs": [], "source": [ "data = load_dataset(\"mnist\")\n", "data.set_format(\"torch\")" ] }, { "cell_type": "code", "execution_count": null, "id": "de2a80fa", "metadata": {}, "outputs": [], "source": [ "example = data[\"test\"][0]\n", "print(f\"True label: {int(example['label'])}\")\n", "fig = plt.imshow(example[\"image\"])" ] }, { "cell_type": "code", "execution_count": null, "id": "e50d26dd", "metadata": {}, "outputs": [], "source": [ "img_size = example[\"image\"].numel()\n", "img_size" ] }, { "cell_type": "code", "execution_count": null, "id": "4b435875", "metadata": {}, "outputs": [], "source": [ "# Dividing by 255 maps pixel values to 0, 1\n", "X_train = data[\"train\"][\"image\"][:].reshape(-1, img_size).to(torch.float) / 255\n", "X_test = data[\"test\"][\"image\"][:].reshape(-1, img_size).to(torch.float) / 255\n", "y_train = data[\"train\"][\"label\"]\n", "y_test = data[\"test\"][\"label\"]\n", "X_train.shape" ] }, { "cell_type": "markdown", "id": "0342ee61", "metadata": {}, "source": [ "## Dimensions of our Neural Network" ] }, { "cell_type": "code", "execution_count": null, "id": "2d8a15bd", "metadata": {}, "outputs": [], "source": [ "# the input dimension\n", "n_in = img_size\n", "# the dimension of our 2 hidden layers\n", "n_hidden = 16\n", "# the dimension of our output layer\n", "n_out = 10" ] }, { 
"cell_type": "markdown", "id": "5db45eb0", "metadata": {}, "source": [ "## Task 1: How many Parameters?\n", "\n", "The number of trainable parameters is entirly determined by the number of layers and their dimensions. \n", "\n", "Write a function called `count_params(n_in, n_hidden, n_out)` that counts how many parameters will be in our model. Assume that there are 2 hidden layers. " ] }, { "cell_type": "code", "execution_count": null, "id": "2a2400ba", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "ff1346d1", "metadata": {}, "source": [ "## Task 2: Set up random start parameters\n", "\n", "We want to draw random start parameters that are distributed uniformly between -0.5 and 0.5. \n", "\n", "Since we are going to modify the parameters in-place while training the model, we need a way to freshly generate the start parameters multiple times. We therefore create a function that draws start parameters. \n", "\n", "The function takes the following arguments:\n", " - n_in\n", " - n_hidden\n", " - n_out\n", " - seed (give it a default value of 1995 so we all get the same results)\n", " \n", "The function returns:\n", " - a list of weight matrices with the correct shapes\n", " - a list of biases with the correct shapes " ] }, { "cell_type": "code", "execution_count": null, "id": "c5d45bea", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "cab6a138", "metadata": {}, "source": [ "## Task 3: Implement relu and softmax\n", "\n", "1. Implement a relu function that takes a 1d tensor and applies the relu nonlinearity elementwise\n", "2. Implement a softmax function that takes a 1d tensor of logits and returns a 1d tensor of probabilities\n", "3. Test your function on a small tensor \n", "4. If you have time implement other nonlinearities such as sigmoid, tanh, ..." 
] }, { "cell_type": "code", "execution_count": null, "id": "0f92e85b", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "a62099d6", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "eb91ee87", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "e619d0ac", "metadata": {}, "source": [ "## Task 4: Implement the model\n", "\n", "The model should take the following arguments:\n", "- x: A 1d tensor with a flattened image\n", "- weights: The list of weights from task 2\n", "- biases: The list of biases from task 2\n", "\n", "It should return a 1d tensor of length `n_out` that contains probabilities for each category. \n", "\n", "1. Implement a `model` function\n", "2. Try it out on the first element of the training data\n", "3. Try out the `batch_model` function (provided below) on the first few rows of the training data" ] }, { "cell_type": "code", "execution_count": null, "id": "86f98fe4", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "af245938", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "55687ef4", "metadata": {}, "source": [ "In the training process we need a `batch_model` function that evaluates the model on a batch of data. This is not very instructive, so I give you the function right away." ] }, { "cell_type": "code", "execution_count": null, "id": "9bc9cb69", "metadata": {}, "outputs": [], "source": [ "def batch_model(batch, weights, biases):\n", "    n_out = len(biases[-1])\n", "    out = torch.zeros((len(batch), n_out))\n", "    for i, x in enumerate(batch):\n", "        out[i] = model(x, weights, biases)\n", "    return out" ] }, { "cell_type": "markdown", "id": "11435f78", "metadata": {}, "source": [ "## Task 5: Implement loss functions\n", "\n", "1. 
Write a function called `nnl_loss` that takes the result of `batch_model` and the true labels and returns the average negative log likelihood.\n", "2. Try it out on the first 100 rows of the training data\n", "3. Implement an `accuracy` function that takes the same arguments as the loss function\n", "4. Try it out on the first 100 rows of the training data" ] }, { "cell_type": "code", "execution_count": null, "id": "8cb53640", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "6a727319", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "7e286a37", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "db452996", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "cd7dc637", "metadata": {}, "source": [ "## Task 6: The training loop\n", "\n", "0. Create fresh weights and biases\n", "1. Set `requires_grad` to True for all tensors in the weights and biases lists. \n", "2. Write a training loop to train your model with SGD and the following hyperparameters:\n", " - n_epochs: 2\n", " - batch_size: 100\n", " - learning_rate: 0.001\n", "3. If you have time, try the model out on a few images\n", "\n", "**Important**: Do the entire training in just one cell and re-create the start parameters at the beginning of that cell, so each training run starts from the same position. 
" ] }, { "cell_type": "code", "execution_count": null, "id": "4ce11263", "metadata": {}, "outputs": [], "source": [ "# create fresh random weights and biases\n", "\n", "# set requires_grad to True for training\n", "\n", "# define the hyperparameters\n", "\n", "# loop over epochs\n", "\n", "# loop over batches\n", "# evaluate model\n", "# evaluate loss\n", "# backwards\n", "\n", "# loop over the paramter lists\n", "# SGD updates for each parameter tensor\n", "\n", "# Zero the gradients for the next iteration" ] }, { "cell_type": "code", "execution_count": null, "id": "89c1aa2c", "metadata": { "scrolled": false }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "9e30650c", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "3b8e9161", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "909a62bb", "metadata": {}, "source": [ "## Task 7: Diagnostics\n", "\n", "1. Copy-paste the training loop from the previous task or work in the same cell as before.\n", "2. After each epoch, evaluate the batch_model on test data with the current best parameters; Use `torch.no_grad` to disable gradients.\n", "3. Calculate the accuracy score no the result and print it." ] }, { "cell_type": "code", "execution_count": null, "id": "758a10c2", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "baf978c0", "metadata": {}, "source": [ "## Task 8: Training the model\n", "\n", "Tweak the number of epochs, batch size and learning rate until you get an accuracy of at least 90 %\n", "\n", "Copy paste the code from the previous task or work in the same cell. 
" ] }, { "cell_type": "code", "execution_count": null, "id": "fae05918", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.3" } }, "nbformat": 4, "nbformat_minor": 5 }