{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "38614803",
   "metadata": {},
   "source": [
    "# Exercise 8 (solution)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a193d777",
   "metadata": {},
   "outputs": [],
   "source": [
    "from datasets import load_dataset\n",
    "import torch\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4c311ff0",
   "metadata": {},
   "source": [
    "## Data Preparation\n",
    "\n",
    "The data preparation only uses concepts you already know from previous lectures. We therefore start with clean datasets for training and validation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9944ef98",
   "metadata": {},
   "outputs": [],
   "source": [
    "data = load_dataset(\"mnist\")\n",
    "data.set_format(\"torch\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "de2a80fa",
   "metadata": {},
   "outputs": [],
   "source": [
    "example = data[\"test\"][0]\n",
    "print(f\"True label: {int(example['label'])}\")\n",
    "fig = plt.imshow(example[\"image\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e50d26dd",
   "metadata": {},
   "outputs": [],
   "source": [
    "img_size = example[\"image\"].numel()\n",
    "img_size"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4b435875",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Dividing by 255 maps pixel values to 0, 1\n",
    "X_train = data[\"train\"][\"image\"][:].reshape(-1, img_size).to(torch.float) / 255\n",
    "X_test = data[\"test\"][\"image\"][:].reshape(-1, img_size).to(torch.float) / 255\n",
    "y_train = data[\"train\"][\"label\"]\n",
    "y_test = data[\"test\"][\"label\"]\n",
    "X_train.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0342ee61",
   "metadata": {},
   "source": [
    "## Dimensions of our Neural Network"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2d8a15bd",
   "metadata": {},
   "outputs": [],
   "source": [
    "# the input dimension\n",
    "n_in = img_size\n",
    "# the dimension of our 2 hidden layers\n",
    "n_hidden = 16\n",
    "# the dimension of our output layer\n",
    "n_out = 10"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5db45eb0",
   "metadata": {},
   "source": [
    "## Task 1: How many Parameters?\n",
    "\n",
    "The number of trainable parameters is entirly determined by the number of layers and their dimensions. \n",
    "\n",
    "Write a function called `count_params(n_in, n_hidden, n_out)` that counts how many parameters will be in our model. Assume that there are 2 hidden layers. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2a2400ba",
   "metadata": {},
   "outputs": [],
   "source": [
    "def count_params(n_in, n_hidden, n_out):\n",
    "    n_weights = n_hidden * (n_in + n_hidden + n_out)\n",
    "    n_biases = 2 * n_hidden + n_out\n",
    "    return n_weights + n_biases\n",
    "\n",
    "\n",
    "count_params(n_in, n_hidden, n_out)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ff1346d1",
   "metadata": {},
   "source": [
    "## Task 2: Set up random start parameters\n",
    "\n",
    "We want to draw random start parameters that are distributed uniformly between -0.5 and 0.5. \n",
    "\n",
    "Since we are going to modify the parameters in-place while training the model, we need a way to freshly generate the start parameters multiple times. We therefore create a function that draws start parameters. \n",
    "\n",
    "The function takes the following arguments:\n",
    "    - n_in\n",
    "    - n_hidden\n",
    "    - n_out\n",
    "    - seed (give it a default value of 1995 so we all get the same results)\n",
    "    \n",
    "The function returns:\n",
    "    - a list of weight matrices with the correct shapes\n",
    "    - a list of biases with the correct shapes "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c5d45bea",
   "metadata": {},
   "outputs": [],
   "source": [
    "def create_params(n_in, n_hidden, n_out, seed=1995):\n",
    "    torch.manual_seed(1995)\n",
    "    weights = [\n",
    "        torch.rand((n_hidden, n_in)) - 0.5,\n",
    "        torch.rand((n_hidden, n_hidden)) - 0.5,\n",
    "        torch.rand((n_out, n_hidden)) - 0.5,\n",
    "    ]\n",
    "\n",
    "    biases = [\n",
    "        torch.rand(n_hidden) - 0.5,\n",
    "        torch.rand(n_hidden) - 0.5,\n",
    "        torch.rand(n_out) - 0.5,\n",
    "    ]\n",
    "    return weights, biases\n",
    "\n",
    "\n",
    "weights, biases = create_params(n_in, n_hidden, n_out)\n",
    "biases"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cab6a138",
   "metadata": {},
   "source": [
    "## Task 3: Implement relu and softmax\n",
    "\n",
    "1. Implement a relu function that takes a 1d tensor and applies the relu nonlinearity elementwise\n",
    "2. Implement a softmax function that takes a 1d tensor of logits and returns a 1d tensor of probabilities\n",
    "3. Test your function on a small tensor \n",
    "4. If you have time implement other nonlinearities such as sigmoid, tanh, ..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0f92e85b",
   "metadata": {},
   "outputs": [],
   "source": [
    "def relu(x):\n",
    "    \"\"\"Calculate the elementwise relu nonlinearity on x.\"\"\"\n",
    "    return torch.clip(x, 0)\n",
    "\n",
    "\n",
    "def softmax(x):\n",
    "    \"\"\"Compute softmax values over x.\n",
    "\n",
    "    Subtracting the max is optional but improves numerical stability\n",
    "\n",
    "    \"\"\"\n",
    "    e_x = torch.exp(x - torch.max(x))\n",
    "    return e_x / e_x.sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a62099d6",
   "metadata": {},
   "outputs": [],
   "source": [
    "relu(torch.linspace(-1, 1, 5))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eb91ee87",
   "metadata": {},
   "outputs": [],
   "source": [
    "softmax(torch.tensor([-1, 3, -2]))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e619d0ac",
   "metadata": {},
   "source": [
    "## Task 4: Implement the model\n",
    "\n",
    "The model should take the following arguments:\n",
    "- x: A 1d tensor with a flattened image\n",
    "- weights: The list of weights from task 2\n",
    "- biases: The list of biases from task 2\n",
    "\n",
    "It should return a 1d tensor of length `n_out` that contains probabilities for each category. \n",
    "\n",
    "1. Implement a `model` function\n",
    "2. Try it out on the first element of the training data\n",
    "3. Try out the batch_model function on the first few rows of the training data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "86f98fe4",
   "metadata": {},
   "outputs": [],
   "source": [
    "def model(x, weights, biases):\n",
    "    h1 = relu(weights[0] @ x + biases[0])\n",
    "    h2 = relu(weights[1] @ h1 + biases[1])\n",
    "    return softmax(weights[2] @ h2 + biases[2])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "af245938",
   "metadata": {},
   "outputs": [],
   "source": [
    "model(X_train[0], weights, biases)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "55687ef4",
   "metadata": {},
   "source": [
    "In the training process we need a `batch_model` function that evaluates the model on a batch of data. This is not very instructional, so I give you the function right away."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9bc9cb69",
   "metadata": {},
   "outputs": [],
   "source": [
    "def batch_model(batch, weights, biases):\n",
    "    n_out = len(biases[-1])\n",
    "    out = torch.zeros((len(batch), n_out))\n",
    "    for i, x in enumerate(batch):\n",
    "        out[i] = model(x, weights, biases)\n",
    "    return out"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "11435f78",
   "metadata": {},
   "source": [
    "## Task 5: Implement loss functions\n",
    "\n",
    "1. Write a function called `nnl_loss` that takes the result of the batch_model and returns the average negative log likelihood.\n",
    "2. Try it out on the first 100 rows of the training data\n",
    "3. Implement an `accuracy` function that takes the same arguments as the loss function\n",
    "4. Try it out on the first 100 rows of the training data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8cb53640",
   "metadata": {},
   "outputs": [],
   "source": [
    "def nll_loss(probs, labels):\n",
    "    likelihoods = probs[torch.arange(len(probs)), labels] + 1e-50\n",
    "    loglikes = torch.log(likelihoods)\n",
    "    return -loglikes.mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6a727319",
   "metadata": {},
   "outputs": [],
   "source": [
    "probs = batch_model(X_train[:100], weights, biases)\n",
    "labels = y_train[:100]\n",
    "\n",
    "nll_loss(probs, labels)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7e286a37",
   "metadata": {},
   "outputs": [],
   "source": [
    "def accuracy(probs, labels):\n",
    "    y_pred = probs.argmax(axis=1)\n",
    "    return (y_pred == labels).to(torch.float).mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "db452996",
   "metadata": {},
   "outputs": [],
   "source": [
    "accuracy(probs, labels)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cd7dc637",
   "metadata": {},
   "source": [
    "## Task 6: The training loop\n",
    "\n",
    "0. Create fresh weights and biases\n",
    "1. Set `requires_grad` to True for all tensors in the weights and biases list. \n",
    "2. Write a training loop to train your model with SGD and the following hyper-parameters\n",
    "    - n_epochs: 2\n",
    "    - batch_size: 100,\n",
    "    - learning_rate: 0.001\n",
    "3. If you have time, try the model out on a few images\n",
    "\n",
    "**Important**: Do the entire training in just one cell and re-create the start parameters at the beginning of that cell, so each training run starts from the same position. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4ce11263",
   "metadata": {},
   "outputs": [],
   "source": [
    "# create fresh random weights and biases\n",
    "\n",
    "# set requires_grad to True for training\n",
    "\n",
    "# define the hyperparameters\n",
    "\n",
    "# loop over epochs\n",
    "\n",
    "# loop over batches\n",
    "# evaluate model\n",
    "# evaluate loss\n",
    "# backwards\n",
    "\n",
    "# loop over the paramter lists\n",
    "# SGD updates for each parameter tensor\n",
    "\n",
    "# Zero the gradients for the next iteration"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "89c1aa2c",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# create fresh random weights and biases\n",
    "weights, biases = create_params(n_in, n_hidden, n_out)\n",
    "\n",
    "# set requires_grad to True for training\n",
    "for i in range(3):\n",
    "    weights[i].requires_grad = True\n",
    "    biases[i].requires_grad = True\n",
    "\n",
    "# define the hyperparameters\n",
    "n_epochs = 2\n",
    "batch_size = 100\n",
    "learning_rate = 0.01\n",
    "\n",
    "# loop over epochs\n",
    "for _epoch in range(n_epochs):\n",
    "    batch_indices = torch.randperm(len(X_train)).reshape(-1, batch_size)\n",
    "    # loop over batches\n",
    "    for idxs in batch_indices:\n",
    "        probs = batch_model(X_train[idxs], weights, biases)\n",
    "        loss = nll_loss(probs, y_train[idxs])\n",
    "        loss.backward()\n",
    "\n",
    "        for i in range(3):\n",
    "            # SGD updates for each parameter\n",
    "            weights[i].data = weights[i].data - learning_rate * weights[i].grad.data\n",
    "            biases[i].data = biases[i].data - learning_rate * biases[i].grad.data\n",
    "            # Zero the gradients for the next iteration\n",
    "            weights[i].grad.data.zero_()\n",
    "            biases[i].grad.data.zero_()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9e30650c",
   "metadata": {},
   "outputs": [],
   "source": [
    "example_idx = 0\n",
    "with torch.no_grad():\n",
    "    probs = model(X_test[example_idx], weights, biases)\n",
    "\n",
    "probs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3b8e9161",
   "metadata": {},
   "outputs": [],
   "source": [
    "probs[y_test[example_idx]]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "909a62bb",
   "metadata": {},
   "source": [
    "## Task 7: Diagnostics\n",
    "\n",
    "1. Copy-paste the training loop from the previous task or work in the same cell as before.\n",
    "2. After each epoch, evaluate the batch_model on test data with the current best parameters; Use `torch.no_grad` to disable gradients.\n",
    "3. Calculate the accuracy score no the result and print it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "758a10c2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# create fresh random weights and biases\n",
    "weights, biases = create_params(n_in, n_hidden, n_out)\n",
    "\n",
    "# set requires_grad to True for training\n",
    "for i in range(3):\n",
    "    weights[i].requires_grad = True\n",
    "    biases[i].requires_grad = True\n",
    "\n",
    "# define the hyperparameters\n",
    "n_epochs = 2\n",
    "batch_size = 100\n",
    "learning_rate = 0.01\n",
    "\n",
    "# loop over epochs\n",
    "for epoch in range(n_epochs):\n",
    "    batch_indices = torch.randperm(len(X_train)).reshape(-1, batch_size)\n",
    "    # loop over batches\n",
    "    for idxs in batch_indices:\n",
    "        probs = batch_model(X_train[idxs], weights, biases)\n",
    "        loss = nll_loss(probs, y_train[idxs])\n",
    "        loss.backward()\n",
    "\n",
    "        for i in range(3):\n",
    "            # SGD updates for each parameter\n",
    "            weights[i].data = weights[i].data - learning_rate * weights[i].grad.data\n",
    "            biases[i].data = biases[i].data - learning_rate * biases[i].grad.data\n",
    "            # Zero the gradients for the next iteration\n",
    "            weights[i].grad.data.zero_()\n",
    "            biases[i].grad.data.zero_()\n",
    "\n",
    "    with torch.no_grad():\n",
    "        probs = batch_model(X_test, weights, biases)\n",
    "        acc = accuracy(probs, y_test)\n",
    "    print(f\"Accuracy after epoch {epoch}: {acc}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "baf978c0",
   "metadata": {},
   "source": [
    "## Task 8: Training the model\n",
    "\n",
    "Tweak the number of epochs, batch size and learning rate until you get an accuracy of at least 90 %\n",
    "\n",
    "Copy paste the code from the previous task or work in the same cell. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fae05918",
   "metadata": {},
   "outputs": [],
   "source": [
    "# create fresh random weights and biases\n",
    "weights, biases = create_params(n_in, n_hidden, n_out)\n",
    "\n",
    "# set requires_grad to True for training\n",
    "for i in range(3):\n",
    "    weights[i].requires_grad = True\n",
    "    biases[i].requires_grad = True\n",
    "\n",
    "# define the hyperparameters\n",
    "n_epochs = 5\n",
    "batch_size = 25\n",
    "learning_rate = 0.1\n",
    "\n",
    "# loop over epochs\n",
    "for epoch in range(n_epochs):\n",
    "    batch_indices = torch.randperm(len(X_train)).reshape(-1, batch_size)\n",
    "    # loop over batches\n",
    "    for idxs in batch_indices:\n",
    "        probs = batch_model(X_train[idxs], weights, biases)\n",
    "        loss = nll_loss(probs, y_train[idxs])\n",
    "        loss.backward()\n",
    "\n",
    "        for i in range(3):\n",
    "            # SGD updates for each parameter\n",
    "            weights[i].data = weights[i].data - learning_rate * weights[i].grad.data\n",
    "            biases[i].data = biases[i].data - learning_rate * biases[i].grad.data\n",
    "            # Zero the gradients for the next iteration\n",
    "            weights[i].grad.data.zero_()\n",
    "            biases[i].grad.data.zero_()\n",
    "\n",
    "    with torch.no_grad():\n",
    "        probs = batch_model(X_test, weights, biases)\n",
    "        acc = accuracy(probs, y_test)\n",
    "    print(f\"Accuracy after epoch {epoch}: {acc}\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}