{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "38614803",
   "metadata": {},
   "source": [
    "# Exercise 8"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a193d777",
   "metadata": {},
   "outputs": [],
   "source": [
    "from datasets import load_dataset\n",
    "import torch\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4c311ff0",
   "metadata": {},
   "source": [
    "## Data Preparation\n",
    "\n",
    "The data preparation only uses concepts you already know from previous lectures. We therefore start with clean datasets for training and validation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9944ef98",
   "metadata": {},
   "outputs": [],
   "source": [
    "data = load_dataset(\"mnist\")\n",
    "data.set_format(\"torch\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "de2a80fa",
   "metadata": {},
   "outputs": [],
   "source": [
    "example = data[\"test\"][0]\n",
    "print(f\"True label: {int(example['label'])}\")\n",
    "fig = plt.imshow(example[\"image\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e50d26dd",
   "metadata": {},
   "outputs": [],
   "source": [
    "img_size = example[\"image\"].numel()\n",
    "img_size"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4b435875",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Dividing by 255 maps pixel values to 0, 1\n",
    "X_train = data[\"train\"][\"image\"][:].reshape(-1, img_size).to(torch.float) / 255\n",
    "X_test = data[\"test\"][\"image\"][:].reshape(-1, img_size).to(torch.float) / 255\n",
    "y_train = data[\"train\"][\"label\"]\n",
    "y_test = data[\"test\"][\"label\"]\n",
    "X_train.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0342ee61",
   "metadata": {},
   "source": [
    "## Dimensions of our Neural Network"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2d8a15bd",
   "metadata": {},
   "outputs": [],
   "source": [
    "# the input dimension\n",
    "n_in = img_size\n",
    "# the dimension of our 2 hidden layers\n",
    "n_hidden = 16\n",
    "# the dimension of our output layer\n",
    "n_out = 10"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5db45eb0",
   "metadata": {},
   "source": [
    "## Task 1: How many Parameters?\n",
    "\n",
    "The number of trainable parameters is entirly determined by the number of layers and their dimensions. \n",
    "\n",
    "Write a function called `count_params(n_in, n_hidden, n_out)` that counts how many parameters will be in our model. Assume that there are 2 hidden layers. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2a2400ba",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "ff1346d1",
   "metadata": {},
   "source": [
    "## Task 2: Set up random start parameters\n",
    "\n",
    "We want to draw random start parameters that are distributed uniformly between -0.5 and 0.5. \n",
    "\n",
    "Since we are going to modify the parameters in-place while training the model, we need a way to freshly generate the start parameters multiple times. We therefore create a function that draws start parameters. \n",
    "\n",
    "The function takes the following arguments:\n",
    "    - n_in\n",
    "    - n_hidden\n",
    "    - n_out\n",
    "    - seed (give it a default value of 1995 so we all get the same results)\n",
    "    \n",
    "The function returns:\n",
    "    - a list of weight matrices with the correct shapes\n",
    "    - a list of biases with the correct shapes "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c5d45bea",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "cab6a138",
   "metadata": {},
   "source": [
    "## Task 3: Implement relu and softmax\n",
    "\n",
    "1. Implement a relu function that takes a 1d tensor and applies the relu nonlinearity elementwise\n",
    "2. Implement a softmax function that takes a 1d tensor of logits and returns a 1d tensor of probabilities\n",
    "3. Test your function on a small tensor \n",
    "4. If you have time implement other nonlinearities such as sigmoid, tanh, ..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0f92e85b",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a62099d6",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eb91ee87",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "e619d0ac",
   "metadata": {},
   "source": [
    "## Task 4: Implement the model\n",
    "\n",
    "The model should take the following arguments:\n",
    "- x: A 1d tensor with a flattened image\n",
    "- weights: The list of weights from task 2\n",
    "- biases: The list of biases from task 2\n",
    "\n",
    "It should return a 1d tensor of length `n_out` that contains probabilities for each category. \n",
    "\n",
    "1. Implement a `model` function\n",
    "2. Try it out on the first element of the training data\n",
    "3. Try out the batch_model function on the first few rows of the training data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "86f98fe4",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "af245938",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "55687ef4",
   "metadata": {},
   "source": [
    "In the training process we need a `batch_model` function that evaluates the model on a batch of data. This is not very instructional, so I give you the function right away."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9bc9cb69",
   "metadata": {},
   "outputs": [],
   "source": [
    "def batch_model(batch, weights, biases):\n",
    "    n_out = len(biases[-1])\n",
    "    out = torch.zeros((len(batch), n_out))\n",
    "    for i, x in enumerate(batch):\n",
    "        out[i] = model(x, weights, biases)\n",
    "    return out"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "11435f78",
   "metadata": {},
   "source": [
    "## Task 5: Implement loss functions\n",
    "\n",
    "1. Write a function called `nnl_loss` that takes the result of the batch_model and returns the average negative log likelihood.\n",
    "2. Try it out on the first 100 rows of the training data\n",
    "3. Implement an `accuracy` function that takes the same arguments as the loss function\n",
    "4. Try it out on the first 100 rows of the training data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8cb53640",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6a727319",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7e286a37",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "db452996",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "cd7dc637",
   "metadata": {},
   "source": [
    "## Task 6: The training loop\n",
    "\n",
    "0. Create fresh weights and biases\n",
    "1. Set `requires_grad` to True for all tensors in the weights and biases list. \n",
    "2. Write a training loop to train your model with SGD and the following hyper-parameters\n",
    "    - n_epochs: 2\n",
    "    - batch_size: 100,\n",
    "    - learning_rate: 0.001\n",
    "3. If you have time, try the model out on a few images\n",
    "\n",
    "**Important**: Do the entire training in just one cell and re-create the start parameters at the beginning of that cell, so each training run starts from the same position. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4ce11263",
   "metadata": {},
   "outputs": [],
   "source": [
    "# create fresh random weights and biases\n",
    "\n",
    "# set requires_grad to True for training\n",
    "\n",
    "# define the hyperparameters\n",
    "\n",
    "# loop over epochs\n",
    "\n",
    "# loop over batches\n",
    "# evaluate model\n",
    "# evaluate loss\n",
    "# backwards\n",
    "\n",
    "# loop over the paramter lists\n",
    "# SGD updates for each parameter tensor\n",
    "\n",
    "# Zero the gradients for the next iteration"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "89c1aa2c",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9e30650c",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3b8e9161",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "909a62bb",
   "metadata": {},
   "source": [
    "## Task 7: Diagnostics\n",
    "\n",
    "1. Copy-paste the training loop from the previous task or work in the same cell as before.\n",
    "2. After each epoch, evaluate the batch_model on test data with the current best parameters; Use `torch.no_grad` to disable gradients.\n",
    "3. Calculate the accuracy score no the result and print it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "758a10c2",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "baf978c0",
   "metadata": {},
   "source": [
    "## Task 8: Training the model\n",
    "\n",
    "Tweak the number of epochs, batch size and learning rate until you get an accuracy of at least 90 %\n",
    "\n",
    "Copy paste the code from the previous task or work in the same cell. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fae05918",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}