{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "998b266e",
   "metadata": {},
   "source": [
    "# Exercise 4"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "19c9b1f0",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.datasets import load_digits\n",
    "import pandas as pd\n",
    "import seaborn as sns"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7eba2cc2",
   "metadata": {},
   "source": [
    "## Note on import statements\n",
    "\n",
    "- In all real projects, all import statements should be in the first cell of a notebook\n",
    "- It is part of this exercise that you learn how to import what you need from sklearn\n",
    "- Therefore, in this exercise notebooks you will see imports in many places"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "da8f909d",
   "metadata": {},
   "source": [
    "## Task 1: Load and inspect the dataset\n",
    "\n",
    "In this task you will load the digits dataset from `sklearn.datasets`, using scikit-learn's `load_digits` function, which will return a dictionary-like `Bunch` object. \n",
    "\n",
    "The goal of this warmp-up task is that you use your Python knowledge to inspect the object you get from `load_digits`. You do not need to google.\n",
    "\n",
    "\n",
    "1. List the keys of the object\n",
    "2. Look some of the entries and understand their format (e.g. using `type()` and `.shape`\n",
    "3. Look at the description inside digits and find all the terms mentioned on the terminology slide"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f5f091ee",
   "metadata": {},
   "outputs": [],
   "source": [
    "digits = load_digits()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4496e8e9",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b0c66be4",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c8bee711",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12cce699",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "1c80915c",
   "metadata": {},
   "source": [
    "## Task 2: Data splitting\n",
    "\n",
    "Split the data and assign the splits to the variables `X_train`, `X_test`, `y_train`, `y_test`. Set a `random_state` of your choice. Split such that the training sets contain 75 percent of the data. Confirm that by looking at the shapes of the resulting arrays. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "54a43501",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d143a599",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "4aa48bef",
   "metadata": {},
   "source": [
    "## Task 3: Logistic Regression\n",
    "\n",
    "1. Run a logistic regression without regularization and with intercept\n",
    "2. Use the fitted model to create predictions on the test dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4813bad1",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e9a3e1ec",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3348f46d",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "92281238",
   "metadata": {},
   "source": [
    "## Task 4: Assess model quality\n",
    "\n",
    "1. Calculate the accurracy score\n",
    "2. Calculate the f1 score\n",
    "3. Convert the `\"target_names\"` to a `string` data type\n",
    "4. Create a classification report\n",
    "5. Calculate a confusion_matrix\n",
    "6. Plot the confusion matrix using seaborns [heatmap function](https://seaborn.pydata.org/generated/seaborn.heatmap.html) (Optional)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d1d09c59",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e7fc5cf3",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b801c5e3",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e05ed5db",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "058d209a",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "742fed3a",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "7f2076df",
   "metadata": {},
   "source": [
    "## Task 5: Logit fitting with penalty\n",
    "\n",
    "1. Run a logistic regression with an \"l2\" penalty. Set the penalty parametr C = $1 / \\lambda$ to 1. \n",
    "2. You will get a warning. You have two options to solve it:\n",
    "    1. Find a good explanation of why it is acceptable to ignore this warning. Relate this to the differences between machine learning and econometrics\n",
    "    2. Change the settings so you don't get the warning"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9014354d",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "fe53cb99",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "b8d1b5ee",
   "metadata": {},
   "source": [
    "## Task 6: Understanding decision trees and random forrests in group work\n",
    "\n",
    "Read the following two sections of the Python Data Science Handbook\n",
    "\n",
    "- [Decision trees](https://jakevdp.github.io/PythonDataScienceHandbook/05.08-random-forests.html#Motivating-Random-Forests:-Decision-Trees)\n",
    "- [Random forrests](https://jakevdp.github.io/PythonDataScienceHandbook/05.08-random-forests.html#Ensembles-of-Estimators:-Random-Forests)\n",
    "\n",
    "Discuss decision trees and random forrests with your neighbor or in groups of up to 5 people. Make sure, everyone understands the basic idea and no-one gets hung-up on small technicalities. \n",
    "\n",
    "After everyone has a good understanding of the two methods, go through the basic steps (import, create model instance, fit, evaluate score) for a decision tree and a random forrest."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "567dc9f1",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d5c7f7c8",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "1fbf38af",
   "metadata": {},
   "source": [
    "## Task 7: K-fold Cross Validation\n",
    "\n",
    "Do a five fold cross validation for a model of your choice on the training dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "88ec1908",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "68e2fd8d",
   "metadata": {},
   "source": [
    "## Task 8: Hyperparameter tuning\n",
    "\n",
    "Tune the hyperparameters of one of the methods used above using a grid search with cross validation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c32673de",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7b6a1b86",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "386d871a",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ade30d29",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}