Exercise 2

[ ]:

import numpy as np
import pandas as pd
from pathlib import Path
from seaborn import load_dataset

Task 1: List and dict comprehensions

Assume that we have a dictionary that stores some quality measures for models of different sizes:

[ ]:

results = {
    "large": {"acc": 0.9, "f1": 0.85},
    "medium": {"acc": 0.83, "f1": 0.87},
    "small": {"acc": 0.7, "f1": 0.5},
}

Use a dict comprehension to create a variable called acc that maps each model to its accuracy.
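
A dict comprehension builds a new dictionary from an iterable in a single expression: {key_expr: value_expr for key, value in mapping.items()}. The sketch below uses a hypothetical `scores` dictionary (not `results`, so the exercise stays open) with the same nested structure, and maps each model to one of its nested values.

```python
# Hypothetical example data with the same nested structure as `results`.
scores = {
    "model_a": {"precision": 0.8, "recall": 0.6},
    "model_b": {"precision": 0.75, "recall": 0.9},
}

# For every (name, metrics) pair, keep the name as key and pick one metric as value.
precision = {name: metrics["precision"] for name, metrics in scores.items()}
print(precision)  # {'model_a': 0.8, 'model_b': 0.75}
```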

[ ]:

Filter the results dictionary so that only the entries for models with an accuracy above 0.8 are kept.
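
Adding an `if` clause at the end of a dict comprehension keeps only the items for which the condition is true. Again a sketch on the hypothetical `scores` data, this time filtering on recall while keeping the full nested dictionary for each surviving model:

```python
# Hypothetical example data with the same nested structure as `results`.
scores = {
    "model_a": {"precision": 0.8, "recall": 0.6},
    "model_b": {"precision": 0.75, "recall": 0.9},
}

# Keep the whole inner dict, but only for models whose recall exceeds 0.7.
high_recall = {
    name: metrics for name, metrics in scores.items() if metrics["recall"] > 0.7
}
print(high_recall)  # {'model_b': {'precision': 0.75, 'recall': 0.9}}
```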

[ ]:

Task 2: Create NumPy arrays

Create the following arrays:

A three-dimensional array of shape (2, 3, 4) containing zeros
A two-dimensional array with 2 rows and 5 columns that is equivalent to the list [[0.1, 0.2, 0.3, 0.4, 0.5], [0.6, 0.7, 0.8, 0.9, 1.0]]. Do not just type in the numbers.
A 3 x 3 identity matrix
A 3 x 4 empty array (using np.empty); compare its entries with the ones your neighbor gets.
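
The constructors relevant here are `np.zeros`, `np.arange` combined with `reshape`, `np.eye`, and `np.empty`. The sketch below deliberately uses other shapes and values than the exercise; the point is the pattern, for example generating evenly spaced decimals from integers instead of typing them in.

```python
import numpy as np

zeros = np.zeros((3, 2))                      # shape (3, 2), every entry 0.0
tenths = np.arange(1, 7).reshape(2, 3) / 10   # [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
identity = np.eye(2)                          # 2 x 2 identity matrix
uninitialized = np.empty((2, 2))              # memory is allocated but not initialized

print(zeros.shape, tenths, identity, sep="\n")
# The entries of `uninitialized` are whatever happened to be in memory,
# so they can differ between machines and between runs.
print(uninitialized)
```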

[ ]:

[ ]:

[ ]:

[ ]:

Task 3: NumPy indexing

Throughout this task, work with the arrays a and b from the lecture slides:

[ ]:

a = np.arange(5)
b = np.arange(12).reshape(4, 3)

Select the middle element of a

[ ]:

Select all of a (but you need to put something into the square brackets)
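
Indexing a one-dimensional array works like indexing a list: an integer picks a single element, and a slice start:stop picks a range, where omitted bounds default to the start and the end of the array. A sketch on a hypothetical array `c`:

```python
import numpy as np

c = np.arange(10, 17)    # array([10, 11, 12, 13, 14, 15, 16]), length 7

first = c[0]             # 10
last = c[-1]             # 16; negative indices count from the end
middle = c[len(c) // 2]  # 13; the middle element for an odd length
everything = c[:]        # a slice with both bounds omitted selects all elements
print(first, last, middle, everything)
```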

[ ]:

Select the last two rows of b

[ ]:

Select the last two columns of b

[ ]:

Select the last two columns of the last two rows of b
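
For a two-dimensional array, the first index selects rows and the second selects columns; a bare `:` keeps the whole axis, and both positions accept slices and negative indices. A sketch on a hypothetical 3 x 4 array `d`, using the first rows and columns so the exercise itself stays open:

```python
import numpy as np

d = np.arange(12).reshape(3, 4)
# [[ 0,  1,  2,  3],
#  [ 4,  5,  6,  7],
#  [ 8,  9, 10, 11]]

first_row = d[0]            # same as d[0, :] -> [0, 1, 2, 3]
first_two_cols = d[:, :2]   # all rows, columns 0 and 1
corner = d[:2, :2]          # rows 0-1 and columns 0-1: [[0, 1], [4, 5]]
print(first_row, first_two_cols, corner, sep="\n")
```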

[ ]:

Task 4: NumPy calculations

[ ]:

x = np.array([[0.5, 1.5], [2.5, 3.5]])
y = np.diag([2, 3])
z = np.array([2, 3])

Do the following calculations with the arrays x, y, and z:

Do a matrix multiplication of the two arrays x and y
Do an elementwise multiplication of the matrices x and y
Do an elementwise addition of x and z
Do an elementwise addition of x and z.reshape(-1, 1)
Describe the difference between the last two tasks.
Take the exponential of each element of the array z
Sum the two rows in x
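
The distinctions to watch are `@` (matrix multiplication) versus `*` (elementwise multiplication), and NumPy broadcasting: an array of shape (2,) is stretched across the rows of a 2 x 2 array, while an array of shape (2, 1) is stretched across its columns. A sketch with hypothetical arrays `p`, `d`, and `q`, so that the results for x, y, and z remain for you to work out:

```python
import numpy as np

p = np.array([[1, 2], [3, 4]])
d = np.diag([2, 3])             # [[2, 0], [0, 3]]
q = np.array([10, 20])

matmul = p @ d                  # [[2, 6], [6, 12]]
elementwise = p * d             # [[2, 0], [0, 12]]

# Shape (2,) broadcasts as a row: q is added to every row of p.
row_add = p + q                 # [[11, 22], [13, 24]]
# Shape (2, 1) broadcasts as a column: q[i] is added to every entry of row i.
col_add = p + q.reshape(-1, 1)  # [[11, 12], [23, 24]]

exp_q = np.exp(q)               # e**10 and e**20, computed elementwise
row_sum = p.sum(axis=0)         # adds the rows together: [4, 6]
print(row_add, col_add, sep="\n")
```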

[ ]:

[ ]:

[ ]:

[ ]:

[ ]:

[ ]:

Task 5: File paths

Define a path called ROOT that leads to the directory in which you store all materials for this course. Define the path relative to this notebook and then convert it to an absolute path. Note: The solution is different for everyone and depends on the directory structure you chose.
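
A `pathlib.Path` can be built from relative parts and turned into an absolute path with `.resolve()`. In a Jupyter notebook, relative paths are interpreted relative to the current working directory, which is usually the notebook's own folder. The sketch assumes, purely hypothetically, that the course folder sits one level above the notebook; `ROOT_RELATIVE` is an illustrative name, and the parts have to be adapted to your own layout.

```python
from pathlib import Path

# Hypothetical layout: the notebook lives in a subfolder of the course folder,
# so ".." points to the course folder. Adjust this to your own structure.
ROOT_RELATIVE = Path("..")
ROOT = ROOT_RELATIVE.resolve()  # absolute path with ".." resolved away

print(ROOT_RELATIVE.is_absolute(), ROOT.is_absolute())  # False True
print(ROOT)
```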

[ ]:

Define a path to this notebook and use it to prove that this notebook exists.
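
Joining path parts with the `/` operator and calling `.exists()` is one way to check whether a file is really there. The folder and file names below ("exercises", "exercise_2.ipynb") are hypothetical placeholders, not the actual names in your course folder.

```python
from pathlib import Path

# Hypothetical names; replace them with your own folder and notebook names.
ROOT = Path("..").resolve()
notebook = ROOT / "exercises" / "exercise_2.ipynb"

# exists() returns True only if a file or directory is present at that path.
print(notebook, notebook.exists())
```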

[ ]:

Task 6: Read and save DataFrames

[ ]:

iris = load_dataset("iris")
iris.head()

Save this dataset in each of the file formats presented in the slides. Then reload each file into a DataFrame using the corresponding read function.
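
In pandas, each writer method `DataFrame.to_<format>` has a matching reader `pd.read_<format>`. The list of formats from the slides is not repeated here, so the sketch assumes CSV and pickle, which need no extra dependencies (Parquet and Feather, for instance, need an optional engine such as `pyarrow`). It uses a small made-up DataFrame instead of `iris` and a temporary directory so that it runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Small made-up DataFrame standing in for `iris`.
df = pd.DataFrame({"length": [5.1, 4.9], "kind": ["a", "b"]})

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)

    df.to_csv(folder / "data.csv")
    from_csv = pd.read_csv(folder / "data.csv")

    df.to_pickle(folder / "data.pkl")
    from_pickle = pd.read_pickle(folder / "data.pkl")

print(from_csv)
print(from_pickle)
```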

[ ]:

Look at the dataset you reloaded from the CSV file. What do you see?

Task 7: Create variables

Add the square and the log of each numerical variable (i.e., all but “species”) in the dataset. Use “NAME_squared” and “log_NAME” as naming conventions. Do not type out the list of variable names by hand.
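
One way to avoid typing out column names is to ask `DataFrame.select_dtypes` for the numeric columns and build the new names with f-strings inside a loop. A sketch on a small made-up DataFrame whose column names are hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"width": [1.0, 2.0], "height": [3.0, 4.0], "label": ["x", "y"]})

# Take the list of numeric columns before adding new ones to the DataFrame.
numeric_cols = df.select_dtypes(include="number").columns.tolist()

for col in numeric_cols:
    df[f"{col}_squared"] = df[col] ** 2
    df[f"log_{col}"] = np.log(df[col])  # natural logarithm

print(df.columns.tolist())
```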

[ ]:

Task 8: Select data

Select all rows where the species is setosa and sepal_length is greater than or equal to 5.
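
Combining conditions for row selection requires `&` (not `and`) and parentheses around each comparison, because `&` binds more tightly than `==` and `>=`. A sketch on made-up data with hypothetical column names:

```python
import pandas as pd

df = pd.DataFrame({
    "kind": ["a", "a", "b", "a"],
    "size": [4.5, 5.0, 6.0, 5.5],
})

# Each comparison yields a boolean Series; & combines them elementwise.
mask = (df["kind"] == "a") & (df["size"] >= 5)
selected = df[mask]
print(selected)
```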

[ ]:

Task 9: Errors and tracebacks

Write me a message on Zulip in which you describe this error.

[ ]:

df = pd.DataFrame(data=np.ones(2, 2), columns=["a", "b"])