# Lecture 5
## Topic
In this lecture we work with Hugging Face datasets and look at tokenization.
## Lecture Slides
```{eval-rst}
:download:`Download the slides <_static/bld/pdfs/lecture_5.pdf>`
```
## Exercises
```{toctree}
---
maxdepth: 1
---
bld/notebooks/exercises/exercise_5.ipynb
bld/notebooks/solutions/exercise_5.ipynb
```
## Suggested Homework
- Practice downloading and inspecting datasets
- Practice `dataset.map`
- Keep practicing sklearn
## Additional materials
### Natural Language Processing with transformers
The book [Natural Language Processing with Transformers](https://www.oreilly.com/library/view/natural-language-processing/9781098136789/) has served as the basis for many of the course materials.
Chapter 2 covers the material of this lecture.
### Video on huggingface datasets
### Some intuition for encoder models
Watch this only if it helps you get a feeling for the last hidden states. We will
not look at them in detail until a few weeks from now.