# Lecture 5

## Topic

In this lecture we work with Hugging Face datasets and look at tokenization.

## Lecture Slides

```{raw} html
```

```{eval-rst}
:download:`Download the slides <_static/bld/pdfs/lecture_5.pdf>`
```

## Exercises

```{toctree}
---
maxdepth: 1
---
bld/notebooks/exercises/exercise_5.ipynb
bld/notebooks/solutions/exercise_5.ipynb
```

## Suggested Homework

- Practice downloading and inspecting datasets
- Practice `dataset.map`
- Keep practicing sklearn

## Additional materials

### Natural Language Processing with transformers

The book [Natural language processing with transformers](https://www.oreilly.com/library/view/natural-language-processing/9781098136789/) has served as the basis for many of the course materials. Chapter 2 covers the material of this lecture.

### Video on Hugging Face datasets

### Some intuition for encoder models

Watch this only if it helps you get a feeling for the last hidden states. We will only look at this properly in a few weeks.