Lecture 5#

Topic#

In this lecture we work with Hugging Face datasets and look at tokenization.
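To give a first feeling for what tokenization means, here is a minimal sketch of a character-level tokenizer in plain Python. It is a toy illustration only: the example sentence and the helper names `encode` and `decode` are made up for this sketch, and the tokenizers used in practice typically split text into subwords rather than single characters.

```python
# Toy character-level tokenizer (illustrative sketch, not a library API).
text = "Tokenizing text is a core task of NLP."

# Build the vocabulary: every distinct character gets an integer id.
vocab = {ch: idx for idx, ch in enumerate(sorted(set(text)))}

def encode(s):
    """Map each character to its id (raises KeyError for unseen characters)."""
    return [vocab[ch] for ch in s]

def decode(ids):
    """Map ids back to characters and join them into a string."""
    inverse = {idx: ch for ch, idx in vocab.items()}
    return "".join(inverse[i] for i in ids)

ids = encode(text)
print(ids[:10])      # the first ten token ids
print(decode(ids))   # round trip back to the original text
```

The round trip through `encode` and `decode` returns the original string, which is a useful sanity check for any tokenizer you experiment with.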

Lecture Slides#

Download the slides

Exercises#

Suggested Homework#

  • Practice downloading and inspecting datasets

  • Practice dataset.map

  • Keep practicing sklearn

Additional materials#

Natural Language Processing with transformers#

The book Natural Language Processing with Transformers has served as the basis for many of the course materials.

Chapter 2 covers the material of this lecture.

Video on Hugging Face datasets#

Some intuition for encoder models#

Watch this only if it helps you get a feeling for the last hidden states. We will not look at this in detail until a few weeks from now.