# Questions

## Instructions

Please copy and paste the following questions into the README.md file of your final project
and answer them concisely. A good answer will typically have between 20 and 100
words. Each answer is worth 1 point.

I will run an automated plagiarism check, so do not submit the exact same answers as someone else.


## Questions

1. List five different tasks that belong to the field of natural language processing.
2. What is the fundamental difference between econometrics/statistics and supervised machine learning?
3. Can you use stochastic gradient descent to tune the hyperparameters of a random forest? If not, why not?
4. What is imbalanced data and why can it be a problem in machine learning?
5. Why are samples split into training and test data in machine learning?
6. Describe the pros and cons of word and character level tokenization.
7. Why does fine-tuning usually give you a better performing model than feature extraction?
8. What are the advantages of feature extraction over fine-tuning?
9. Why are neural networks trained on GPUs or other specialized hardware?
10. How can you write PyTorch code that uses a GPU if one is available but also runs on a laptop that does not have a GPU?
11. How many trainable parameters would the neural network in [this video](https://www.youtube.com/watch?v=aircAruvnKk&t=1s) have if we remove the second hidden layer but leave it otherwise unchanged?
12. Why are nonlinearities used in neural networks? Name at least three different nonlinearities.
13. Some would say that `softmax` is a bad name. What would be a better name and why?
14. What is the purpose of `DataLoader` objects in PyTorch?
15. Name a few different optimizers that are used to train deep neural networks.
16. What happens when the batch size during optimization is set too small?
17. What happens when the batch size during optimization is set too large?
18. Why can the feed-forward neural network we implemented for image classification not be used for language modelling?
19. Why is an encoder-decoder architecture used for machine translation (instead of the simpler encoder-only architecture we used for language modelling)?
20. Is it a good idea to base your final project on a paper or blog post from 2015? Why or why not?
21. Do you agree with the following sentence: To get the best model performance, you should train a model from scratch in PyTorch so you can influence every step of the process.
22. What is an example of an encoder-only model?
23. What is the vanishing gradient problem and how does it affect training?
24. Which model has a longer memory: an RNN or a transformer?
25. What is the fundamental component of the transformer architecture?