Since we can rewrite the 𝐿^2-regularised linear regression formula into a form where the non-linear transformations 𝝓(x) appear only via inner products, we do not have to design a 𝑑-dimensional vector 𝝓(x) and derive its inner product ourselves. Instead, we can choose a kernel 𝜅(x, x') directly, where the kernel is the inner product of two non-linearly transformed inputs:
𝜅(x, x') = 𝝓(x)^T𝝓(x').
This is known as the kernel trick:
If x enters the model as 𝝓(x)^T𝝓(x') only, we can choose a kernel 𝜅(x, x') instead of choosing 𝝓(x). p. 194
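
As a rough NumPy sketch of what this buys us, the dual ("kernelised") form of ridge regression below never constructs 𝝓(x); it only evaluates 𝜅(x, x'). The function names and the 𝑛𝜆 scaling of the regulariser are assumptions for illustration, and the squared-exponential kernel stands in for any valid choice of 𝜅:

```python
import numpy as np

def rbf_kernel(x, xp, ell=1.0):
    """Squared-exponential kernel kappa(x, x') = exp(-||x - x'||^2 / (2 ell^2))."""
    return np.exp(-np.sum((x - xp) ** 2) / (2 * ell ** 2))

def fit_kernel_ridge(X, y, kappa, lam=0.1):
    """Dual form of L^2-regularised linear regression: only the Gram
    matrix K[i, j] = kappa(x_i, x_j) is needed; phi(x) is never formed."""
    n = X.shape[0]
    K = np.array([[kappa(xi, xj) for xj in X] for xi in X])
    # Dual weights; the n*lam scaling of the regulariser is an assumed convention.
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def predict(X_train, alpha, kappa, x_star):
    """The prediction is a kernel-weighted sum over the training points."""
    k_star = np.array([kappa(x_star, xi) for xi in X_train])
    return k_star @ alpha
```

Swapping rbf_kernel for another kernel changes the implicit feature space without touching the rest of the code, which is exactly the point of the kernel trick.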
- In the literature, it is common to see a formulation of SVMs that makes use of a hyperparameter. What is the purpose of this hyperparameter?
Its purpose is regularisation: the hyperparameter controls the trade-off between fitting the training data and keeping the margin wide. p. 211
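
As an illustrative sketch (not from the book): in scikit-learn's SVC this hyperparameter appears as C, which acts inversely to a regularisation parameter 𝜆, so varying C directly varies the amount of regularisation:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

# Small C = strong regularisation (wider margin, more slack allowed);
# large C = weak regularisation (fits the training data more tightly).
for C in (0.01, 1.0, 100.0):
    scores = cross_val_score(SVC(kernel="rbf", C=C), X, y, cv=5)
    print(f"C={C:>6}: mean CV accuracy = {scores.mean():.3f}")
```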
- In neural networks, what do we mean by mini-batch and epoch?
We call a small subsample of the data a mini-batch; it typically contains 𝑛𝑏 = 10, 𝑛𝑏 = 100, or 𝑛𝑏 = 1 000 data points. One complete pass through the training data is called an epoch, which consequently consists of 𝑛/𝑛𝑏 iterations. p. 125
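
A hedged NumPy sketch of this bookkeeping: minibatch_sgd and the linear least-squares loss are illustrative choices, not the book's exact algorithm, but the mini-batch and epoch counting match the definitions above:

```python
import numpy as np

def minibatch_sgd(X, y, n_b=100, n_epochs=10, lr=0.01):
    """Mini-batch gradient descent for linear least squares (illustrative)."""
    n, p = X.shape
    theta = np.zeros(p)
    iters_per_epoch = n // n_b              # one epoch = n / n_b iterations
    for epoch in range(n_epochs):
        perm = np.random.permutation(n)     # reshuffle the data each epoch
        for i in range(iters_per_epoch):
            idx = perm[i * n_b:(i + 1) * n_b]          # one mini-batch
            Xb, yb = X[idx], y[idx]
            grad = 2.0 / n_b * Xb.T @ (Xb @ theta - yb)
            theta -= lr * grad
    return theta
```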