Assignment 1
- What is the kernel trick?
Since we can rewrite the 𝐿^2-regularised linear regression problem in a form where the non-linear transformations 𝝓(x) only appear through inner products, we do not have to design a 𝑑-dimensional vector 𝝓(x) and derive its inner product explicitly. Instead, we can choose a kernel
𝜅(x, x') directly, where the kernel is the inner product of two non-linearly transformed inputs:
𝜅(x, x') = 𝝓(x)^T𝝓(x').
This is known as the kernel trick:
If x enters the model only as 𝝓(x)^T𝝓(x'), we can choose a kernel 𝜅(x, x') instead of choosing 𝝓(x). p. 194
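As a standard concrete example, the degree-2 polynomial kernel in one dimension corresponds to an explicit three-dimensional feature map,
\[
\kappa(x, x') = (1 + x x')^2 = 1 + 2 x x' + x^2 x'^2 = \boldsymbol{\phi}(x)^T \boldsymbol{\phi}(x'), \quad \boldsymbol{\phi}(x) = (1, \sqrt{2}\, x, x^2)^T,
\]
whereas e.g. the squared-exponential (RBF) kernel corresponds to an infinite-dimensional 𝝓(x), which is exactly the situation where choosing 𝜅 directly is much easier than choosing 𝝓.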
- In the literature, it is common to see a formulation of SVMs that makes use of a hyperparameter. What is the purpose of this hyperparameter?
The hyperparameter C controls the amount of regularisation in the dual formulation of SVMs, where it appears as the bound on the dual variables:
\[
\hat{\alpha} = \arg \min_\alpha \left( \frac{1}{2} \alpha^T K(X, X) \alpha - \alpha^T y \right)
\]
\[
\text{subject to } \lvert \alpha_i \rvert \leq \frac{1}{2n\lambda} \quad \text{and} \quad 0 \leq \alpha_i y_i,
\]
with \[\hat{y}(x^\star) = \operatorname{sign} \left( \hat{b} + \hat{\alpha}^T K(X, x^\star) \right)\].
Here \[C = \frac{1}{2n\lambda}\], so a large C corresponds to a small \[\lambda\] (weak regularisation) and a small C to strong regularisation. p. 211
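As a hedged illustration of where this hyperparameter shows up in practice (not part of the assignment; the e1071 package and the toy data are assumed choices here), C is passed as the cost argument of svm():

```r
# Sketch: the SVM hyperparameter C is passed as `cost` in e1071::svm().
# A large C corresponds to a small lambda (weak regularisation);
# a small C corresponds to strong regularisation (softer margin).
library(e1071)

# Toy binary problem for illustration only.
d <- droplevels(subset(iris, Species != "virginica"))

fit_small_C <- svm(Species ~ Petal.Length + Petal.Width, data = d,
                   kernel = "radial", cost = 0.1)
fit_large_C <- svm(Species ~ Petal.Length + Petal.Width, data = d,
                   kernel = "radial", cost = 100)

# Typically fewer support vectors are needed when C is large (hard-margin-like fit).
c(small_C = fit_small_C$tot.nSV, large_C = fit_large_C$tot.nSV)
```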
- In neural networks, what do we mean by mini-batch and epoch?
We call a small subsample of data a mini-batch, which typically can contain \[n_b = 10\], \[n_b = 100\], or \[n_b = 1\,000\] data points. One complete pass through the training data is called an epoch, and consequently consists of \[n / n_b\] iterations. p. 125
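A minimal sketch of how these two terms relate in a training loop (illustrative only, not the book's code), using plain mini-batch gradient descent for linear regression:

```r
# Sketch: mini-batch gradient descent for linear regression.
# One epoch = one complete pass through all n training points,
# i.e. n / n_b parameter updates when the mini-batch size is n_b.
set.seed(1)
n   <- 1000                               # number of training data points
n_b <- 100                                # mini-batch size
X   <- cbind(1, rnorm(n))                 # design matrix with intercept column
y   <- X %*% c(2, -1) + rnorm(n, sd = 0.1)

theta <- c(0, 0)                          # parameters to learn
lr    <- 0.1                              # learning rate
for (epoch in 1:20) {
  idx <- sample(n)                        # reshuffle the data before each epoch
  for (start in seq(1, n, by = n_b)) {    # n / n_b iterations per epoch
    batch <- idx[start:(start + n_b - 1)]
    Xb <- X[batch, , drop = FALSE]
    yb <- y[batch]
    grad  <- -2 * t(Xb) %*% (yb - Xb %*% theta) / n_b   # mini-batch gradient
    theta <- theta - lr * grad
  }
}
drop(theta)                               # should be close to c(2, -1)
```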
Assignment 4
4.1
Results look good: the red curve is almost the same as the blue one, so 10 hidden units seem to be quite sufficient. Some points are off between 5 and 7.
4.2
h1: gives very bad predictions from the learned NN on the test data.
h2: The ReLU function does not have a defined derivative when written as max(0, x), so ifelse(x > 0, x, 0) is used instead (see the sketch below). The predictions are quite good for Var < 4 but off after that.
h3: Good predictions for all Var, but not quite as good as with the sigmoid activation function.
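A hedged sketch of how the activation functions above might be defined and passed to a network; the neuralnet package and the sine-style data set (Var, Sin) are guesses at the lab setup and purely illustrative:

```r
# Illustrative only: assumed data and settings, not the actual lab code.
library(neuralnet)

set.seed(1)
Var    <- runif(500, 0, 10)
mydata <- data.frame(Var, Sin = sin(Var))
tr     <- mydata[1:25, ]                 # small training set (assumed)

h1 <- function(x) x                      # linear
h2 <- function(x) ifelse(x > 0, x, 0)    # ReLU written with ifelse() instead of max()
h3 <- function(x) log(1 + exp(x))        # softplus

# act.fct accepts a custom differentiable activation function.
nn_h2 <- neuralnet(Sin ~ Var, data = tr, hidden = 10,
                   act.fct = h2, linear.output = TRUE)

plot(mydata$Var, mydata$Sin, col = "blue")               # true values
points(mydata$Var, predict(nn_h2, mydata), col = "red")  # NN predictions
```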