3. Comment: The cases that were easy to classify were also easy to
   recognize visually, while the hard cases were hard to recognize
   visually as well.
4. The model complexity is highest when $k$ is lowest and decreases
   as $k$ increases (as seen in the graph, where the training error
   increases with increasing $k$). The optimal $k$ is where the
   validation error is at its minimum, i.e. $k = 3$.
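The selection rule above (pick the $k$ with the lowest validation error) can be sketched as follows. This is a toy illustration, not the assignment's data: the two-blob dataset, split sizes, and seed are all made up, and the kNN classifier is implemented from scratch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two made-up Gaussian blobs standing in for the real features.
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
idx = rng.permutation(200)
X_train, y_train = X[idx[:120]], y[idx[:120]]
X_val, y_val = X[idx[120:]], y[idx[120:]]

def knn_predict(X_train, y_train, X_query, k):
    """Majority vote among the k nearest training points (Euclidean)."""
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

# Validation misclassification rate for each candidate k.
errors = {}
for k in range(1, 16):
    y_hat = knn_predict(X_train, y_train, X_val, k)
    errors[k] = float(np.mean(y_hat != y_val))

best_k = min(errors, key=errors.get)
print(best_k, errors[best_k])
```

On the assignment's data the same loop would land on $k = 3$; here the chosen $k$ depends on the synthetic blobs.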


   Test error ($k = 3$): $E_{\text{mis,test}} = 0.02403344$. This is higher
   than the training error but slightly lower than the validation error.
   In our view this is a fairly good model, considering that it classifies
   correctly $\approx 98\ \%$ of the time.
5. Optimal $k = 6$, where the average cross-entropy loss is at its minimum.
   The average cross-entropy loss takes the predicted probabilities into
   account, which is a better representation of a model with a multinomial
   distribution. An important aspect is that we can determine how wrong a
   classification is, not just whether it is wrong or not.
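A minimal sketch of that distinction, with made-up predicted probabilities: the misclassification error only counts right/wrong, while the average cross-entropy loss $-\frac{1}{n}\sum_i \ln p(y_i\,|\,\bold{x}_i)$ also penalises confident mistakes and rewards confident correct predictions.

```python
import numpy as np

# Hypothetical predicted class probabilities for three samples (3 classes);
# the true class of every sample here is class 0.
probs = np.array([
    [0.90, 0.05, 0.05],   # confident and correct
    [0.40, 0.35, 0.25],   # barely correct
    [0.10, 0.80, 0.10],   # confidently wrong
])
y_true = np.array([0, 0, 0])

# Misclassification error: fraction of wrong argmax predictions.
misclass = float(np.mean(probs.argmax(axis=1) != y_true))

# Average cross-entropy: mean negative log-probability of the true class.
cross_entropy = float(-np.mean(np.log(probs[np.arange(len(y_true)), y_true])))

print(misclass, cross_entropy)
```

The second sample is "correct" for the misclassification count but still contributes a sizeable loss, which is exactly the extra information cross-entropy provides.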

...
...
## Assignment 2
2. In the estimation summary shown below, our features are ordered by
   significance. Here DFA is the most significant.
Estimation summary:
...
...
- $\text{MSE}_{\text{test}} = 0.967756869359676$
- $df = 5.6439254878463$
$\lambda = 100$ seems to be the most suitable penalty parameter, considering
that we can drop $df(1) - df(100) \approx 4$ degrees of freedom without any
significant change in $\text{MSE}_{\text{test}}$.
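The effective degrees of freedom compared above come from the standard ridge-regression expression $df(\lambda) = \operatorname{tr}\big(\bold{X}(\bold{X}^\mathsf{T}\bold{X} + \lambda \bold{I})^{-1}\bold{X}^\mathsf{T}\big)$. A small numpy sketch with a made-up design matrix (not the assignment's data):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 8))  # hypothetical design matrix: 50 samples, 8 features

def ridge_df(X, lam):
    """Effective degrees of freedom of ridge regression:
    df(lambda) = trace(X (X^T X + lambda I)^{-1} X^T)."""
    p = X.shape[1]
    H = X @ np.linalg.inv(X.T @ X + lam * np.eye(p)) @ X.T  # "hat" matrix
    return float(np.trace(H))

df0 = ridge_df(X, 0.0)      # equals p = 8 for an unpenalised full-rank fit
df100 = ridge_df(X, 100.0)  # shrinks as the penalty grows
print(df0, df100)
```

As $\lambda$ grows the hat matrix shrinks the fit, so $df(\lambda)$ drops below the number of features, which is the quantity traded against $\text{MSE}_{\text{test}}$ above.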
## Assignment 3
## Assignment 4
- _Why can it be important to consider various probability thresholds in the
classification problems, according to the book?_
  - According to the book it is important to consider various probability
    thresholds since, even though the threshold $r = 0.5$ given
    $g(\bold{x}) = p(y=1|\bold{x})$ (the model provides a correct description
    of the real-world class probabilities) will give the smallest number of
    misclassifications on average and therefore minimise the
    _misclassification rate_, this may not always be the most important
    aspect of the classifier. There exist classification problems that are
    asymmetric (it is more important to correctly predict some classes than
    others) or imbalanced (the classes occur with very different
    frequencies). For such problems, minimising the misclassification rate
    might not lead to the desired performance. [p. 50]
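A toy illustration of the point (the probabilities and labels below are invented): moving the threshold $r$ trades false negatives against false positives even when the overall misclassification rate barely changes, which is what matters for asymmetric or imbalanced problems.

```python
import numpy as np

# Hypothetical predicted probabilities p(y=1|x) and true labels for an
# imbalanced problem where class 1 is rare but important to catch.
p = np.array([0.95, 0.7, 0.55, 0.45, 0.3, 0.2, 0.15, 0.1, 0.05, 0.02])
y = np.array([1,    1,   0,    1,    0,   0,   0,    0,   0,    0])

results = {}
for r in (0.25, 0.5, 0.75):
    y_hat = (p >= r).astype(int)
    miss = float(np.mean(y_hat != y))              # misclassification rate
    fn = float(np.mean((y == 1) & (y_hat == 0)))   # missed positives (share)
    results[r] = (miss, fn)
    print(f"r={r}: miss={miss:.2f}, false-negative share={fn:.2f}")
```

In this made-up example the misclassification rate is the same at every threshold, yet the share of missed positives grows from 0 to 0.2 as $r$ rises, so the "best" threshold depends on which error is costlier.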
- _What ways of collecting correct values of the target variable for the
supervised learning problems are mentioned in the book?_
- There are several ways of collecting correct values of the target variable:
- Manual Labeling: Just a simple matter of recording $\bold{x}$ and $y$. [p. 14]
- Expert Labeling: In some applications, the output $y$ has to be created by
labelling of the training data inputs $\bold{x}$ by a domain expert. [p. 14]
- _How can one express the cost function of the linear regression in the matrix
form, according to the book?_
  - According to the book, the cost function of a linear regression model
    can be expressed in matrix form as
    $J(\boldsymbol{\theta}) = \frac{1}{n} \|\bold{X}\boldsymbol{\theta} - \bold{y}\|_2^2$,
    where $\bold{X}$ stacks the training inputs row-wise and $\bold{y}$
    stacks the corresponding outputs.
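A quick numerical check (on random, made-up data) that the matrix form $\frac{1}{n}\|\bold{X}\boldsymbol{\theta} - \bold{y}\|_2^2$ agrees with the element-wise mean of squared errors:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 3))            # hypothetical design matrix
theta = rng.normal(size=3)              # hypothetical parameter vector
y = X @ theta + rng.normal(scale=0.1, size=20)  # noisy synthetic targets

n = len(y)
# Element-wise form: J = (1/n) * sum_i (theta^T x_i - y_i)^2
J_sum = sum((X[i] @ theta - y[i]) ** 2 for i in range(n)) / n
# Matrix form: J = (1/n) * ||X theta - y||_2^2
J_mat = float(np.linalg.norm(X @ theta - y) ** 2) / n

print(J_sum, J_mat)
```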