Commit 509cb8ee authored by Felix Ramnelöv

Lab 1: First try on assignment 4

parent 888dfcdf
## Assignment 1
The confusion matrix and misclassification error have been computed.
2. Comment: The confusion matrix looks good. The hardest digits to
   classify are 1, 7, 8 and 9 (especially 9).
Confusion matrices:
| **8** | 0 | 7 | 0 | 1 | 0 | 0 | 0 | 0 | 70 | 0 |
| **9** | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 85 |
Misclassification errors:
- $E_{\text{mis,train}} = 0.04500262$
- $E_{\text{mis,test}} = 0.05329154$
3. Comment: The easy cases were also easy to recognize visually, while
   the hard ones were hard to recognize even by eye.
4. The model complexity is highest when $k$ is lowest and decreases as
   $k$ increases (seen in the graph as the training error rising with
   increasing $k$). The optimal $k$ is where the validation error
   reaches its minimum, at $k = 3$.
![Misclassification rate depending on k](./assignment1-4.png)
Formula: $$R(Y, \hat{Y}) = \frac{1}{N} \sum_{i=1}^{N} I(Y_i \neq \hat{Y}_i)$$
Test error ($k = 3$): $E_{\text{mis,test}} = 0.02403344$. This is higher
than the training error but slightly lower than the validation error.
In our view it is a fairly good model, considering that it classifies
correctly $\approx 98\%$ of the time.
5. Optimal $k = 6$, where the average cross-entropy loss is the lowest.
   The average cross-entropy loss takes the predicted probabilities into
   account, which is a better representation of a model with a
   multinomial distribution. An important aspect is that we can
   determine how wrong a classification is, not just whether it is
   wrong or not (see the sketch after the figure below).
![Average cross-entropy loss depending on k](./assignment1-5.png)
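
As a concrete reference for the two error measures compared above, here is a
minimal Python sketch; the arrays `y_true` and `probs` are hypothetical
stand-ins, not the lab's actual data or classifier output.

```python
import numpy as np

def misclassification_rate(y_true, y_pred):
    """R(Y, Y_hat) = (1/N) * sum_i I(y_i != y_hat_i)."""
    return np.mean(y_true != y_pred)

def avg_cross_entropy(y_true, probs, eps=1e-15):
    """Average cross-entropy loss: -(1/N) * sum_i log p(y_i | x_i).

    probs[i, c] is the predicted probability of class c for observation i;
    eps guards against log(0), which can occur with kNN vote fractions.
    """
    p = np.clip(probs[np.arange(len(y_true)), y_true], eps, None)
    return -np.mean(np.log(p))

# Hypothetical example: 4 observations, 3 classes.
y_true = np.array([0, 1, 2, 1])
probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.5, 0.4, 0.1]])
y_pred = probs.argmax(axis=1)

print(misclassification_rate(y_true, y_pred))  # 0.25: only the last case is wrong
print(avg_cross_entropy(y_true, probs))        # also reflects how confident each prediction was
```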
## Assignment 2
2. In the estimation summary shown below, our features are ordered by
   significance. Here DFA is the most significant.
Estimation summary:
- $\text{MSE}_{\text{test}} = 0.967756869359676$
- $df = 5.6439254878463$
$\lambda = 100$ seems to be the most suitable penalty parameter considering
we are able to drop $df(1) - df(100) \approx 4$ degrees of freedom without
any significant change in $\text{MSE}_{\text{test}}$.
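
For reference, the effective degrees of freedom of ridge regression can be
computed as $df(\lambda) = \mathrm{tr}\left[\mathbf{X}(\mathbf{X}^\top\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^\top\right]$;
below is a minimal Python sketch using a random stand-in design matrix
rather than the lab's data.

```python
import numpy as np

def ridge_df(X, lam):
    """Effective degrees of freedom of ridge regression:
    df(lambda) = trace(X (X^T X + lambda * I)^{-1} X^T).
    Equals the number of columns of X at lambda = 0 and shrinks
    towards 0 as lambda grows.
    """
    p = X.shape[1]
    hat = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    return np.trace(hat)

# Random stand-in design matrix; the lab's actual data is not shown here.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

for lam in (1, 100):
    print(lam, ridge_df(X, lam))  # df(lambda) decreases as lambda increases
```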
## Assignment 3
## Assignment 4
- _Why can it be important to consider various probability thresholds in the
  classification problems, according to the book?_

  - According to the book it is important to consider various probability
    thresholds since, even though the threshold $r = 0.5$ given
    $g(\mathbf{x}) = p(y=1 \mid \mathbf{x})$ (the model provides a correct
    description of the real-world class probabilities) will give the smallest
    number of misclassifications on average and therefore minimise the
    _misclassification rate_, it may not always be the most important aspect
    of the classifier. There exist classification problems that are asymmetric
    (it is more important to correctly predict some classes than others) or
    imbalanced (the classes occur with very different frequencies). For such
    problems, minimising the misclassification rate might not lead to the
    desired performance; the sketch below illustrates this. [p. 50]
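
As an illustration of the trade-off just described (a hypothetical Python
sketch, not an example from the book): on an imbalanced problem, lowering the
threshold below $0.5$ can buy far fewer misses on the rare class at the cost
of a higher misclassification rate.

```python
import numpy as np

# Hypothetical predicted probabilities p(y=1 | x) and true labels for an
# imbalanced problem where class 1 is rare but important to catch.
p_hat = np.array([0.90, 0.60, 0.45, 0.40, 0.35, 0.20, 0.15, 0.10, 0.05, 0.05])
y     = np.array([1,    1,    0,    1,    0,    0,    0,    0,    0,    0])

for r in (0.5, 0.3):
    y_pred = (p_hat >= r).astype(int)
    rate = np.mean(y_pred != y)
    missed = np.sum((y == 1) & (y_pred == 0))
    print(f"r={r}: misclassification rate={rate:.1f}, missed class-1 cases={missed}")
    # r=0.5: rate 0.1 but one positive missed; r=0.3: rate 0.2, no positives missed.
```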
- _What ways of collecting correct values of the target variable for the
  supervised learning problems are mentioned in the book?_

  - There are several ways of collecting correct values of the target variable:
    - Manual labelling: just a simple matter of recording $\mathbf{x}$
      and $y$. [p. 14]
    - Expert labelling: in some applications, the output $y$ has to be created
      by labelling of the training data inputs $\mathbf{x}$ by a domain
      expert. [p. 14]
- _How can one express the cost function of the linear regression in the matrix
  form, according to the book?_

  - According to the book, the cost function of a linear regression model
    $$\mathbf{y} = \mathbf{X}\boldsymbol{\theta} + \boldsymbol{\epsilon}$$
    can in matrix form be expressed as
    $$J(\boldsymbol{\theta}) = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}(\mathbf{x}_i; \boldsymbol{\theta}) - y_i \right)^2 = \frac{1}{n} \| \hat{\mathbf{y}} - \mathbf{y} \|^2_2 = \frac{1}{n} \| \mathbf{X}\boldsymbol{\theta} - \mathbf{y} \|^2_2 = \frac{1}{n} \| \boldsymbol{\epsilon} \|^2_2,$$
    where $\| \cdot \|_2$ denotes the usual Euclidean vector norm and
    $\| \cdot \|^2_2$ its square; a numerical check of this identity is
    sketched below.
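
A minimal Python sketch verifying the chain of equalities numerically, on
random stand-in data (not from the book or the lab):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
X = rng.normal(size=(n, p))
theta = rng.normal(size=p)
eps = 0.1 * rng.normal(size=n)
y = X @ theta + eps          # the model: y = X theta + epsilon

y_hat = X @ theta            # predictions y_hat = X theta

J_sum  = np.mean((y_hat - y) ** 2)               # (1/n) sum_i (y_hat_i - y_i)^2
J_norm = np.linalg.norm(X @ theta - y) ** 2 / n  # (1/n) ||X theta - y||_2^2
J_eps  = np.linalg.norm(eps) ** 2 / n            # (1/n) ||epsilon||_2^2

# X theta - y = -epsilon, and norms ignore the sign, so all three agree:
assert np.allclose([J_sum, J_norm], J_eps)
print(J_sum)
```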