Commit 509cb8ee authored by Felix Ramnelöv

Lab 1: First try on assignment 4

parent 888dfcdf
## Assignment 1
The confusion matrix and misclassification error have been computed.
2. Comment: The confusion matrix looks good. The hardest digits to
   classify are 1, 7, 8 and 9 (especially 9).
Confusion matrices:
| **8** | 0 | 7 | 0 | 1 | 0 | 0 | 0 | 0 | 70 | 0 |
| **9** | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 85 |
Misclassification errors:
- $E_{\text{mis,train}} = 0.04500262$
- $E_{\text{mis,test}} = 0.05329154$
3. Comment: The easy cases were also easy to recognize visually, while
   the hard ones were hard to recognize even by eye.
4. The model complexity is highest when $k$ is lowest and decreases as
   $k$ increases (seen in the graph as the training error rising with
   increasing $k$). The optimal $k$ is where the validation error
   reaches its minimum, at $k = 3$.
![Misclassification rate depending on k](./assignment1-4.png)
Formula: $$R(Y, \hat{Y}) = \frac{1}{N} \sum_{i=1}^{N} I(Y_i \neq \hat{Y}_i)$$
Test error ($k = 3$): $E_{\text{mis,test}} = 0.02403344$. This is higher
than the training error but slightly lower than the validation error.
In our view it is a fairly good model, considering that it classifies
correctly $\approx 98\%$ of the time.
5. Optimal $k = 6$, where the average cross-entropy loss is the lowest.
   The average cross-entropy loss takes the predicted probabilities into
   account, which is a better representation of a model with a
   multinomial distribution. An important aspect is that we can
   determine how wrong a classification is, not just whether it is
   wrong or not (see the sketch after the figure below).
![Average cross-entropy loss depending on k](./assignment1-5.png)
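
As a concrete reference for the two error measures compared above, here is a
minimal Python sketch; the arrays `y_true` and `probs` are hypothetical
stand-ins, not the lab's actual data or classifier output.

```python
import numpy as np

def misclassification_rate(y_true, y_pred):
    """R(Y, Y_hat) = (1/N) * sum_i I(y_i != y_hat_i)."""
    return np.mean(y_true != y_pred)

def avg_cross_entropy(y_true, probs, eps=1e-15):
    """Average cross-entropy loss: -(1/N) * sum_i log p(y_i | x_i).

    probs[i, c] is the predicted probability of class c for observation i;
    eps guards against log(0), which can occur with kNN vote fractions.
    """
    p = np.clip(probs[np.arange(len(y_true)), y_true], eps, None)
    return -np.mean(np.log(p))

# Hypothetical example: 4 observations, 3 classes.
y_true = np.array([0, 1, 2, 1])
probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.5, 0.4, 0.1]])
y_pred = probs.argmax(axis=1)

print(misclassification_rate(y_true, y_pred))  # 0.25: only the last case is wrong
print(avg_cross_entropy(y_true, probs))        # also reflects how confident each prediction was
```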
## Assignment 2
2. In the estimation summary shown below, our features are ordered by
   significance. Here DFA is the most significant.
Estimation summary:
- $\text{MSE}_{\text{test}} = 0.967756869359676$
- $df = 5.6439254878463$
$\lambda = 100$ seems to be the most suitable penalty parameter considering
we are able to drop $df(1) - df(100) \approx 4$ degrees of freedom without
any significant change in $\text{MSE}_{\text{test}}$.
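
For reference, the effective degrees of freedom of ridge regression can be
computed as $df(\lambda) = \mathrm{tr}\left[\mathbf{X}(\mathbf{X}^\top\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^\top\right]$;
below is a minimal Python sketch using a random stand-in design matrix
rather than the lab's data.

```python
import numpy as np

def ridge_df(X, lam):
    """Effective degrees of freedom of ridge regression:
    df(lambda) = trace(X (X^T X + lambda * I)^{-1} X^T).
    Equals the number of columns of X at lambda = 0 and shrinks
    towards 0 as lambda grows.
    """
    p = X.shape[1]
    hat = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    return np.trace(hat)

# Random stand-in design matrix; the lab's actual data is not shown here.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

for lam in (1, 100):
    print(lam, ridge_df(X, lam))  # df(lambda) decreases as lambda increases
```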
## Assignment 3
## Assignment 4
- _Why can it be important to consider various probability thresholds in the
  classification problems, according to the book?_

  - According to the book it is important to consider various probability
    thresholds since, even though the threshold $r = 0.5$ given
    $g(\mathbf{x}) = p(y=1 \mid \mathbf{x})$ (the model provides a correct
    description of the real-world class probabilities) will give the smallest
    number of misclassifications on average and therefore minimise the
    _misclassification rate_, it may not always be the most important aspect
    of the classifier. There exist classification problems that are asymmetric
    (it is more important to correctly predict some classes than others) or
    imbalanced (the classes occur with very different frequencies). For such
    problems, minimising the misclassification rate might not lead to the
    desired performance; the sketch below illustrates this. [p. 50]
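
As an illustration of the trade-off just described (a hypothetical Python
sketch, not an example from the book): on an imbalanced problem, lowering the
threshold below $0.5$ can buy far fewer misses on the rare class at the cost
of a higher misclassification rate.

```python
import numpy as np

# Hypothetical predicted probabilities p(y=1 | x) and true labels for an
# imbalanced problem where class 1 is rare but important to catch.
p_hat = np.array([0.90, 0.60, 0.45, 0.40, 0.35, 0.20, 0.15, 0.10, 0.05, 0.05])
y     = np.array([1,    1,    0,    1,    0,    0,    0,    0,    0,    0])

for r in (0.5, 0.3):
    y_pred = (p_hat >= r).astype(int)
    rate = np.mean(y_pred != y)
    missed = np.sum((y == 1) & (y_pred == 0))
    print(f"r={r}: misclassification rate={rate:.1f}, missed class-1 cases={missed}")
    # r=0.5: rate 0.1 but one positive missed; r=0.3: rate 0.2, no positives missed.
```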
- _What ways of collecting correct values of the target variable for the
  supervised learning problems are mentioned in the book?_

  - There are several ways of collecting correct values of the target variable:
    - Manual labelling: just a simple matter of recording $\mathbf{x}$
      and $y$. [p. 14]
    - Expert labelling: in some applications, the output $y$ has to be created
      by labelling of the training data inputs $\mathbf{x}$ by a domain
      expert. [p. 14]
- _How can one express the cost function of the linear regression in the matrix
  form, according to the book?_

  - According to the book, the cost function of a linear regression model
    $$\mathbf{y} = \mathbf{X}\boldsymbol{\theta} + \boldsymbol{\epsilon}$$
    can in matrix form be expressed as
    $$J(\boldsymbol{\theta}) = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}(\mathbf{x}_i; \boldsymbol{\theta}) - y_i \right)^2 = \frac{1}{n} \| \hat{\mathbf{y}} - \mathbf{y} \|^2_2 = \frac{1}{n} \| \mathbf{X}\boldsymbol{\theta} - \mathbf{y} \|^2_2 = \frac{1}{n} \| \boldsymbol{\epsilon} \|^2_2,$$
    where $\| \cdot \|_2$ denotes the usual Euclidean vector norm and
    $\| \cdot \|^2_2$ its square; a numerical check of this identity is
    sketched below.
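
A minimal Python sketch verifying the chain of equalities numerically, on
random stand-in data (not from the book or the lab):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
X = rng.normal(size=(n, p))
theta = rng.normal(size=p)
eps = 0.1 * rng.normal(size=n)
y = X @ theta + eps          # the model: y = X theta + epsilon

y_hat = X @ theta            # predictions y_hat = X theta

J_sum  = np.mean((y_hat - y) ** 2)               # (1/n) sum_i (y_hat_i - y_i)^2
J_norm = np.linalg.norm(X @ theta - y) ** 2 / n  # (1/n) ||X theta - y||_2^2
J_eps  = np.linalg.norm(eps) ** 2 / n            # (1/n) ||epsilon||_2^2

# X theta - y = -epsilon, and norms ignore the sign, so all three agree:
assert np.allclose([J_sum, J_norm], J_eps)
print(J_sum)
```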