Commit e3f9949b authored by Felix Ramnelöv

Lab 1: Cleaned up notes

parent 9e38aaf8
@@ -30,11 +30,13 @@ mse <- function(y, y_hat)
# Column names of the features and of the target (column 5 of the scaled training data)
feature_cols <- colnames(X_train)
target_col <- colnames(train_scaled)[5]
print(target_col)

# Regression formula: target on all features, without an intercept
formula <- as.formula(paste(target_col, "~", paste(feature_cols, collapse = " + "), "-1"))

# Fit the linear model and extract the coefficient vector and residual standard error
model <- lm(formula, data = train_scaled)
model_summary <- summary(model)
print(model_summary)
theta <- model_summary$coefficients[, 1]
sigma <- model_summary$sigma
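# A possible continuation (sketch, not from the original script): evaluate the fit
# with the mse() helper defined earlier in this file; test_scaled is assumed to be
# the test data scaled in the same way as train_scaled.
mse_train <- mse(train_scaled[[target_col]], predict(model, newdata = train_scaled))
mse_test <- mse(test_scaled[[target_col]], predict(model, newdata = test_scaled))
print(c(train = mse_train, test = mse_test))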
@@ -38,20 +38,20 @@ Confusion matrix and misclassification error are computed.
Misclassification errors:
- $E_{\text{mis,train}} = 0.04500262$
- $E_{\text{mis,test}} = 0.05329154$
3. Comment: The easy cases were indeed easy to recognize visually, while the hard cases were difficult to recognize even by eye.
4. Model complexity is highest when $k$ is lowest and decreases as $k$ increases (as seen in the graph, where the training error grows with increasing $k$). The optimal $k$ is where the validation error is at its minimum, i.e. $k = 3$.
Formula: $R(Y, \hat{Y}) = \frac{1}{N} \sum_{i=1}^{N} I(Y_i \neq \hat{Y}_i)$
![Misclassification rate depending on k](./assignment1-4.png)
Test error ($k = 3$): $0.02403344$. This is higher than the training error but slightly lower than the validation error. In our view it is a fairly good model, considering that it classifies correctly $\approx 98\%$ of the time.
5. Optimal $k = 6$, where the average cross-entropy loss is lowest. The average cross-entropy loss takes the predicted probabilities into account, which is a better evaluation of a model with a multinomial distribution: it lets us quantify how wrong a classification is, not just whether it is wrong (see the code sketch after this list).
Formula: $R(Y, \hat{p}(Y)) = - \frac{1}{N} \sum_{i=1}^{N} \sum_{m=1}^{M} I(Y_i = C_m) \log \hat{p}(Y_i = C_m)$
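A minimal R sketch of the two error measures above (assuming `y` holds the true labels, `y_pred` the predicted labels, and `p_hat` an $n \times M$ matrix of predicted class probabilities with one column per class; all three names are hypothetical):

```r
# Misclassification rate: share of predictions that differ from the true labels
misclass_rate <- function(y, y_pred) {
  mean(y != y_pred)
}

# Average cross-entropy loss: negative mean log-probability assigned to the true class.
# A small constant guards against log(0) for confident but wrong predictions.
cross_entropy <- function(y, p_hat, eps = 1e-15) {
  true_class_prob <- p_hat[cbind(seq_along(y), match(y, colnames(p_hat)))]
  -mean(log(true_class_prob + eps))
}
```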
@@ -59,9 +59,55 @@ Confusion matrix and misclassification error are computed.
## Assignment 2
2. In the estimation summary shown below, the features are ordered by significance. DFA is clearly the most significant, followed by PPE and HNR in decreasing order.
Estimation summary:
| Coefficient | Estimate | Std. Error | t value | Pr(>\|t\|) |
| ------------- | ---------- | ---------- | ------- | ---------- |
| DFA | -0.280318 | 0.020136 | -13.921 | < 2e-16 |
| PPE | 0.226467 | 0.032881 | 6.887 | 6.70e-12 |
| HNR | -0.238543 | 0.036395 | -6.554 | 6.41e-11 |
| Shimmer.APQ11 | 0.305546 | 0.061236 | 4.990 | 6.34e-07 |
| Jitter.Abs. | -0.169609 | 0.040805 | -4.157 | 3.31e-05 |
| NHR | -0.185387 | 0.045567 | -4.068 | 4.84e-05 |
| Shimmer.APQ5 | -0.387507 | 0.113789 | -3.405 | 0.000668 |
| Shimmer | 0.592436 | 0.205981 | 2.876 | 0.004050 |
| RPDE | 0.004068 | 0.022664 | 0.179 | 0.857556 |
| Shimmer.APQ3 | 32.070932 | 77.159242 | 0.416 | 0.677694 |
| Shimmer.DDA | -32.387241 | 77.158814 | -0.420 | 0.674695 |
| Jitter.RAP | -5.269544 | 18.834160 | -0.280 | 0.779658 |
| Jitter.DDP | 5.249558 | 18.837525 | 0.279 | 0.780510 |
| Jitter.PPQ5 | -0.074568 | 0.087766 | -0.850 | 0.395592 |
| Jitter... | 0.186931 | 0.149561 | 1.250 | 0.211431 |
| Shimmer.dB. | -0.172655 | 0.139316 | -1.239 | 0.215315 |
Mean square error:
- Formula: $$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
- $\text{MSE}_{\text{train}} = 0.878543102826276$
- $\text{MSE}_{\text{test}} = 0.935447712156739$
3. The functions were implemented using the following formulas (a code sketch follows at the end of this assignment):
- _Loglikelihood_: $$\log P(T | \theta, \sigma) = -\frac{n}{2} \log(2 \pi \sigma^2) - \frac{1}{2 \sigma^2} \sum_{i=1}^{n} (T_i - \mathbf{X}_i \boldsymbol{\theta})^2$$
- _Ridge_: $$\mathcal{L}_{\text{ridge}}(\theta, \sigma, \lambda) = \lambda \sum_{j=1}^{p} \theta_j^2 - \log P(T | \theta, \sigma)$$
- _RidgeOpt_: $$\hat{\theta}, \hat{\sigma} = \arg \min_{\theta, \sigma} \mathcal{L}_{\text{ridge}}(\theta, \sigma, \lambda)$$
- _DF_: $$\text{df}(\lambda) = \text{tr}\left( X \left( X^T X + \lambda I \right)^{-1} X^T \right)$$
4. Optimal $\boldsymbol{\theta}$ for $\lambda \in \{1, 100, 1000\}$:
- $\lambda = 1$:
- $\text{MSE}_{\text{train}} = 0.878681448897974$
- $\text{MSE}_{\text{test}} = 0.934684486872397$
- $df = 13.8607362829965$
- $\lambda = 100$:
- $\text{MSE}_{\text{train}} = 0.889775499501371$
- $\text{MSE}_{\text{test}} = 0.934131808081541$
- $df = 9.92488712829542$
- $\lambda = 1000$:
- $\text{MSE}_{\text{train}} = 0.939949118364897$
- $\text{MSE}_{\text{test}} = 0.967756869359676$
- $df = 5.6439254878463$
$\lambda = 100$ seems to be the most suitable penalty parameter, considering that we can drop $df(1) - df(100) \approx 4$ degrees of freedom without any significant change in $\text{MSE}_{\text{test}}$.
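A minimal R sketch of how the functions in step 3 and the comparison in step 4 could be implemented. The names `X_train`, `y_train`, `X_test` and `y_test` (scaled feature matrices and target vectors) are assumptions, as is the choice of `optim()` with box constraints; `mse()` refers to the helper defined earlier in the script.

```r
# Log-likelihood of the targets under a Gaussian linear model without intercept
loglikelihood <- function(theta, sigma, X, y) {
  n <- nrow(X)
  -n / 2 * log(2 * pi * sigma^2) - sum((y - X %*% theta)^2) / (2 * sigma^2)
}

# Ridge: penalized negative log-likelihood
ridge <- function(par, X, y, lambda) {
  theta <- par[1:ncol(X)]
  sigma <- par[ncol(X) + 1]
  lambda * sum(theta^2) - loglikelihood(theta, sigma, X, y)
}

# RidgeOpt: minimize the ridge objective over theta and sigma
ridge_opt <- function(X, y, lambda) {
  init  <- c(rep(0, ncol(X)), 1)        # start at theta = 0, sigma = 1
  lower <- c(rep(-Inf, ncol(X)), 1e-6)  # keep sigma positive
  optim(par = init, fn = ridge, X = X, y = y, lambda = lambda,
        method = "L-BFGS-B", lower = lower)$par
}

# DF: effective degrees of freedom, tr(X (X'X + lambda I)^-1 X')
df_ridge <- function(X, lambda) {
  sum(diag(X %*% solve(t(X) %*% X + lambda * diag(ncol(X))) %*% t(X)))
}

# Compare the three penalties on training and test data
for (lambda in c(1, 100, 1000)) {
  par_hat   <- ridge_opt(X_train, y_train, lambda)
  theta_hat <- par_hat[1:ncol(X_train)]
  cat("lambda =", lambda,
      "| MSE train =", mse(y_train, X_train %*% theta_hat),
      "| MSE test =", mse(y_test, X_test %*% theta_hat),
      "| df =", df_ridge(X_train, lambda), "\n")
}
```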