- _Decision Tree with default settings_
- $\text{MSE}_\text{train} = 0.1067$
- $\text{MSE}_\text{valid} = 0.1118484$
- No. terminal nodes: $9$
- _Decision Tree with smallest allowed node size equal to 7000_
- $\text{MSE}_\text{train} = 0.1097$
- $\text{MSE}_\text{valid} = 0.1142078$
- No. terminal nodes: $5$
- _Decision tree with minimum deviance set to 0.0005_
- $\text{MSE}_\text{train} = 0.0778$
- $\text{MSE}_\text{valid} = 0.1015999$
- No. terminal nodes: $197$
It is evident that the tree with minimum deviance set to 0.0005 is the best model, since it has the lowest misclassification rate on the validation data.
Lowering the minimum deviance yields a larger tree, since the purity requirement for a node to become a leaf is stricter and the tree keeps splitting towards purer nodes.
Increasing the smallest allowed node size yields a smaller tree, since we no longer allow as complex a tree.
A larger tree with more terminal nodes means a more complex model.
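The effect of these stopping conditions can be sketched with scikit-learn's `DecisionTreeClassifier` on synthetic data. This is a hedged analogue: the lab was presumably done with R's `tree` package, and `min_samples_leaf` / `min_impurity_decrease` only roughly correspond to its `minsize` / `mindev` controls; the data below is made up for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (the lab's bank data is not reproduced here)
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=5000) > 0).astype(int)

# Default tree: grown until the leaves are pure
default_tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Larger smallest-allowed node size (cf. minsize): fewer, bigger leaves
big_nodes = DecisionTreeClassifier(min_samples_leaf=2000, random_state=0).fit(X, y)

# min_impurity_decrease plays a role similar to mindev: a lower threshold
# lets the tree keep splitting towards purer nodes, giving a larger tree
loose_dev = DecisionTreeClassifier(min_impurity_decrease=1e-5, random_state=0).fit(X, y)
strict_dev = DecisionTreeClassifier(min_impurity_decrease=1e-2, random_state=0).fit(X, y)

print(default_tree.get_n_leaves(), big_nodes.get_n_leaves(),
      loose_dev.get_n_leaves(), strict_dev.get_n_leaves())
```

The printed leaf counts show the same pattern as the three trees above: the size constraint shrinks the tree, the looser deviance threshold grows it.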
3. As seen in the graph, we get underfitting for trees with up to around 50 leaves, which is evident from the deviance being higher for training data than for validation data. The model still generalizes well to the validation data, most likely because of correlation between the features and because limiting the number of nodes acts as regularization.

High bias gives underfitting and high variance gives overfitting. We find the best number of leaves where bias and variance are balanced, which minimises the deviance.
Optimal number of leaves is found to be $42$. Looking at the structure of the tree, the most important variables seem to be _duration_, _poutcome_, _month_ and _day_.
4. Confusion matrix:
| True \ Predicted | no | yes |
| --- | ----- | --- |
| no | 11469 | 510 |
| yes | 833 | 752 |
Accuracy: $0.9009879$
F1-score: $0.5282754$
The model does not have very good predictive power. The accuracy is quite high, but it does not take into account that the data is very imbalanced. The F1-score does take the imbalance into account and gives a more realistic picture.
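Both scores can be reproduced directly from the confusion matrix counts (this also confirms that rows are true labels and columns predictions, since these counts give exactly the reported values):

```python
# Confusion matrix counts from the report (rows: true class, columns: predicted)
tn, fp = 11469, 510   # true "no":  predicted "no" / "yes"
fn, tp = 833, 752     # true "yes": predicted "no" / "yes"

accuracy = (tn + tp) / (tn + fp + fn + tp)

# F1 for the minority class "yes"; unlike accuracy it is not inflated
# by the large number of easy true negatives
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 7), round(f1, 7))  # → 0.9009879 0.5282754
```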
5. Confusion matrix:
| True \ Predicted | no | yes |
| --- | ----- | --- |
| no | 11830 | 149 |
| yes | 1339 | 246 |
With the loss matrix we add an extra penalty for classifying an observation as _yes_. This means that we need to be five times as sure of _yes_ as of _no_ to classify it as _yes_. This is reflected in the confusion matrix, where we get a higher rate of _no_-classifications.
This can be beneficial for asymmetric classification problems where a false positive is more costly than a false negative.
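The resulting decision rule can be written out explicitly: with loss $5$ for a false _yes_ and loss $1$ for a false _no_, we predict _yes_ only when $p(\text{yes}) > 5\,p(\text{no})$. A minimal sketch with hypothetical probabilities (the values are illustrative, not the model's):

```python
import numpy as np

# Hypothetical predicted probabilities for the class "yes"
p_yes = np.array([0.30, 0.60, 0.85, 0.95])

# Plain rule: predict "yes" when p(yes) > p(no)
plain = p_yes > 0.5

# Loss matrix with L(true no -> predicted yes) = 5 and
# L(true yes -> predicted no) = 1: predict "yes" only when the expected
# loss 5 * p(no) is below 1 * p(yes), i.e. p(yes) > 5 * p(no)
penalised = p_yes > 5 * (1 - p_yes)

print(plain.tolist())      # → [False, True, True, True]
print(penalised.tolist())  # → [False, False, True, True]
```

Equivalently, the _yes_ threshold moves from $0.5$ to $5/6 \approx 0.83$, which is why the penalised matrix contains far more _no_-classifications.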
6. ROC curve:

It is evident that the optimal tree is slightly better than the logistic regression model.
A precision-recall curve would probably be a more suitable choice here, since the classes are very imbalanced. As a result of the imbalance, the FPR does not even reach $50\%$ for $\pi = 0.05$.
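The FPR behaviour can be illustrated on a toy imbalanced problem (everything below is synthetic, not the lab's predictions): when the scores for the majority _no_ class pile up near zero, few negatives exceed even a low threshold, so the FPR stays well below $50\%$ at $\pi = 0.05$.

```python
import numpy as np

# Toy imbalanced problem: ~10% positives; negatives get scores piled up
# near zero, as a classifier on imbalanced data typically produces
rng = np.random.default_rng(2)
n = 10000
y = (rng.random(n) < 0.1).astype(int)
score = np.where(y == 1,
                 rng.uniform(0.05, 1.0, n),
                 np.abs(rng.normal(0.0, 0.05, n)))

def roc_point(y, score, pi):
    """TPR and FPR when classifying as 'yes' for score > pi."""
    pred = score > pi
    tp = np.sum(pred & (y == 1))
    fp = np.sum(pred & (y == 0))
    fn = np.sum(~pred & (y == 1))
    tn = np.sum(~pred & (y == 0))
    return tp / (tp + fn), fp / (fp + tn)

tpr, fpr = roc_point(y, score, 0.05)
print(round(fpr, 2))  # stays well below 0.5 even at this low threshold
```

Sweeping `pi` over a grid of thresholds traces out the whole ROC curve; replacing FPR with precision at each threshold gives the precision-recall curve instead.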