diff --git a/lab1/assignment3.R b/lab1/assignment3.R
index 40a5cd5648383db1ca690c4f1e88944f34d58df0..a2fb635ed357270a01ef166d212f3812655b98cf 100644
--- a/lab1/assignment3.R
+++ b/lab1/assignment3.R
@@ -39,7 +39,7 @@ missclass_rate = missclass(data$V9, predict_reg)
 print(missclass_rate)
 
 plot(
-  main = "Plasma Glucose Concentration vs Age",
+  main = "Plasma Glucose Concentration vs Age (r=0.5)",
   data$V2,
   data$V8,
   xlab = "Plasma Glucose Concentration",
@@ -92,7 +92,7 @@ predict_reg2 <- ifelse(predict_reg2 > r, 1, 0)
 theta2 <- coefficients(model2)
 
 plot(
-  main = "Plasma Glucose Concentration vs Age",
+  main = "Plasma Glucose Concentration vs Age (r=0.5)",
   data$V2,
   data$V8,
   xlab = "Plasma Glucose Concentration",
diff --git a/lab1/figures/assignment3-2.eps b/lab1/figures/assignment3-2.eps
index 6742697d6829bbe10982ebaa00389786d6600b36..c3dcf68261564c24d3871f2b18bbf2eddf5d2272 100644
--- a/lab1/figures/assignment3-2.eps
+++ b/lab1/figures/assignment3-2.eps
@@ -1963,8 +1963,8 @@ cp p1
 0.00 0.00 352.50 282.00 cl
 /Font2 findfont 14 s
 0 0 0 srgb
-62.28 247.45 (Plasma Glucose Concentration vs Ag) 0 ta
-0.140 (e) tb gr
+39.13 247.45 (Plasma Glucose Concentration vs Ag) 0 ta
+0.140 (e \(r=0.5\)) tb gr
 /Font1 findfont 12 s
 107.95 18.72 (Plasma Glucose Concentr) 0 ta
 -0.120 (ation) tb gr
diff --git a/lab1/figures/assignment3-2.png b/lab1/figures/assignment3-2.png
index 63c84180c7ac25eb8bdc60dd87477602ee8e3fbf..b32550c32d0a54edefbeed1ec246e3f6dc4a3f42 100644
Binary files a/lab1/figures/assignment3-2.png and b/lab1/figures/assignment3-2.png differ
diff --git a/lab1/figures/assignment3-3.eps b/lab1/figures/assignment3-3.eps
index cf058777a7a880fa3a559446167cb5d1690d5474..b33ab2c45218a4e481ee1d95d0fde0f8f55ed026 100644
--- a/lab1/figures/assignment3-3.eps
+++ b/lab1/figures/assignment3-3.eps
@@ -1963,8 +1963,8 @@ cp p1
 0.00 0.00 352.50 282.00 cl
 /Font2 findfont 14 s
 0 0 0 srgb
-62.28 247.45 (Plasma Glucose Concentration vs Ag) 0 ta
-0.140 (e) tb gr
+39.13 247.45 (Plasma Glucose Concentration vs Ag) 0 ta
+0.140 (e \(r=0.5\)) tb gr
 /Font1 findfont 12 s
 107.95 18.72 (Plasma Glucose Concentr) 0 ta
 -0.120 (ation) tb gr
diff --git a/lab1/figures/assignment3-3.png b/lab1/figures/assignment3-3.png
index 6773094ff3e0d70f9a23a0c62bf2043da6bb8f7d..f0149f0f3eb00f83f6b7ef87ee20f66e142c660e 100644
Binary files a/lab1/figures/assignment3-3.png and b/lab1/figures/assignment3-3.png differ
diff --git a/lab1/figures/assignment3-5.eps b/lab1/figures/assignment3-5.eps
index 6bb262519e94487e95c40882acad5901fb4646a4..cb4bc61d0e87b523ad80fb4a9e9a169536fca6ea 100644
--- a/lab1/figures/assignment3-5.eps
+++ b/lab1/figures/assignment3-5.eps
@@ -2063,8 +2063,8 @@ cp p1
 0.00 0.00 352.50 282.00 cl
 /Font2 findfont 14 s
 0 0 0 srgb
-62.28 247.45 (Plasma Glucose Concentration vs Ag) 0 ta
-0.140 (e) tb gr
+39.13 247.45 (Plasma Glucose Concentration vs Ag) 0 ta
+0.140 (e \(r=0.5\)) tb gr
 /Font1 findfont 12 s
 107.95 18.72 (Plasma Glucose Concentr) 0 ta
 -0.120 (ation) tb gr
diff --git a/lab1/figures/assignment3-5.png b/lab1/figures/assignment3-5.png
index 1b5cc04c0ffcc3fe3063d126438e0cf75813ecd7..32d6a4300f1d176807c4ed8b6e96d4a371ca56f7 100644
Binary files a/lab1/figures/assignment3-5.png and b/lab1/figures/assignment3-5.png differ
diff --git a/lab1/lab-notes.md b/lab1/lab-notes.md
index 6314f64ca4bad3ab110f4c8e7ec86ae521036090..7c5dc7598483e7f6780830327aa4e6d4222eb39b 100644
--- a/lab1/lab-notes.md
+++ b/lab1/lab-notes.md
@@ -130,41 +130,59 @@ Confusion matrix o misclassification error e framtana.
 
 ## Assignment 3
 
-1. Probably not very easy to classify due to a lot of overlapping. We can not separate the two classes with a signle line.
+1. Probably not very easy to classify due to a lot of overlapping. We can not separate the two classes with a single line.
 
 ![Assignment 3 - 1](./figures/assignment3-1.png)
 
-2. The probibalistic equation:
+2. The probibalistic equation for diabetes:
 
    $$p(y = 1 \mid \mathbf{x}^*) = g(\mathbf{x}^*, \boldsymbol{\theta}) = \frac{1}{1 + e^{-\boldsymbol{\theta}^\top \mathbf{x}^*}}$$
 
-   Transform into decision:
+   Consequently the equation for no diabetes becomes:
+
+   $$p(y = 0 \mid \mathbf{x}^*) = 1 - p(y = 1 \mid \mathbf{x}^*)$$
+
+   We transform the probability into a decision:
 
    $$
    \hat{y} =
    \begin{cases}
-   1 & \text{if } p(y = 1 \mid \mathbf{x}^*) > t \\
+   1 & \text{if } p(y = 1 \mid \mathbf{x}^*) > r \\
    0 & \text{otherwise}
    \end{cases}
    $$
 
-   Normally, \( t = 0.5 \).
+   ![Assignment 3 - 2](./figures/assignment3-2.png)
+
+   $E_{\text{mis}}(r = 0.5) = 0.2630208$
+
+   Comment: Not a very good model since we missclassify every fourth descision. Note: $r = 0.5$ gives the lowest possible missclassification rate.
+
+3. The equation of the decision boundary between the two classes
+
+   $$r = sigm(\mathbf{\theta}^T\mathbf{x}) = \frac{1}{1+e^{-\mathbf{\theta}^T\mathbf{x}}}$$
+
+   which results in
+
+   $$\ln(\frac{r}{1-r}) = \mathbf{\theta}^T\mathbf{x}$$
+
+   We get the following boundry:
 
-   Wrong 1/4 of times, not very good.
+   ![Assignment 3 - 3](./figures/assignment3-3.png)
 
-   0.2630208
+   This does not capture the original data distibution very well.
 
-3. a. theta0 + theta1* C1 + theta2*C2 = 0
+4. Lower $r$ gives lower risk of missing patient with diabetes as shown in the plots bellow.
 
-   b. ![Assignment 3 - 2](./figures/assignment3-2.png)
+   ![Assignment 3 - 4](./figures/assignment3-4-1.png)
 
-   Does not capture data distibution very well.
+   ![Assignment 3 - 4](./figures/assignment3-4-2.png)
 
-4. Lower r gives lower risk of missing patient with diabetes.
+5. The model becomes more complex with lower missclassification rate, which would mean that it is better.
 
-5. Model becomes more complex with lower missclassification rate.
+   ![Assignment 3 - 5](./figures/assignment3-5.png)
 
-0.2447917
+   $E_{\text{miss}}(r=0.5) = 0.2447917$
 
 ## Assignment 4
 