3. Comment: The cases the model found easy were also easy to identify visually, while the cases it found hard were genuinely difficult to recognize even by eye.
4. The model complexity is highest when k is lowest and decreases as k increases (seen in the graph as the training error rising with increasing k). The optimal k is where the validation error reaches its minimum, i.e. k = 3.

Test error (k = 3): 0.02403344. This is higher than the training error but slightly lower than the validation error. In our view it is a good model, considering that it classifies correctly about 98% of the time; the selection procedure is sketched below.
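A minimal sketch of this selection procedure, written in Python with scikit-learn and its bundled digits dataset as a stand-in for the lab data; the split proportions, the range of k, and all variable names are assumptions, not the lab's exact setup.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
# Assumed proportions: 50% train, 25% validation, 25% test.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.5, random_state=1)
X_valid, X_test, y_valid, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=1)

def misclass_error(model, X, y):
    """Fraction of examples the fitted model labels incorrectly."""
    return np.mean(model.predict(X) != y)

ks = range(1, 31)
train_err, valid_err = [], []
for k in ks:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    train_err.append(misclass_error(knn, X_train, y_train))  # rises with k
    valid_err.append(misclass_error(knn, X_valid, y_valid))

best_k = list(ks)[int(np.argmin(valid_err))]  # k = 3 on the lab's data
final = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
print("test error:", misclass_error(final, X_test, y_test))  # 0.0240 in the lab
```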
5. The optimal k = 6, where the average cross-entropy loss is lowest. The average cross-entropy loss takes the predicted probabilities into account, which is a better criterion for a model with a multinomial distribution. An important aspect is that we can determine how wrong a classification is, not just whether it is wrong or not.
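A sketch of selecting k by average cross-entropy instead, reusing the split and imports from the sketch under point 4; the 1e-15 clamp is an assumed safeguard against log(0) when the true class gets zero probability among the k neighbours.

```python
def cross_entropy(model, X, y):
    """Average negative log-probability assigned to the true class."""
    p = model.predict_proba(X)            # shape (n_samples, n_classes)
    p_true = p[np.arange(len(y)), y]      # probability given to the true label
    return -np.mean(np.log(np.clip(p_true, 1e-15, None)))

valid_ce = []
for k in ks:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    valid_ce.append(cross_entropy(knn, X_valid, y_valid))

best_k_ce = list(ks)[int(np.argmin(valid_ce))]  # k = 6 on the lab's data
```

Unlike the 0/1 misclassification error, this criterion penalizes confident wrong predictions more than uncertain ones, which is exactly the "how wrong" aspect noted above.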