The vanilla ice-cream of neural networks.
<aside> 💡 Check out the Google Colab implementation: a multi-class classifier written in raw NumPy.
</aside>
I trained a few different neural networks and found that a (4-4-4) architecture, with two hidden ReLU layers and a Softmax output layer, performed best. I used $\text{lr} = 3 \times 10^{-5}$ and $\lambda = 0.2$ to produce these results.
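For orientation, here is a minimal NumPy sketch of what such a network's forward pass could look like, assuming 2-D inputs and four classes (consistent with the plots below); `init_params`, `forward`, and the He-style initialisation are illustrative choices, not necessarily the notebook's.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(sizes=(2, 4, 4, 4)):
    """Weights and biases for two hidden layers of 4 units and a 4-way output."""
    return [(rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(params, X):
    """ReLU hidden layers, softmax output; keeps activations for backprop."""
    acts = [X]
    for W, b in params[:-1]:
        acts.append(np.maximum(0.0, acts[-1] @ W + b))   # ReLU
    W, b = params[-1]
    logits = acts[-1] @ W + b
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return acts, probs
```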
Success! Quite a pretty loss curve.
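A curve like that falls out of plain full-batch gradient descent. Below is a hedged sketch of such a loop with the stated $\text{lr}$ and $\lambda$, assuming integer class labels and an L2 penalty of the form $\lambda \sum_W \lVert W \rVert^2$; the exact regulariser form and the epoch count are assumptions, not lifted from the notebook.

```python
def loss(params, probs, y, lam=0.2):
    """Mean cross-entropy plus an L2 penalty on the weights (form assumed)."""
    ce = -np.log(probs[np.arange(len(y)), y] + 1e-12).mean()
    return ce + lam * sum((W ** 2).sum() for W, _ in params)

def backward(params, acts, probs, y, lam=0.2):
    """Gradients of the loss above, layer by layer."""
    n = len(y)
    delta = probs.copy()
    delta[np.arange(n), y] -= 1.0        # d(cross-entropy)/d(logits)
    delta /= n
    grads = []
    for i in range(len(params) - 1, -1, -1):
        W, _ = params[i]
        grads.append((acts[i].T @ delta + 2 * lam * W, delta.sum(axis=0)))
        if i > 0:
            delta = (delta @ W.T) * (acts[i] > 0)   # back through the ReLU
    return grads[::-1]

def train(params, X, y, lr=3e-5, lam=0.2, epochs=20_000):
    """Full-batch gradient descent; epoch count here is illustrative."""
    history = []
    for _ in range(epochs):
        acts, probs = forward(params, X)
        history.append(loss(params, probs, y, lam))
        grads = backward(params, acts, probs, y, lam)
        params = [(W - lr * gW, b - lr * gb)
                  for (W, b), (gW, gb) in zip(params, grads)]
    return params, history
```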
Effective as a classifier.
We can also look at each specific class and understand how its decision boundary is set (a plotting sketch follows these class-by-class notes):
Class 0: Given the outlier blue point, the probability gradient is shallower in the top-right corner.
Class 1: Fits the data well; the impact of the outlier blue point is visible in the top-right.
Class 2: Fits the data very well; the non-linearities are clearly at play.
Class 3: Fits the data well. Interestingly, where there is more overlap with the purple points, the decision boundary is less steep than the split between yellow and green, which has more margin.
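For completeness, per-class probability surfaces like the ones above can be drawn by evaluating the network on a grid of points. The following matplotlib sketch (reusing `forward` from earlier) is illustrative, not the notebook's actual plotting code.

```python
import matplotlib.pyplot as plt

def plot_class_probability(params, X, y, cls, res=200):
    """Shade P(class = cls) over the input plane for one class."""
    x0 = np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, res)
    x1 = np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, res)
    xx, yy = np.meshgrid(x0, x1)
    _, probs = forward(params, np.c_[xx.ravel(), yy.ravel()])
    plt.contourf(xx, yy, probs[:, cls].reshape(xx.shape), levels=20)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors="k")
    plt.title(f"P(class = {cls})")
    plt.show()
```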
Overall, we can see the effect of the non-linearities that a neural net allows for, compared to a linear classifier like the logistic regression model that was trained on the same data.
Further detail on the implementation can be found in specific write-ups for the following: