Using NLP to label the philosophy of journal entries
This is a personal project I’ve wanted to try ever since I first understood embedding models early in my machine learning journey. It had been simmering on the back burner ever since.
Then my Statistical NLP class required a final project.
So a sweaty 72 hours later…boom! 💥
When I first understood that you can generate semantically meaningful vectors over documents, I remember madly pacing around the little AirBnB I was studying out of in Kalamata.
What if you could use this to classify your personal philosophy based on journal entries? Whoa, and if you could do that…you could track your philosophy over time! And then…
<aside> 💡
Wait, what are semantically meaningful vectors?
Deep learning has found that if you train on the right data with the right setup, you can learn vectors (think ordered lists of numbers, often 100+ elements long) that capture the meaning of each word. Using these and other techniques, you can generate semantically meaningful vectors over entire passages of text, which is what this project uses.
This is a simplification. If you’re interested in learning more, I recreated Word2Vec and highlighted some of these concepts in a bit more detail there.
</aside>
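To make "semantically meaningful" a bit more concrete, here is a minimal sketch of how you would compare two such vectors with cosine similarity. The 4-element vectors below are made up by hand purely for illustration (real embeddings come from a trained model and are much longer), and the words chosen are hypothetical examples, not data from this project.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made toy vectors, NOT output from a real embedding model.
toy_embeddings = {
    "duty":       [0.9, 0.1, 0.3, 0.0],
    "obligation": [0.8, 0.2, 0.4, 0.1],
    "pleasure":   [0.1, 0.9, 0.0, 0.3],
}

# Words with related meanings should point in similar directions.
sim_related = cosine_similarity(toy_embeddings["duty"], toy_embeddings["obligation"])
sim_unrelated = cosine_similarity(toy_embeddings["duty"], toy_embeddings["pleasure"])

print(f"duty vs obligation: {sim_related:.3f}")
print(f"duty vs pleasure:   {sim_unrelated:.3f}")
```

With a real model the idea is the same: text with similar meaning ends up with vectors that score higher against each other than against unrelated text.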
The fact that we are able to generate semantically rich embeddings over text (honestly, over anything, since we can do this on audio, images, etc.) is remarkable. And remarkably powerful! So an open question that has always nagged at me is the extent of these representations: how much information do they truly capture?
Well, this project has updated my beliefs: These embedding models are more powerful than I initially thought. I was a believer before — don't get me wrong — but I was positively surprised. Yet again deep learning shocks me with its capabilities.
This project is imperfect and serves more as a proof of concept. There are many ways to improve performance, but I was pretty psyched about the result.
TL;DR: I trained a classification model to label journal entries and essays with one of 7 schools of philosophical thought.
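For a rough feel of what "classifying over embeddings" can look like, here is a tiny sketch using a nearest-centroid classifier. This is an assumption chosen for brevity, not necessarily the model used in this project. The 3-element vectors are hand-made stand-ins for real document embeddings, and the labels `school_a` and `school_b` are hypothetical placeholders rather than the actual 7 schools.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mean_vector(vectors):
    """Element-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def fit_centroids(labeled_vectors):
    """Average the vectors for each label to get one centroid per class."""
    by_label = {}
    for vector, label in labeled_vectors:
        by_label.setdefault(label, []).append(vector)
    return {label: mean_vector(vecs) for label, vecs in by_label.items()}

def predict(centroids, vector):
    """Return the label whose centroid is most similar to the given vector."""
    return max(centroids, key=lambda label: cosine_similarity(centroids[label], vector))

# Hand-made stand-ins for document embeddings, with placeholder labels.
training = [
    ([0.9, 0.1, 0.2], "school_a"),
    ([0.8, 0.2, 0.1], "school_a"),
    ([0.1, 0.9, 0.3], "school_b"),
    ([0.2, 0.8, 0.4], "school_b"),
]

centroids = fit_centroids(training)
label = predict(centroids, [0.85, 0.15, 0.1])  # a new "journal entry" vector
print(label)
```

In practice you would swap the toy vectors for embeddings of real passages and likely use a stronger classifier, but the pipeline has the same shape: embed the text, then learn a mapping from vectors to labels.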
<aside> 🤖
Feel free to try it out on Colab or check out the technical report.
</aside>