Reddit - r/MachineLearning

A map of the latest 11 million papers split by semantic similarity and time slices [P]

How I built it

I sourced the latest 11 million papers from OpenAlex and arXiv, encoded their titles and abstracts with SPECTER2, and projected the embeddings down to 2D with UMAP. Labels are generated within Voronoi cells around high-density peaks, at progressively finer levels of depth.
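For anyone who wants the labeling idea in code, here is a minimal sketch of the "Voronoi cells around density peaks" step. It is an illustration under assumptions, not the code behind the map: the grid-based peak finder, the function names `density_peaks` and `assign_to_peaks`, and the `cell` and `min_count` parameters are all hypothetical. Assigning each point to its nearest peak is what places it in that peak's Voronoi cell.

```python
import math
from collections import Counter

def density_peaks(points, cell=1.0, min_count=2):
    """Return centers of grid cells that are strict local density maxima.

    Hypothetical stand-in for a real density estimator: points are binned
    into square cells of side `cell`, and a cell is a peak when it holds
    more points than each of its 8 neighbors.
    """
    counts = Counter((math.floor(x / cell), math.floor(y / cell)) for x, y in points)
    peaks = []
    for (i, j), n in counts.items():
        if n < min_count:
            continue
        neighbors = [
            counts.get((i + di, j + dj), 0)
            for di in (-1, 0, 1)
            for dj in (-1, 0, 1)
            if (di, dj) != (0, 0)
        ]
        if all(n > m for m in neighbors):
            peaks.append(((i + 0.5) * cell, (j + 0.5) * cell))
    return sorted(peaks)

def assign_to_peaks(points, peaks):
    """Voronoi assignment: index of the nearest peak for every point."""
    return [
        min(range(len(peaks)), key=lambda k: math.dist(p, peaks[k]))
        for p in points
    ]

# Toy 2D projection with two obvious clusters (made-up coordinates).
POINTS = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2),
          (5.1, 5.2), (5.3, 5.4), (5.6, 5.1), (5.2, 5.8)]
PEAKS = density_peaks(POINTS, cell=1.0)
CELLS = assign_to_peaks(POINTS, PEAKS)
```

One way to get deeper label levels from a sketch like this is to rerun it with a smaller `cell`, which tends to surface more, finer peaks. Note that the strict comparison means two tied neighboring cells produce no peak, which a real implementation would need to handle.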

There is also support for both keyword and semantic queries, and there's an analytics layer for ranking institutions, authors, and topics.
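As a rough sketch of how these three pieces can fit together: keyword queries as substring matches, semantic queries as cosine-similarity ranking over embeddings, and analytics as simple counting. The record fields (`title`, `abstract`, `institution`, `vec`), the function names, and the toy 2D vectors are all assumptions for illustration. Real SPECTER2 embeddings are much higher-dimensional, and at 11 million papers a brute-force scan like this would likely be replaced by an approximate nearest-neighbor index.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def keyword_search(papers, term):
    """Case-insensitive substring match over title and abstract."""
    term = term.lower()
    return [p for p in papers if term in (p["title"] + " " + p["abstract"]).lower()]

def semantic_search(papers, query_vec, top_k=3):
    """Rank papers by cosine similarity to an already-embedded query."""
    ranked = sorted(papers, key=lambda p: cosine(p["vec"], query_vec), reverse=True)
    return ranked[:top_k]

def rank_institutions(papers):
    """Count papers per institution, most productive first."""
    return Counter(p["institution"] for p in papers).most_common()

# Made-up records; the vectors are toy 2D stand-ins for embeddings.
PAPERS = [
    {"title": "Graph neural networks for molecules", "abstract": "",
     "institution": "Univ A", "vec": [0.9, 0.1]},
    {"title": "Protein folding with transformers", "abstract": "",
     "institution": "Univ B", "vec": [0.1, 0.9]},
    {"title": "Message passing on graphs", "abstract": "",
     "institution": "Univ A", "vec": [0.8, 0.3]},
]

hits = semantic_search(PAPERS, [1.0, 0.0], top_k=2)
```

The same ranking function works for authors or topics by counting a different field.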

More recently, I added the ability to slide back and forth in time, plus a daily auto-ingestion script that keeps the map up to date.
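A minimal sketch of what time slicing and idempotent ingestion can look like, assuming each paper carries an `id` and a `published` date. Both field names and both function names are hypothetical, and the store here is just an in-memory dict rather than whatever the map actually uses.

```python
from datetime import date

def time_slice(papers, start, end):
    """Keep papers published within [start, end], inclusive."""
    return [p for p in papers if start <= p["published"] <= end]

def ingest(store, new_papers):
    """Add unseen papers to the store, keyed by id; return how many were added.

    Skipping known ids makes a daily job safe to rerun on overlapping batches.
    """
    added = 0
    for p in new_papers:
        if p["id"] not in store:
            store[p["id"]] = p
            added += 1
    return added

# Made-up records for illustration.
STORE = {}
BATCH = [
    {"id": "W1", "published": date(2023, 5, 1)},
    {"id": "W2", "published": date(2024, 2, 10)},
    {"id": "W3", "published": date(2024, 11, 30)},
]
first_run = ingest(STORE, BATCH)
second_run = ingest(STORE, BATCH)  # rerun with the same batch adds nothing
in_2024 = time_slice(list(STORE.values()), date(2024, 1, 1), date(2024, 12, 31))
```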

Feedback or suggestions are very welcome!
