A map of the latest 11 million papers split by semantic similarity and time slices [P]
How I built it
I sourced the latest 11 million papers from OpenAlex and arXiv, encoded their titles and abstracts with SPECTER 2, and projected the embeddings down to 2D with UMAP. I then generated labels within Voronoi cells around high-density peaks, at progressively deeper levels.
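A minimal sketch of the labeling idea, assuming the UMAP 2D coordinates are already computed. It finds density peaks with a simple grid histogram, assigns each point to its nearest peak (which is exactly membership in that peak's Voronoi cell), and labels each cell with its most common title word. The function names, the grid-based peak test, `cell_size`, `min_count`, and the toy labeler are all hypothetical stand-ins, not the code behind the map.

```python
import math
from collections import Counter

Point = tuple[float, float]

# Hypothetical stop-word list for the toy labeler below.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "via", "with"}


def density_peaks(points: list[Point], cell_size: float, min_count: int = 1) -> list[Point]:
    """Bin 2D points into a square grid and return the centroid of every cell
    that holds at least `min_count` points and strictly more than each of its
    8 neighbouring cells (a crude local-maximum test on the density)."""
    counts: Counter = Counter()
    sums: dict[tuple[int, int], list[float]] = {}
    for x, y in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        counts[cell] += 1
        acc = sums.setdefault(cell, [0.0, 0.0])
        acc[0] += x
        acc[1] += y

    peaks: list[Point] = []
    for (i, j), n in counts.items():
        if n < min_count:
            continue
        neighbours = (
            counts.get((i + di, j + dj), 0)
            for di in (-1, 0, 1)
            for dj in (-1, 0, 1)
            if (di, dj) != (0, 0)
        )
        if all(n > m for m in neighbours):
            sx, sy = sums[(i, j)]
            peaks.append((sx / n, sy / n))
    return peaks


def voronoi_assign(points: list[Point], peaks: list[Point]) -> list[int]:
    """Index of the nearest peak for each point, i.e. the peak whose Voronoi cell contains it."""
    return [min(range(len(peaks)), key=lambda k: math.dist(p, peaks[k])) for p in points]


def label_cells(titles: list[str], assignment: list[int], n_cells: int) -> list[str]:
    """Toy labeler (hypothetical): the most frequent non-stop-word title token per cell."""
    words = [Counter() for _ in range(n_cells)]
    for title, cell in zip(titles, assignment):
        words[cell].update(w for w in title.lower().split() if w not in STOPWORDS)
    return [c.most_common(1)[0][0] if c else "" for c in words]
```

One plausible way to get deeper levels is to rerun `density_peaks` with a smaller `cell_size` on the points inside each cell, though that is an assumption about how the nested labels work.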
There is also support for both keyword and semantic queries, and there's an analytics layer for ranking institutions, authors, and topics.
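The post doesn't say how keyword and semantic results are combined, so here is a sketch that swaps in reciprocal rank fusion (RRF) as an assumption: a keyword ranking and a cosine-similarity ranking over precomputed embeddings are merged, and an analytics-style helper ranks institutions within the fused results. The `Paper` record, all function names, and the brute-force search are hypothetical; at 11 million papers a real system would need an approximate nearest-neighbour index.

```python
import math
import re
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    abstract: str
    embedding: list[float]  # e.g. a precomputed SPECTER 2 vector
    institutions: list[str]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_rank(papers: list[Paper], query: str) -> list[int]:
    """Papers containing any query term, ordered by number of term hits."""
    terms = set(re.findall(r"\w+", query.lower()))
    hits = []
    for i, p in enumerate(papers):
        tokens = re.findall(r"\w+", f"{p.title} {p.abstract}".lower())
        n = sum(1 for t in tokens if t in terms)
        if n:
            hits.append((n, i))
    hits.sort(key=lambda h: (-h[0], h[1]))
    return [i for _, i in hits]


def semantic_rank(papers: list[Paper], query_vec: list[float], top_k: int) -> list[int]:
    """Top-k papers by cosine similarity to the query embedding (brute force)."""
    order = sorted(range(len(papers)), key=lambda i: -cosine(papers[i].embedding, query_vec))
    return order[:top_k]


def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Each paper scores sum(1 / (k + rank)) over every ranked list it appears in."""
    scores: defaultdict = defaultdict(float)
    for ranking in rankings:
        for rank, i in enumerate(ranking, start=1):
            scores[i] += 1.0 / (k + rank)
    return sorted(scores, key=lambda i: (-scores[i], i))


def rank_institutions(papers: list[Paper], result_ids: list[int]) -> list[tuple[str, int]]:
    """Institutions ordered by how many papers they have in a result set."""
    return Counter(inst for i in result_ids for inst in papers[i].institutions).most_common()
```

RRF only needs ranks, not comparable scores, which makes it a convenient default when mixing a term-count ranking with cosine similarities.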
More recently, I added the ability to slide back and forth in time, plus a daily auto-ingestion script that keeps the map up to date.
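A rough sketch of both pieces, under stated assumptions: a time slice is treated as a publication-date window, and the daily job is assumed to pull the previous day's works from the OpenAlex `/works` endpoint using its `from_publication_date`/`to_publication_date` filters with cursor paging. The helper names are hypothetical, and the code only builds the request URL rather than fetching anything.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"


def time_slice(papers: list[tuple[str, date]], start: date, end: date) -> list[str]:
    """IDs of papers published within [start, end], i.e. what one slider position shows."""
    return [pid for pid, published in papers if start <= published <= end]


def daily_ingest_url(today: date, cursor: str = "*", per_page: int = 200) -> str:
    """Hypothetical helper: URL for one page of works published the day before `today`."""
    day = (today - timedelta(days=1)).isoformat()
    params = {
        "filter": f"from_publication_date:{day},to_publication_date:{day}",
        "per-page": per_page,
        "cursor": cursor,  # "*" starts cursor paging; later pages pass the returned cursor
    }
    # Keep ':' ',' '*' unescaped so the URL matches the readable form OpenAlex documents.
    return f"{OPENALEX_WORKS}?{urlencode(params, safe=':,*')}"
```

A scheduler would call `daily_ingest_url(date.today())`, follow the cursor until no results remain, then embed and project only the new papers.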
Feedback or suggestions are very welcome!