DEV Community

How AI engines actually decide what to cite (ChatGPT, Perplexity, Gemini, AI Overviews)

ChatGPT: being known beats ranking

ChatGPT answers in two modes. Default mode answers from trained-in memory, no live web. Search mode browses and attaches citations. The key fact: when it browses, it cites only about 15% of the pages it pulls (AirOps study of 548k pages). And it names brands roughly 3x more often than it links them.

So two things get you in:

  • Entity strength. If you're a consistent entity across Wikipedia, Wikidata, Reddit and press, ChatGPT names you from memory without browsing at all. Being a known entity beats ranking #1 anywhere.
  • Allow OAI-SearchBot in robots.txt. It's separate from GPTBot (training). Block it and you vanish from ChatGPT Search. A lot of sites do this by accident.
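To make the accidental-block case concrete, here is a minimal sketch that uses Python's standard `urllib.robotparser` to check which crawlers a robots.txt lets in. The robots.txt content and the `example.com` URL are made up for illustration, and `crawler_access` is a hypothetical helper name, not part of any tool mentioned here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the author meant to block only the training
# crawler (GPTBot) but also blocked the search crawler (OAI-SearchBot).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_access(robots_txt, agents, url="https://example.com/"):
    """Return {agent: True/False} for whether each agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


report = crawler_access(ROBOTS_TXT, ["GPTBot", "OAI-SearchBot", "PerplexityBot"])
for agent, allowed in report.items():
    print(f"{agent}: {'allowed' if allowed else 'BLOCKED'}")
```

To check a live site instead, `RobotFileParser` also offers `set_url()` and `read()`. Note that this only tells you what the file says, not whether a given crawler honors it.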

Perplexity: it's mostly Reddit

Perplexity does live retrieval and grounds every answer in sources. Its defining trait: it leans on community content hard. One 2025 study found Reddit was its most-cited source, ~47% of top citations.

It also rewards answer-first pages, because its reranker scores for how cleanly it can extract a passage. A page can rank #1 on Google and never get cited here if the answer is buried.

Gemini: it's basically Google + the Knowledge Graph

Gemini is the only major assistant running on Google's own live index plus the Knowledge Graph. So classical SEO is the floor, not optional.

The twist: ranking #1 isn't enough anymore. Only about 38% of Google's AI Overview citations come from the top 10 results, down from ~76% a year earlier. Google now pulls citations from deeper in the results, via sub-queries.

Google AI Overviews: authority over freshness

AI Overviews uses "query fan-out" - it splits your question into 8-12 sub-queries and pools the results. That is why most citations come from outside the top 10 results for the original query (roughly 63%), not just from below position #1.
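A toy sketch of the pooling idea, to show why a page can win citations without ranking first for the head query. This is not Google's actual algorithm: the sub-queries, domains, the `top_k` cutoff and the count-based scoring are all assumptions made for illustration.

```python
from collections import Counter


def pool_fan_out(results_by_subquery, top_k=3):
    """Count how many sub-queries surface each URL in their top_k results.

    Returns (url, count) pairs, most frequently surfaced first; ties are
    broken alphabetically so the output is deterministic.
    """
    counts = Counter()
    for urls in results_by_subquery.values():
        counts.update(set(urls[:top_k]))  # count each URL once per sub-query
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical ranked results for three sub-queries of one question.
results = {
    "best crm for startups": ["a.com", "b.com", "c.com"],
    "crm pricing comparison": ["d.com", "c.com", "a.com"],
    "crm free tier limits": ["e.com", "c.com", "d.com"],
}

pooled = pool_fan_out(results)
print(pooled)
```

In this made-up data, `c.com` never ranks first for any sub-query, yet it is the only page that appears in all three result sets, so it tops the pooled list.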

And counterintuitively, it has the weakest freshness bias of the major engines. Established, authoritative pages keep getting cited even without recent updates, which is the opposite of ChatGPT and Perplexity.

What this means if you're building something

  • Lead every page with the answer in the first few lines. Most AI citations come from the top of the page.
  • Be a real entity (Wikipedia, Wikidata, Crunchbase, consistent name everywhere).
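  • Let the AI crawlers in. Check robots.txt for OAI-SearchBot, PerplexityBot and Google-Extended. Note that Google-Extended is a robots.txt token that controls how Gemini may use your content, not a separate crawler, and it does not affect inclusion in Google Search.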
  • Show up off your own site - Reddit and YouTube get cited constantly.
  • Track over time, not off one screenshot. These answers are non-deterministic; the same prompt gives different brands run to run.
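Because the answers vary from run to run, a mention rate over many runs is more useful than a single screenshot. The sketch below assumes you have already collected the answer text from repeated runs of the same prompt; the answers and brand names are invented, and `mention_rates` is a hypothetical helper that uses naive case-insensitive substring matching.

```python
from collections import Counter


def mention_rates(runs, brands):
    """Return the fraction of runs in which each brand is mentioned."""
    if not runs:
        raise ValueError("need at least one run to compute a rate")
    counts = Counter()
    for answer in runs:
        text = answer.lower()
        for brand in brands:
            # Naive substring match; a real tracker would need to handle
            # aliases and brands whose names are common words.
            if brand.lower() in text:
                counts[brand] += 1
    return {brand: counts[brand] / len(runs) for brand in brands}


# Hypothetical answers from four runs of the same prompt.
runs = [
    "For small teams, Acme and Globex are common picks.",
    "Globex is often recommended.",
    "Consider Acme or Initech.",
    "Globex and Acme both have free tiers.",
]

rates = mention_rates(runs, ["Acme", "Globex", "Initech"])
print(rates)
```

Comparing these rates week over week, per engine, gives a trend rather than an anecdote.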

I got tired of checking this by hand, so I built FixAEO - a free tool to see how AI engines describe and recommend your brand across 8 engines, plus a free llms.txt validator. Sharing in case it saves you the manual prompting.

What have you noticed about getting cited by AI? Curious if others are seeing the same patterns.
