GEO

How to get cited by ChatGPT, Perplexity & AI Overviews

SEOany · June 28, 2026 · 8 min read

Generative engines are becoming a new front page, one you don't rank on but get quoted on. This is our honest playbook for [GEO](/glossary#geo): the specific, controllable things that make ChatGPT, Perplexity and Google AI Overviews more likely to cite your page. No one can promise a citation. What you can do is earn eligibility and write passages a model wants to lift, and that is exactly what the sections below cover.

Why do ChatGPT, Perplexity and AI Overviews cite some pages and not others?

Generative engines cite the page whose passage most cleanly answers a sub-question, chosen from sources they have already retrieved. Getting cited is two problems stacked together: first getting into the set of sources the model reads, then being the clearest, most quotable line in that set. You cannot buy a citation; you earn eligibility.

Every engine runs a different pipeline, but they share a shape: retrieve a handful of candidate sources, then synthesize an answer from them. ChatGPT search and Perplexity fetch live pages at query time; Google AI Overviews draw from Google's existing index. Your job is to be a strong candidate in whichever retrieval layer feeds the answer.

The retrieval layer is an invisible filter that most content advice ignores. If your page is un-crawlable, slow, or too thin to match the query, it never enters the candidate set, and no amount of elegant writing rescues it. That is why GEO sits on top of solid technical SEO, not instead of it.

So here is the honest bottom line, stated once: no tool or agency can guarantee a specific citation in a specific answer, because the answer surface is probabilistic and changes often. What you actually control is eligibility, clarity and consistency, and those are the three levers the rest of this guide pulls.

  • ChatGPT search — live retrieval at query time, favours pages it can fetch and parse fast.
  • Perplexity — live retrieval with visible citations, rewards sources that answer directly.
  • Google AI Overviews — synthesised from Google's index, so classic indexation still gates you.
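The first gate is mechanical enough to check yourself: can each engine's crawler fetch the page at all? Below is a minimal sketch using Python's standard-library robots.txt parser. The user-agent tokens (OAI-SearchBot for ChatGPT search, PerplexityBot for Perplexity, Googlebot for the index behind AI Overviews) are our understanding of each vendor's published crawler names, so confirm them against current vendor documentation. The example robots.txt and URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens as we understand each vendor's published names (an assumption:
# check the vendors' current crawler documentation before relying on them).
AI_CRAWLERS = {
    "ChatGPT search": "OAI-SearchBot",
    "Perplexity": "PerplexityBot",
    "Google AI Overviews": "Googlebot",  # AI Overviews draw on Google's regular index
}


def crawl_access(robots_txt: str, url: str) -> dict:
    """Return {engine: allowed} for `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse the text directly, no network fetch
    return {engine: parser.can_fetch(agent, url) for engine, agent in AI_CRAWLERS.items()}


# Hypothetical robots.txt that blocks Perplexity's crawler site-wide.
EXAMPLE_ROBOTS = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

if __name__ == "__main__":
    for engine, allowed in crawl_access(EXAMPLE_ROBOTS, "https://example.com/guide").items():
        print(f"{engine}: {'allowed' if allowed else 'BLOCKED'}")
```

Run against the example file, it reports Perplexity as blocked and the other two as allowed. Python's parser applies the first matching rule in a group rather than Google's longest-match precedence, so treat it as a smoke test, not an exact replica of any engine's behaviour.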

How do you become the canonical source for a claim?

Publish one authoritative page per claim, own the specific numbers or definitions on it, and keep it current. Generative engines prefer a single stable first-hand source over a dozen paraphrases of it. Being the canonical source means being the page others cite — the origin of the fact, not its echo.

One claim deserves one home. Splitting a topic across five thin, overlapping pages divides your signal and confuses retrieval about which URL to trust; consolidating into one deep page concentrates it. This is the same instinct behind a canonical URL, applied to whole topics rather than duplicate addresses.

First-hand material beats aggregation. Original benchmarks, a clear definition you coined, primary data, or a named methodology give an engine something it can only get from you — which is exactly what makes your page worth citing rather than skipping.

Freshness is part of authority. Visible publish and updated dates, and content that reflects the current state of a fast-moving topic, signal that your page is the live source rather than a stale copy. On time-sensitive queries, AI answers tend to favour recent sources.

  • One deep page per claim, not five thin ones.
  • At least one thing only you have: original data, a coined definition, a named method.
  • A visible last-updated date and genuinely current facts.
  • A clean, stable URL you don't move or fragment.
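If you want the dates readable by machines as well as people, schema.org's Article type defines datePublished and dateModified. A minimal sketch that emits the JSON-LD tag for one canonical page; the headline, URL and dates are placeholders, and whether a given engine weighs dateModified when judging freshness is an assumption, not a documented guarantee.

```python
import json
from datetime import date


def article_jsonld(headline: str, url: str, published: date, modified: date) -> str:
    """Build a schema.org Article JSON-LD <script> tag carrying publish/update dates."""
    if modified < published:
        raise ValueError("dateModified cannot precede datePublished")
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "mainEntityOfPage": url,  # the one stable URL that owns this claim
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
    }
    return '<script type="application/ld+json">\n' + json.dumps(data, indent=2) + "\n</script>"


if __name__ == "__main__":
    print(article_jsonld(
        "How to get cited by ChatGPT, Perplexity & AI Overviews",
        "https://example.com/blog/geo-playbook",  # hypothetical URL
        date(2026, 6, 28),
        date(2026, 6, 28),
    ))
```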

What makes a passage quotable enough to extract?

A quotable passage answers one question completely in 40 to 60 words, needs no surrounding context, and states a single claim in plain language. Models lift self-contained sentences, not paragraphs that depend on the three above them. Write so any passage still makes sense the moment it is copied out of your page.

Lead with the answer, then explain — the inverted-pyramid habit. Put the direct 40-to-60-word response immediately under a heading, and follow with the nuance; that top block is the piece most likely to be extracted verbatim into an answer.

One claim per paragraph keeps each unit extractable. When a paragraph argues three things at once, a model cannot cleanly quote any one of them, so it moves on to a source that made the point in a single clean sentence.

Phrase headings as the questions people actually ask. Retrieval matches queries against your headings, so an H2 like 'How do you become the canonical source?' aligns far better than a clever but vague label — and it hands the model a ready-made question-and-answer pair.

  • 40–60 words of direct answer right under each heading.
  • One claim per paragraph, in plain declarative sentences.
  • No 'as noted above' — every passage stands alone.
  • Question-shaped headings that mirror real queries.
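The checklist is concrete enough to lint. A rough sketch, assuming you have already split a page into pairs of (heading, first paragraph under it); the 40 to 60 word window comes from this guide, while the back-reference phrases are an illustrative, hypothetical list you would extend for your own writing.

```python
# Illustrative phrases that tie a passage to context it loses once quoted out of the page.
BACK_REFERENCES = ("as noted above", "as mentioned earlier", "see above", "as discussed")


def lint_section(heading: str, answer: str, min_words: int = 40, max_words: int = 60) -> list:
    """Return problems with one heading and the answer block directly under it."""
    problems = []
    if not heading.rstrip().endswith("?"):
        problems.append("heading is not phrased as a question")
    words = len(answer.split())  # plain whitespace split; punctuation stays attached
    if not min_words <= words <= max_words:
        problems.append(f"answer block is {words} words, outside {min_words}-{max_words}")
    lowered = answer.lower()
    for phrase in BACK_REFERENCES:
        if phrase in lowered:
            problems.append(f"answer depends on context: '{phrase}'")
    return problems


if __name__ == "__main__":
    for problem in lint_section("Our approach", "As noted above, it depends."):
        print("-", problem)
```

Word counts use a plain whitespace split, so treat borderline results (39 or 61 words) as a prompt to reread rather than a hard failure.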

How does entity consistency get your brand into answers?

Generative engines can cite your brand only after they resolve it to an entity, a distinct thing they can reason about. Consistent naming, category and facts across the web merge into one strong, citable entity; inconsistent signals split you into two weak ones that no engine trusts.

Pick one name, one spelling and one home domain, and use them everywhere. When some pages say one thing and others a variant, you fragment your own entity and dilute every signal that points at it — consistency is the cheapest authority you will ever buy.

Give machines unambiguous facts with schema markup. Organization and Product structured data spell out your name, category and relationships in a format engines read directly, turning prose an AI has to infer into facts it can quote with confidence.
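A minimal sketch of Organization and Product markup generated from one record of brand facts, so every page emits the same name and description. The brand, domain and product are hypothetical placeholders. The properties used (name, url, description, category, brand, and @id to link the two nodes) are standard schema.org and JSON-LD vocabulary; which of them any particular engine reads is not something we can confirm.

```python
import json

# Single source of truth for entity facts (hypothetical values).
BRAND_FACTS = {
    "name": "ExampleCo",
    "domain": "https://example.com",
    "description": "ExampleCo makes scheduling software for dental clinics.",
}


def entity_graph(product_name: str, product_category: str) -> dict:
    """Organization + Product JSON-LD, linked by @id and reusing the same brand facts."""
    org_id = BRAND_FACTS["domain"] + "/#organization"
    return {
        "@context": "https://schema.org",
        "@graph": [
            {
                "@type": "Organization",
                "@id": org_id,
                "name": BRAND_FACTS["name"],
                "url": BRAND_FACTS["domain"],
                "description": BRAND_FACTS["description"],
            },
            {
                "@type": "Product",
                "name": product_name,
                "category": product_category,
                "brand": {"@id": org_id},  # points back to the single Organization node
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(entity_graph("ExampleCo Scheduler", "Practice management software"), indent=2))
```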

Connect your entity to the wider graph with sameAs links to authoritative profiles — your Wikipedia or Wikidata entry, Crunchbase, LinkedIn, or an official social account. These are the corroborating references that let an engine confirm you are who you say you are before it repeats your claim.

  • One brand name, one spelling, one canonical domain — everywhere.
  • Organization + Product schema on the pages that define you.
  • sameAs links to Wikidata, Crunchbase, LinkedIn and official profiles.
  • The same short description reused across your key pages.
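Consistency is easy to audit once each key page's declared facts are extracted. A rough sketch, assuming you have already pulled the Organization name, description and sameAs links out of each page's markup (the extraction step is not shown); the pages, profile URLs and Wikidata ID below are invented placeholders. It flags every page that drifts from the canonical record.

```python
from dataclasses import dataclass, field


@dataclass
class EntityFacts:
    """Organization facts one page declares (assumed already extracted)."""
    url: str
    name: str
    description: str
    same_as: list = field(default_factory=list)


def audit(pages: list, canonical: EntityFacts) -> list:
    """Report every place a page's entity facts differ from the canonical record."""
    findings = []
    for page in pages:
        if page.name != canonical.name:
            findings.append(f"{page.url}: name '{page.name}' != '{canonical.name}'")
        if page.description != canonical.description:
            findings.append(f"{page.url}: description differs from canonical")
        for profile in sorted(set(canonical.same_as) - set(page.same_as)):
            findings.append(f"{page.url}: missing sameAs {profile}")
    return findings


# Hypothetical canonical record; the Wikidata ID is a placeholder, not a real entity.
CANONICAL = EntityFacts(
    url="https://example.com/about",
    name="ExampleCo",
    description="ExampleCo makes scheduling software for dental clinics.",
    same_as=[
        "https://www.wikidata.org/wiki/Q000000",
        "https://www.linkedin.com/company/exampleco",
    ],
)

if __name__ == "__main__":
    pricing = EntityFacts(
        url="https://example.com/pricing",
        name="Example Co",  # spelling drift
        description=CANONICAL.description,
        same_as=["https://www.linkedin.com/company/exampleco"],  # Wikidata link missing
    )
    for finding in audit([pricing], CANONICAL):
        print(finding)
```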

Does llms.txt actually help you get cited?

llms.txt is a plain-text file at /llms.txt that tells AI crawlers what your site is and which pages matter most. It is cheap to maintain and adoption is climbing, but no engine promises to read it — treat it as helping the right pages get discovered, not as a switch that guarantees citations.

The idea borrows from robots.txt but inverts the intent: instead of blocking crawlers, llms.txt curates a short list of your most important, most quotable pages so a model spends its attention where you want it. Our own file lives at /llms.txt if you want to see the format.

Be honest about its limits. It is a young convention, support across engines is uneven, and it will never rescue thin content or override a page that simply is not crawlable — it is a signpost, not a ranking lever, and it works only when the pages it points to are already worth citing.

If you want the full rationale, format and a copy-paste template, see our dedicated guide, What is llms.txt. The five-minute version: list your canonical pages, describe each in one line, and keep the file in sync with the pages you most want quoted.
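The five-minute version is easy to script so the file stays in sync. A minimal sketch following the basic shape of the llms.txt proposal as we understand it: an H1 with the site name, a blockquote summary, then H2 sections listing markdown links with a short note each. The site name, sections, URLs and descriptions are hypothetical.

```python
# Hypothetical curated list: section -> [(title, url, one-line description)].
PAGES = {
    "Guides": [
        ("GEO playbook", "https://example.com/blog/geo-playbook",
         "How to get cited by ChatGPT, Perplexity and AI Overviews"),
        ("What is llms.txt", "https://example.com/blog/llms-txt",
         "Format, rationale and a copy-paste template"),
    ],
    "Reference": [
        ("Glossary", "https://example.com/glossary", "Definitions of GEO and related terms"),
    ],
}


def render_llms_txt(site: str, summary: str, sections: dict) -> str:
    """Render llms.txt: H1 site name, blockquote summary, then H2 sections of link lists."""
    lines = [f"# {site}", "", f"> {summary}"]
    for section, pages in sections.items():
        lines += ["", f"## {section}", ""]
        lines += [f"- [{title}]({url}): {note}" for title, url, note in pages]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_llms_txt("ExampleCo", "Scheduling software for dental clinics.", PAGES), end="")
```

Regenerating the file from the same list you use to decide your canonical pages keeps it from pointing at URLs you have since merged or moved.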

How do you know if any of this is working?

You measure citations, not rankings. Track how often ChatGPT, Perplexity and AI Overviews name or link your brand for your target prompts, watch whether you appear in their cited sources, and treat AI-referral traffic as the slow lagging signal. Re-test monthly, because the answer surface shifts fast.

Start with prompt testing. Pick the questions a buyer would actually ask, run them across each engine on a schedule, and log whether you are cited, mentioned, or absent — that panel is your real GEO ranking report, and it is the only one that reflects the surface your customers see.
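Logging the panel the same way every month matters more than tooling. A minimal sketch of the cited / mentioned / absent rule, assuming you have already captured each engine's answer text and the URLs it cited (by hand or from an export; no engine API is assumed). The brand name and domain are placeholders.

```python
from urllib.parse import urlparse

BRAND = "ExampleCo"     # hypothetical brand name
DOMAIN = "example.com"  # hypothetical canonical domain


def classify(answer_text: str, cited_urls: list) -> str:
    """'cited' if any cited source is on our domain, 'mentioned' if the answer
    names the brand without citing us, otherwise 'absent'."""
    for url in cited_urls:
        host = (urlparse(url).hostname or "").lower()
        if host == DOMAIN or host.endswith("." + DOMAIN):
            return "cited"
    if BRAND.lower() in answer_text.lower():
        return "mentioned"
    return "absent"


if __name__ == "__main__":
    # One hypothetical panel row: (prompt, engine, answer text, cited URLs).
    prompt, engine, answer, urls = (
        "best tool for dental scheduling", "Perplexity",
        "Popular options include ExampleCo.", ["https://notexample.com/review"],
    )
    print(prompt, engine, classify(answer, urls))  # mentioned: lookalike domain is not a citation
```

Matching on the exact host or its subdomains, rather than a substring, keeps a lookalike such as notexample.com from being counted as a citation.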

Watch referral traffic from AI engines in your analytics as a second, lagging signal. It arrives later and is noisier than classic search clicks, so read the trend over weeks rather than reacting to any single day.
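A rough sketch of segmenting exported sessions by AI referrer and bucketing them by week, so you read a trend rather than a day. The referrer hostnames are our assumption of where ChatGPT and Perplexity send clicks from and will drift, so keep the table editable. As far as we know, clicks from AI Overviews arrive as ordinary Google referrals and cannot be separated this way, and some AI clicks carry no referrer at all, so this undercounts.

```python
from collections import Counter
from typing import Optional
from urllib.parse import urlparse

# Assumed referrer hosts for AI engines; edit as vendors change domains.
AI_REFERRERS = {
    "chatgpt.com": "ChatGPT",
    "chat.openai.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
}


def engine_for(referrer: str) -> Optional[str]:
    """Map a referrer URL to an AI engine name, or None if it is not one we track."""
    host = (urlparse(referrer).hostname or "").lower()
    for domain, engine in AI_REFERRERS.items():
        if host == domain or host.endswith("." + domain):
            return engine
    return None


def ai_referrals_by_week(sessions: list) -> dict:
    """sessions are (iso_week, referrer) pairs; every week appears, even with zero AI visits."""
    weekly = {}
    for week, referrer in sessions:
        bucket = weekly.setdefault(week, Counter())
        engine = engine_for(referrer)
        if engine:
            bucket[engine] += 1
    return weekly


if __name__ == "__main__":
    sessions = [
        ("2026-W26", "https://chatgpt.com/"),
        ("2026-W26", "https://www.perplexity.ai/search?q=geo"),
        ("2026-W27", "https://www.google.com/"),
        ("2026-W27", ""),  # no referrer: invisible to this method
    ]
    for week, counts in sorted(ai_referrals_by_week(sessions).items()):
        print(week, dict(counts))
```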

Set expectations accordingly: GEO feedback loops are slower and noisier than rank tracking, and citations come and go as models refresh. Judge the programme on direction over a quarter — more prompts where you are cited, in more engines — not on any one answer.

  • A fixed prompt set, tested across engines on a monthly cadence.
  • Cited / mentioned / absent logged per prompt, per engine.
  • AI-referral traffic tracked as a lagging trend, not a daily metric.
  • Direction over a quarter, not the result of a single answer.
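To judge direction over a quarter, roll the per-prompt log up into one cited share per engine per month. A minimal sketch, assuming rows of (month, engine, prompt, status) from a panel like the one described above; comparing only the first and last month is a deliberately crude definition of direction, and the example rows are invented to show the shape, not real results.

```python
from collections import defaultdict


def cited_share(log: list) -> dict:
    """Share of panel prompts with status 'cited', per engine per month.

    Each log row is (month, engine, prompt, status); months are 'YYYY-MM'
    strings, which sort chronologically.
    """
    totals = defaultdict(lambda: defaultdict(int))
    cited = defaultdict(lambda: defaultdict(int))
    for month, engine, _prompt, status in log:
        totals[engine][month] += 1
        if status == "cited":
            cited[engine][month] += 1
    return {
        engine: {month: cited[engine][month] / n for month, n in sorted(months.items())}
        for engine, months in totals.items()
    }


def direction(shares: dict) -> str:
    """Crude direction: compare the earliest and latest month in the window."""
    ordered = [shares[month] for month in sorted(shares)]
    if len(ordered) < 2 or ordered[-1] == ordered[0]:
        return "flat"
    return "up" if ordered[-1] > ordered[0] else "down"


if __name__ == "__main__":
    # Invented rows to show the shape of the log, not real results.
    log = [
        ("2026-04", "Perplexity", "what is llms.txt", "absent"),
        ("2026-04", "Perplexity", "best geo tool", "mentioned"),
        ("2026-06", "Perplexity", "what is llms.txt", "cited"),
        ("2026-06", "Perplexity", "best geo tool", "absent"),
    ]
    for engine, shares in cited_share(log).items():
        print(engine, shares, direction(shares))
```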
