Where Do AI Citations Come From? The Data

Subia Peerzada

Founder, Cite Solutions · July 2, 2026

Most teams work backwards on AI citations. They chase Reddit threads and press mentions before they have fixed the pages they own outright. The data says that is the wrong order.

Where do AI citations come from? Not from the places most of the GEO advice tells you to chase. A new dataset of 6.8 million citations puts hard numbers on it, and the numbers point almost entirely at content you already control.

Where AI citations come from, in one number

Most AI citations come from sources you already control. In Yext's study of 6.8 million citations, 44% pointed to brand-owned websites and 42% to listings a brand can claim and edit. That is 86% sitting in tiers a company can change directly. Reviews and social took 8%. News and forums took just 6%.

Yext analyzed 6.8 million citations across 1.6 million AI responses from ChatGPT, Gemini, and Perplexity, published in its October 2025 release. It is the largest public breakdown of AI citations by how much control the brand has over the source, and the headline is blunt.

The sources you control are the sources AI cites. That is the whole finding.

The four source tiers, ranked by how much control you have

Yext grouped every citation into four tiers based on one question: how much say does the brand have over what that source says? Ranked from most control to least, here is where the 6.8 million citations landed.

Figure: Yext, 6.8 million AI citations across 1.6 million responses from ChatGPT, Gemini, and Perplexity (October 2025). 86% of AI citations come from sources a brand controls.

Grouped by how much say you have over the source. The top two tiers are yours to fix today. The bottom two you can only earn.

  • 44%, your own websites (full control): product pages, docs, pricing, your blog. You own the words.
  • 42%, listings and profiles (controllable): Google Business, G2, Capterra, app stores. You claim and edit them.
  • 8%, reviews and social (influenceable): review sites and social posts. You shape them, you do not own them.
  • 6%, news and forums (uncontrollable): press coverage and forum threads like Reddit. You earn a mention at best.

Websites plus listings account for 86% of every citation. The sources most teams chase last, forums and press, are the smallest slice at 6% combined.

Tier 1: Your own websites earn 44% of citations

Your product pages, documentation, pricing page, and blog are the single largest citation source in AI answers. You own every word on them. When an engine grounds an answer in a fact about your product, it reaches for your site first more often than any other source type. This is the tier with the most volume and the most control, and it is the one most teams under-invest in relative to its payoff.

Tier 2: Listings and profiles earn 42%

Listings are the properties you do not host but can claim and edit: Google Business Profile, G2, Capterra, app-store pages, Crunchbase. Together they nearly match your own site for citation volume. The engine treats a claimed, complete profile as a corroborating source, a second place the same fact appears. An abandoned or half-filled profile is a citation slot you are handing to a competitor.

Tier 3: Reviews and social earn 8%

Review sites and social posts sit in the influence tier. You cannot dictate what a reviewer writes, but you can prompt reviews, respond to them, and keep your social presence current. This tier is smaller than the first two by a wide margin, and it swings by industry: in Yext's food-service cut, reviews and social hit 13.3%, the highest of any vertical.

Tier 4: News and forums earn 6%

Press coverage and forum threads like Reddit are the uncontrollable tier. You can earn a mention through genuine work, but you cannot claim or edit these sources. This is the smallest slice in the entire dataset. It is also, not coincidentally, where a lot of GEO advice tells brands to spend first.

Here is the table version, with what each tier translates to as a to-do.

| Tier | Source type | Share of citations | What you actually do about it |
|---|---|---|---|
| Full control | Your websites | 44% | Write clean, current, extractable pages |
| Controllable | Listings and profiles | 42% | Claim and complete every profile |
| Influenceable | Reviews and social | 8% | Prompt reviews, stay active, respond |
| Uncontrollable | News and forums | 6% | Earn mentions through real coverage |

Chasing forum mentions before fixing your own pages is optimizing the 6% and ignoring the 86%.

See which sources AI cites instead of you

We run your real buyer questions through ChatGPT, Gemini, and Perplexity, then show you exactly which source tier is winning the citation slots you want.

Get an AI Visibility Audit

Why "Reddit is 22% of answers" and "forums are 6% of citations" are both true

This is where people get confused, because two credible datasets look like they contradict each other. Our own CITE Index finds Reddit appears in about 22% of AI answers. Yext finds news and forums together are only 6% of citations. Both are right. They count different things.

An AI answer usually cites four to five sources. When our data says Reddit shows up in 22% of answers, it means roughly one answer in five contains at least one Reddit link among its four or five sources. When Yext says news and forums are 6% of citations, it is counting that tier's share of the total citation pile, not the share of answers it touches.

Do the arithmetic and they reconcile. If Reddit is one of five sources in a fifth of answers, its share of all citations lands near 4 to 6%. Reddit shows up in a fifth of answers and still accounts for a twentieth of citations. Both are true.
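The reconciliation is easy to check. Here is a minimal sketch, assuming every answer cites the same number of sources and that an answer containing Reddit carries exactly one Reddit link. Both are simplifications, and the function name is ours, not something from either dataset.

```python
def citation_share(answer_rate: float, sources_per_answer: float,
                   links_per_hit: float = 1.0) -> float:
    """Share of all citations held by a source that appears in
    `answer_rate` of answers, `links_per_hit` times per appearance.

    Per answer on average: the source contributes
    answer_rate * links_per_hit citations out of sources_per_answer.
    """
    return answer_rate * links_per_hit / sources_per_answer


# Reddit appears in about 22% of answers; answers cite four to five sources.
for n in (4, 5):
    share = citation_share(0.22, n)
    print(f"{n} sources per answer: {share:.1%} of citations")
# 4 sources per answer: 5.5% of citations
# 5 sources per answer: 4.4% of citations
```

Under those assumptions the share lands between 4.4% and 5.5%, which is the 4 to 6% range quoted above.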

The practical read: a relevant forum thread is a real place to surface, which is why Reddit does help AI citations on some engines. But it is one slot in a five-slot answer, and the other four slots skew heavily toward sources you own. We break down the full source mix in how AI decides which sources to cite, and the pattern holds: community and news are load-bearing, but they are the minority of a pie where owned and claimed sources dominate.

What this means if you sell B2B SaaS, not pizza

One honest caveat. Yext studied retail, financial services, healthcare, and food service, verticals where "listings" means Google Business Profile, MapQuest, and TripAdvisor. If you sell software, a map pin is not your listing.

The control framework still holds. What changes is which specific properties fill each tier.

For a local business, the tiers look like:

  • Websites: your site and location pages
  • Listings: Google Business Profile, Apple Maps, TripAdvisor
  • Reviews: Google reviews, Yelp
  • Forums: local subreddits, community boards

For a B2B SaaS brand, the tiers look like:

  • Websites: product pages, docs, pricing, changelog, your blog
  • Listings: G2, Capterra, TrustRadius, app marketplaces, Crunchbase
  • Reviews: G2 and Capterra review corpus, LinkedIn commentary
  • Forums: relevant subreddits, Hacker News, niche Slack and Discord archives

If you sell software, your listings are G2 and Capterra, not a map pin. The tier that earns 42% of citations does not disappear for B2B; it moves to the review and directory platforms buyers already trust. And the invisibility risk is real: a February 2026 GrackerAI benchmark found 73% of cybersecurity vendors got zero ChatGPT citations when buyers asked for vendor recommendations. Most of those brands had a website. Very few had the listings and structured pages the engine needed to quote.

How to earn more AI citations from the sources you control

The diagnosis leads straight to the fix. If 86% of citations come from your sites and your listings, the work is to make those two tiers impossible to skip. Here are the five moves, in priority order.

Move 1: State the answer on the page, near the top

The engine cites a passage, not a page. It wants a clean, self-contained chunk that answers the question in a few sentences, with the claim and its context in the same place. Put the direct answer in the first screen of every important page. Answers buried in paragraph nine get skipped for a competitor who leads with theirs.

Move 2: Keep the facts current

Freshness weighs more than most B2B teams expect. For anything that changes (pricing, comparisons, feature lists), the engine prefers a recently updated page over an authoritative but stale one. A changelog and a visible "last updated" date are not vanity. They are retrieval signals.
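One way to act on this is a simple staleness audit. The sketch below assumes you can export each page's last-updated date. The URLs, dates, and the 90-day threshold are illustrative assumptions, not figures from the Yext data.

```python
from datetime import date


def stale_pages(pages: dict[str, date], today: date,
                max_age_days: int = 90) -> list[str]:
    """Return URLs last updated more than `max_age_days` ago, oldest first."""
    aged = [(today - updated, url) for url, updated in pages.items()]
    return [url for age, url in sorted(aged, reverse=True)
            if age.days > max_age_days]


# Hypothetical pages and dates, for illustration only.
pages = {
    "/pricing": date(2026, 1, 10),
    "/docs/api": date(2026, 6, 20),
    "/compare": date(2025, 11, 3),
}
print(stale_pages(pages, today=date(2026, 7, 2)))
# ['/compare', '/pricing']
```

Pick the threshold per page type: a pricing page probably deserves a tighter window than an evergreen explainer.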

Move 3: Claim and complete every listing

Your G2, Capterra, and Crunchbase profiles sit in the listings tier, which earned 42% of citations in Yext's data. Fill them out completely, keep the product description and category accurate, and treat them as extensions of your own site. An incomplete profile is a source the engine cannot lean on, so it leans on a rival instead.

Move 4: Get the same facts corroborated off-site

A number that lives only on your domain reads as a marketing claim. The same number echoed on a review site, a directory, and a news write-up reads as a fact. Corroboration is why the top two tiers work together: your site states it, your listings confirm it, and the engine cites with more confidence.
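Corroboration only helps if the facts actually match. A rough sketch of a consistency check follows, assuming you have collected the same named facts from your site and each listing by hand or by export. The source names, fact names, and values are hypothetical.

```python
def inconsistent_facts(
    sources: dict[str, dict[str, str]],
) -> dict[str, dict[str, str]]:
    """Map each fact name to {source: value} wherever sources disagree.

    Values are compared after trimming whitespace and lowercasing,
    so "Email security" and "email security" count as a match.
    """
    by_fact: dict[str, dict[str, str]] = {}
    for source, facts in sources.items():
        for name, value in facts.items():
            by_fact.setdefault(name, {})[source] = value.strip().lower()
    return {name: vals for name, vals in by_fact.items()
            if len(set(vals.values())) > 1}


# Hypothetical facts gathered from three properties.
sources = {
    "website": {"starting_price": "$49/mo", "category": "Email security"},
    "g2": {"starting_price": "$39/mo", "category": "email security"},
    "capterra": {"starting_price": "$49/mo"},
}
print(inconsistent_facts(sources))
# {'starting_price': {'website': '$49/mo', 'g2': '$39/mo', 'capterra': '$49/mo'}}
```

Any fact this flags is one where a listing contradicts your site instead of confirming it.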

Move 5: Measure per engine, not on average

Gemini favors first-party websites, ChatGPT leans on listings, and Perplexity spreads across mixed sources. A single visibility score hides that. Track which source tier wins your buyer questions on each engine, then fix the specific tier losing the slot. This per-engine, per-question view is most of what a managed GEO agency is actually for.
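A per-engine view can start as a plain tally. This is a minimal sketch, assuming you have logged each citation as an (engine, tier) pair after running your buyer questions. The log below is made up for illustration and does not reflect measured engine behavior.

```python
from collections import Counter, defaultdict


def tier_share_by_engine(citations):
    """Turn (engine, tier) pairs into {engine: {tier: share of citations}}."""
    counts = defaultdict(Counter)
    for engine, tier in citations:
        counts[engine][tier] += 1
    shares = {}
    for engine, tally in counts.items():
        total = sum(tally.values())
        shares[engine] = {tier: n / total for tier, n in tally.items()}
    return shares


# Made-up citation log, for illustration only.
log = [
    ("gemini", "website"), ("gemini", "website"),
    ("gemini", "listing"), ("gemini", "forum"),
    ("chatgpt", "listing"), ("chatgpt", "listing"),
    ("chatgpt", "website"), ("chatgpt", "review"),
]
shares = tier_share_by_engine(log)
for engine, tiers in shares.items():
    top = max(tiers, key=tiers.get)
    print(f"{engine}: top tier = {top} ({tiers[top]:.0%})")
# gemini: top tier = website (50%)
# chatgpt: top tier = listing (50%)
```

Extending the key to (engine, question, tier) gives the per-question view, at the cost of needing many more logged answers before the shares mean anything.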

AI does not reward the loudest brand. It rewards the best-documented one.

FAQ

Where do AI citations come from?

From sources brands mostly control. In Yext's study of 6.8 million citations across ChatGPT, Gemini, and Perplexity, 44% came from brand-owned websites and 42% from listings a brand can claim and edit, so 86% sat in tiers a company can change directly. Reviews and social took 8%, and news and forums just 6%.

What sources does AI cite most often?

Brand-owned websites are the single largest source, at 44% of citations in Yext's dataset, followed closely by listings and profiles at 42%. The two together make up 86% of all AI citations. Forums like Reddit and news coverage, the sources many teams chase first, are the smallest tier at 6% combined.

Do AI models cite your own website?

Yes, more than any other source type. Brand websites earned 44% of citations in the 6.8 million analyzed, the largest single tier. The catch is that the engine cites a passage, not a page, so a site only wins that slot when its answer is stated plainly near the top, kept current, and cleanly structured.

Is Reddit important for AI citations or not?

It depends on how you count. Reddit appears in roughly 22% of AI answers in our CITE Index, while news and forums combined account for only about 6% of total citations in Yext's data, because each answer blends four to five sources and most of them are owned or claimed. A relevant thread is a real place to surface, but it is one slot in a five-slot answer.

Where should a brand invest first to get cited by AI?

Start with the 86% you control: your own pages and your listings. Make key pages answer-first and current, then claim and complete every profile on G2, Capterra, and the directories your buyers trust. Earn forum and press mentions after those two tiers are solid, not before. Run an AI visibility audit to see which tier is losing your slots today.

The bottom line

The 6.8 million citations say the same thing our own data does. AI answers are built mostly from sources brands already own or can claim, and the forum-and-press tier that gets the most attention is the smallest slice on the board.

That is good news, because it means the highest-return work is also the work you have the most control over. Fix your own pages, claim your listings, corroborate the facts, and measure per engine. For a full breakdown of which domains win across verticals, our top-domains research maps the citation pool, and the CITE framework lays out how we run this as a standing program.

Turn the 86% you control into citations

Tell us your category and your top buyer questions. We map which source tier wins each answer, then fix the pages and listings that get you cited.

Talk to Cite Solutions
