AI Citation Concentration Is Worse Than PageRank

The Trade Press AI Index 2026 dropped on May 17. It synthesized six separate AI-citation studies, audited 680 million citations across ChatGPT, Claude, Perplexity, Gemini, and Google AI Overviews, and produced the cleanest concentration number we have so far.

Fifteen domains capture 68% of all AI citations. One domain, Wikipedia, accounts for 47.9% of ChatGPT's top-10 cited sources.

Google's PageRank-era top 15 never came close to that.

AI citation concentration vs traditional search: share of total citations captured by the most-cited domains

• Top 15 domains: 68% of all AI citations across ChatGPT, Claude, Perplexity, Gemini, and Google AI Overviews
• Wikipedia on ChatGPT: 47.9% of ChatGPT's top-10 cited sources, captured by a single domain

Single-domain citation share on ChatGPT:

• Wikipedia: 47.9%
• Forbes: 6.93%
• Reddit (top web forum): 5.2%
• PR Newswire: 4.72%

Source: Trade Press AI Index 2026 (5W + Everything-PR, 680M citations audited)

If you are running a B2B SaaS visibility program against an open-web mental model, the math has already moved against you. The source pool is smaller than you think, the gatekeepers are older than you think, and the path into the pool runs through earned media, not your own site.

How concentrated is the AI citation pool, really

Across the five major AI surfaces, the top 15 domains capture 68% of all cited sources. Wikipedia alone accounts for 47.9% of ChatGPT's top-10 references. Reddit is the most-cited single domain in aggregate, at roughly 40% citation frequency, although its share swings widely from engine to engine. Forbes pulls 6.93% of ChatGPT citations. PR Newswire pulls 4.72%. The long tail below the top 15 fights over the remaining 32%.

AI citation share is the new market share. And the citation share is concentrating into 15 domains faster than Google's link graph ever did.

That number comes from Ronn Torossian's Trade Press AI Index 2026, published jointly by 5W and Everything-PR on May 17, 2026. It aggregates six LLM-citation studies released between August 2024 and May 2026.

For comparison, Peec AI's 30-million-source analysis from April found the same pattern at the top of its list: Reddit, YouTube, Wikipedia, LinkedIn, GitHub. Different audit, same shape.
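The 68% / 32% split is a top-k concentration measure: count citations per domain, sum the k largest, divide by the total. A minimal sketch of that calculation follows. The function name `top_k_share` and the sample domains are illustrative assumptions, not anything taken from the Index itself.

```python
from collections import Counter

def top_k_share(citations, k=15):
    """Fraction of all citations captured by the k most-cited domains."""
    counts = Counter(citations)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(n for _, n in counts.most_common(k))
    return top / total

# Made-up sample: one list entry per citation observed in AI answers.
sample = (
    ["wikipedia.org"] * 5
    + ["reddit.com"] * 3
    + ["example-saas.com"] * 1
    + ["other-blog.net"] * 1
)

share = top_k_share(sample, k=2)
print(f"top-2 share: {share:.0%}, long tail: {1 - share:.0%}")
# prints "top-2 share: 80%, long tail: 20%"
```

Run the same function with k=15 over your own citation export to see how your category compares with the 68% headline figure.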

The Wikipedia number is the part most teams miss

47.9% on ChatGPT means Wikipedia takes roughly half of all the citations that go to ChatGPT's ten most-cited sources. Not "sometimes appears." Not "ranks well." Half. If your category has a thin or hostile Wikipedia entry, you are competing for the other half against Reddit, LinkedIn, Forbes, GitHub, and a few trade outlets.

Reddit is the second gatekeeper

Tinuiti's Q1 2026 AI Citation Trends Report put Reddit at over 5% of ChatGPT citations and roughly 24% of all Perplexity citations in January 2026. Perplexity drew 31% of its citations from social media that month. Gemini drew 0.1% from Reddit. So the same buyer, asking the same B2B question on two different engines, sees almost no source overlap.

The long tail is where most B2B brands actually live

If your domain is not in the top 15, you are inside a 32% slice fragmented across thousands of competitors. That is the position most B2B SaaS sites occupy right now, including ones with strong organic rankings on Google.

Why this concentration is more extreme than PageRank

PageRank concentrated authority, but it never produced a 47.9% single-domain dominance the way ChatGPT does with Wikipedia. Four reasons the AI pool concentrates harder than Google ever did.

Reason #1: LLMs were trained on a finite, biased web slice

Every major foundation model used a small set of high-quality corpora during pretraining. Wikipedia, Common Crawl top-tier subsets, Reddit submissions, Stack Overflow, and a handful of news archives appear in almost every training set. That bias survives post-training. ChatGPT will reach for Wikipedia in retrieval not because Wikipedia is the best source today, but because the model already knows the shape of a Wikipedia article and trusts it during synthesis.

Reason #2: Retrieval favors structured, citable passages

AI search engines do not rank pages. They extract passages and ground answers. Wikipedia, Reddit, and structured trade press write in the format passage extraction prefers: short paragraphs, clear claims, named entities. Most B2B SaaS marketing pages do the opposite. The result is a citation pool that rewards encyclopedic and forum content disproportionately.

Reason #3: Brand mentions outweigh on-site content

Brand search volume and off-site mentions correlate with AI citation frequency at 0.334 and 0.664 respectively. Backlinks and on-site content quality score lower. The AI ranking signal is largely a brand-trust signal, and the brand-trust signal is largely an earned-media signal. PageRank rewarded a clever site. AI citation rewards a brand that the open web is already talking about.
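Figures like 0.334 and 0.664 are correlation coefficients. The source does not say which coefficient was used, so the sketch below assumes Pearson's r and runs it on made-up numbers for five hypothetical brands. It shows how such a value is computed, not how the cited studies computed theirs.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: off-site mentions and AI citations for five brands.
offsite_mentions = [10, 20, 30, 40, 50]
ai_citations = [1, 3, 2, 5, 4]

r = pearson(offsite_mentions, ai_citations)
print(f"r = {r:.3f}")  # prints "r = 0.800"
```

A coefficient near 0.66 indicates a moderately strong positive relationship, and one near 0.33 a weak one. Neither establishes that mentions cause citations.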

Reason #4: Citation pools are shrinking, not expanding

A 14-week Resoneo study, reported by Search Engine Journal, tracked ChatGPT's citation pool shrinking by 21% when GPT-5.3 Instant became the default. The number of cited domains per response fell from 19 to 15. Newer models cite fewer sources. That makes the concentration problem worse over time, not better.
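As a quick check on the Resoneo figures, a drop from 19 to 15 cited domains per response works out to about 21%. This assumes the 21% pool shrinkage and the per-response count describe the same measurement, which the source does not state explicitly. The helper name `percent_change` is illustrative.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("percent change is undefined when the baseline is zero")
    return (after - before) / before * 100

change = percent_change(19, 15)
print(f"{change:.1f}%")  # prints "-21.1%"
```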

What concentration data means for B2B SaaS

If 68% of citations live in 15 domains, then a citation strategy that does not include those 15 domains is a content strategy, not a citation strategy.

Three operational implications fall out of that.

The owned-vs-earned ratio has to flip

Most B2B teams spend 80% of their content budget on owned media (blog, resource center, landing pages) and 20% on earned. The citation data says the reverse is closer to the right mix. 89% of AI citations come from earned media, not owned channels. Spending more on a blog you already publish twice a week will not move citation share.
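To see your own owned-versus-earned split, tag each cited URL by whether its host belongs to you. The sketch below is one way to do that under stated assumptions: the function name `earned_share`, the domain `example-saas.com`, and the sample URLs are all hypothetical, and "earned" is simplified to mean "any host you do not own."

```python
from urllib.parse import urlparse

def is_owned(host, owned_domains):
    """True if host equals an owned domain or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in owned_domains)

def earned_share(cited_urls, owned_domains):
    """Fraction of cited URLs hosted on domains the brand does not own."""
    owned = {d.lower() for d in owned_domains}
    hosts = [(urlparse(u).hostname or "").lower() for u in cited_urls]
    if not hosts:
        return 0.0
    earned = sum(1 for h in hosts if not is_owned(h, owned))
    return earned / len(hosts)

# Hypothetical citation export for a brand that owns example-saas.com.
cited = [
    "https://www.example-saas.com/blog/post",
    "https://blog.example-saas.com/guide",
    "https://en.wikipedia.org/wiki/Example",
    "https://www.reddit.com/r/saas/",
]

print(f"earned share: {earned_share(cited, ['example-saas.com']):.0%}")
# prints "earned share: 50%"
```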

Wikipedia and Reddit are not optional

Two of the top three citation domains are open, community-edited platforms. Most B2B SaaS marketing teams treat them as compliance risks and skip them entirely. The result is predictable: zero presence on the surfaces that account for over 50% of citations on ChatGPT and Perplexity.

The citation pool is platform-specific

Reddit is 24% of Perplexity citations and 0.1% of Gemini citations. LinkedIn is 15% of Google AI Mode citations. Wikipedia is 47.9% of ChatGPT's top-10. A single content asset will not get you cited everywhere. You need platform-specific source presence or you will show up on one engine and disappear on the next.

How to win citations in a concentrated source pool

Five steps to move from long-tail invisibility to top-15 adjacency. Each step is a separate project, not a campaign.

Step 1: Audit your share inside the top 15 domains

For each of the 15 domains (Wikipedia, Reddit, YouTube, LinkedIn, GitHub, Medium, Forbes, Quora, Stack Overflow, PR Newswire, HubSpot, Trustpilot, G2, Capterra, and a category-specific trade outlet), run a manual check on your brand presence. Count the assets that mention you, the assets where you are the primary subject, and the recency of those assets. This becomes your baseline.
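The baseline described above has three fields per domain: mentions, primary-subject assets, and recency. One possible way to record it is a small data structure like the one below. The class name `DomainPresence`, the helper `gaps`, and every number in the sample are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class DomainPresence:
    domain: str
    mentions: int                 # assets that mention the brand
    primary_subject: int          # assets where the brand is the main subject
    latest_asset: Optional[date]  # date of the most recent asset, if any

    def days_since_latest(self, today: date) -> Optional[int]:
        """Age of the newest asset in days, or None if there is no asset."""
        if self.latest_asset is None:
            return None
        return (today - self.latest_asset).days

def gaps(baseline):
    """Domains where the brand has no presence at all."""
    return [p.domain for p in baseline if p.mentions == 0]

# Made-up audit rows for three of the fifteen domains.
baseline = [
    DomainPresence("wikipedia.org", 1, 1, date(2026, 2, 16)),
    DomainPresence("reddit.com", 0, 0, None),
    DomainPresence("g2.com", 12, 12, date(2026, 5, 1)),
]

audit_day = date(2026, 5, 17)
for p in baseline:
    print(p.domain, p.mentions, p.primary_subject, p.days_since_latest(audit_day))
print("No presence:", gaps(baseline))
```

Re-running the same audit each quarter against the stored rows shows which domains moved and which stayed at zero.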

Step 2: Fix the Wikipedia entry first

If your company has a Wikipedia page, audit it for accuracy and currency. If it does not have one and the company qualifies under Wikipedia's notability rules, work with an experienced editor (not an internal PR person) to draft one. Roughly half of ChatGPT's top-10 citations go to Wikipedia. The cost-to-impact ratio of one corrected or created Wikipedia entry exceeds almost every other GEO action.

Step 3: Build a defensible Reddit presence

Identify three subreddits where your category lives. Spend three months contributing answers that do not mention your company. Build account history. Once the account has visible history and karma, you can answer category questions where mentioning your product is genuinely relevant. The Reddit-AEO playbook is slow and uncomfortable for most B2B teams, which is why most B2B teams have zero Reddit citation share.

Step 4: Place third-party comparisons and reviews on G2, Capterra, and Trustpilot

These draw far fewer raw citations than Wikipedia or Reddit, but they shape what the engines say about products. AI engines pull comparison and pricing claims from review sites and then quote them in summary answers. A clean, current G2 profile with recent reviews changes what ChatGPT says about your product without you publishing anything new.

Step 5: Pitch trade press, not consumer press

Forbes Council content gets cited. PR Newswire wires get cited. Industry trade outlets that publish bylined practitioner content get cited. Consumer business press (TechCrunch, Fast Company, Inc.) gets cited at a much lower rate per pitch placed. Send a quarterly bylined piece to the two largest trade outlets in your category. That single placement will likely outperform a year of self-published thought leadership on the same topic.

What traditional PR asks vs what AI citation asks

A useful contrast for any team still benchmarking earned media on impressions or share of voice.

Traditional PR asks:

• Did the piece run in a tier-1 outlet?
• What is the unique visitor count of the outlet?
• Did the headline include the brand name?
• How many social shares did the piece get?

AI citation asks:

• Did the source make it into one of the 15 domains that capture 68% of citations?
• Is the brand mention inside a clean, extractable passage?
• Is the claim about the brand structured the way Wikipedia or Reddit threads structure claims?
• Does the source align with how ChatGPT, Claude, Perplexity, Gemini, and AI Overviews each weight their citation pool differently?

Most PR teams answer all four traditional questions and zero of the AI ones. That is the gap the Trade Press AI Index 2026 is calling out when it says "the PR firm that runs only earned media in 2026 is selling a service worth less than half what it was worth in 2022."

How concentration is changing, not stabilizing

Two trends suggest the 68% figure will not hold steady.

First, citation pools are shrinking. Resoneo's 14-week study showed ChatGPT cutting cited domains per response from 19 to 15 after GPT-5.3 Instant became the default. GPT-5.5 is expected to compress further. Smaller pools concentrate harder.

Second, platform-specific divergence is widening. Conductor's 2026 AEO Benchmarks Report found average AI referral traffic at 1.08% of organic traffic, with citation overlap between Perplexity and Gemini below 20% on the same prompts. Engines are not converging on a shared source pool. They are diverging into separate ones. The top 15 across all engines hides the fact that the top 15 on any single engine looks different.

Both trends point the same direction. If you are not in a top-15 list today, the path back in gets harder as the pools shrink and diverge.
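To measure divergence on your own prompts, compare the sets of domains two engines cite for the same question. The source does not say how Conductor defines citation overlap, so the sketch below assumes Jaccard similarity (shared domains divided by all distinct domains). The function name and sample domain lists are made up for illustration.

```python
def citation_overlap(domains_a, domains_b):
    """Jaccard overlap between two engines' cited-domain sets (0.0 to 1.0)."""
    a, b = set(domains_a), set(domains_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

# Hypothetical citations returned for one prompt on two engines.
engine_one = ["reddit.com", "youtube.com", "g2.com", "wikipedia.org"]
engine_two = ["wikipedia.org", "linkedin.com", "forbes.com"]

overlap = citation_overlap(engine_one, engine_two)
print(f"overlap: {overlap:.0%}")  # prints "overlap: 17%"
```

Averaging this value over a fixed prompt set, re-run monthly, gives a rough trend line for whether the engines you care about are converging or drifting apart.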

FAQ

How is 68% concentration "worse than PageRank ever was"?

Google's PageRank-era top 15 domains never captured 68% of search citations. The link graph was wider and the SERP layout forced more domain diversity. AI engines synthesize answers from a smaller, more retrieval-friendly pool, and the resulting concentration runs roughly two to three times higher than Google's peak concentration on commercial queries.

Why is Wikipedia so dominant in ChatGPT's citation pool?

Wikipedia appears in nearly every foundation-model training set, writes in the structured-passage format AI extraction prefers, and is updated continuously by a global editor base. ChatGPT reaches for Wikipedia because the model knows its shape, trusts its tone, and finds clean extractable passages inside it. Until a competing structured-knowledge surface emerges, Wikipedia stays at the top.

If we can't get into the top 15, is GEO worth doing at all?

Yes, but with different expectations. The 32% long tail is where most B2B SaaS lives, and the work to win citations inside it is real: schema, brand mentions, comparison content, third-party reviews, trade press. The goal for most companies is not to displace Wikipedia. It is to be the most-cited B2B SaaS in the 32% slice for your specific category.

How is this different from share of voice in traditional SEO?

Share of voice in SEO measures your visibility across keyword rankings. Citation share measures your presence inside the answer text AI engines produce. They overlap but are not the same. You can hold strong share of voice on Google and get zero AI citations if your brand mentions live in places AI engines do not retrieve from.

Which engine should B2B SaaS prioritize given the concentration data?

Start with ChatGPT. It has the largest user base, the most extreme single-domain concentration (Wikipedia at 47.9%), and the cleanest measurement surface through Bing Webmaster Tools and direct citation studies. Once ChatGPT citation share is measured and moving, expand to Perplexity and Google AI Mode where the source mix differs.

Bottom line

Fifteen domains capture 68% of AI citations. Wikipedia alone captures 47.9% of ChatGPT's top-10. The concentration is sharper than the open web ever produced, and it is getting sharper as citation pools shrink and platforms diverge.

The right response is not more blog posts. It is a deliberate plan to enter the 15 domains that already own the citation share, and to be the most-cited B2B brand inside the 32% long tail that holds the rest.

Most teams are still optimizing for the wrong leaderboard. The Trade Press AI Index just made it harder to ignore which leaderboard actually pays.
