This week in the Musk v Altman trial in Oakland, Microsoft testimony surfaced one of the most quietly material disclosures of the year. Microsoft's own executives feared the company was becoming too dependent on OpenAI. CEO Satya Nadella drew an explicit historical parallel in a July 2022 internal email, writing that he did not want Microsoft to become the next IBM while OpenAI became the next Microsoft.
By June 2026, Microsoft will have spent over $100 billion on OpenAI through investments, infrastructure, and hosting. The trial disclosed that even as that money flowed, internal alarm bells were ringing about single-vendor concentration.
If Microsoft modelled the concentration risk, your AI visibility plan probably has the same risk and a fraction of the budget to fix it. Most B2B AI visibility programs we audit are 70 to 90 percent ChatGPT-weighted. That is a single-vendor bet on a category that is splitting open week by week.
What Microsoft's testimony actually said
Per CNBC's trial coverage, Nadella testified that "it was becoming even more core and important that we had real agency at every layer of the stack." GeekWire's account of the July 2022 internal email is sharper. Nadella told colleagues he did not want Microsoft to repeat the IBM-Microsoft dynamic of the 1980s. IBM owned the PC. Microsoft owned the operating system. IBM won the headline, Microsoft won the decade.
The disclosure matters because it changes the read on every Microsoft AI move of the past 18 months. Microsoft's $10 billion Anthropic investment in January 2026. Claude in Microsoft 365 reaching general availability on May 5. The Azure-OpenAI-but-also-Azure-Anthropic story. Each of those reads as Microsoft buying optionality at the model layer. The trial testimony made the motivation explicit.
Microsoft modelled the concentration risk on a vendor it co-built. Your AI visibility plan is concentrated on a vendor you do not control at all.
The diagnostic: why most B2B AI visibility is concentrated on ChatGPT
Five reasons explain why a typical B2B GEO program ends up 70 to 90 percent ChatGPT-weighted. Each one is fixable, but only if you see it explicitly.
Reason #1: ChatGPT got the press coverage, so it got the budget
ChatGPT's $200 billion valuation and its weekly user counts mean every CMO in the world has heard of it. Claude, Perplexity, and Gemini have not had the same narrative density at the consumer layer. So GEO platforms led with ChatGPT prompt tracking. The dashboards your team checks each week were ChatGPT-shaped from day one. That choice compounds into a measurement gap, which compounds into a content-strategy gap.
Reason #2: ChatGPT generates 91 percent unique queries each run
Per our earlier analysis of why ChatGPT and Perplexity need different content strategies, ChatGPT generates roughly 91 percent unique queries each time the same buyer prompt runs. Perplexity sits closer to 14 percent. The implication is that ChatGPT prompt-tracking volumes look enormous, which makes them feel complete, which makes teams over-invest in the channel where their data looks richest. Volume is not coverage. It is volume.
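If you want to verify that on your own prompt set, the math is simple. A minimal sketch, assuming you have logged the retrieval queries each engine fires across repeated runs of the same buyer prompt (the log format and sample queries here are hypothetical):

```python
from collections import defaultdict

def uniqueness_rate(runs: list[list[str]]) -> float:
    """Share of retrieval queries that appear in exactly one run.

    `runs` is a list of runs, each a list of the search queries the
    engine fired for the same buyer prompt. A high rate means prompt
    tracking volume overstates coverage: each run explores new ground.
    """
    counts = defaultdict(int)
    for run in runs:
        for query in set(run):  # count each query once per run
            counts[query] += 1
    if not counts:
        return 0.0
    unique = sum(1 for c in counts.values() if c == 1)
    return unique / len(counts)

# Hypothetical logs: three runs of the same buyer prompt on one engine.
chatgpt_runs = [
    ["best crm for saas", "crm pricing 2026", "hubspot vs salesforce"],
    ["saas crm comparison", "crm for b2b startups", "pipedrive review"],
    ["top crm tools", "crm integrations", "crm onboarding time"],
]
print(f"ChatGPT query uniqueness: {uniqueness_rate(chatgpt_runs):.0%}")
```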
Reason #3: ChatGPT cites about 5 sources per answer. Claude cites 13
We covered this in detail in the ChatGPT, Claude, Gemini citation density gap. ChatGPT runs a tight citation pool. Claude runs a wide one. Gemini sits in between. If your only measurement is ChatGPT citation share, you are looking at a 5-source race. You are not seeing the 13-source race that Claude is running in parallel, where mid-authority domains and topic specialists are pulled in.
Generative Pulse (Muck Rack, May 7, 2026): 25 million links analyzed across ChatGPT, Claude, and Gemini, spanning 17 industries. This is the third edition of the study, and earned-media share has held between 82% and 89% across all three editions since July 2025.

- Citation rate (% of responses that include any citation): Claude cites in only 55% of responses; ChatGPT cites in 96%. The two systems disagree on when citations are warranted.
- Average sources per cited response: when Claude does cite, it pulls 2.6x as many sources as ChatGPT. The retrieval philosophies are not the same problem.
- Top-cited domain by platform: Wikipedia for ChatGPT and PubMed Central for Claude, with Gemini's leader different again. Three platforms, three different ideas of what counts as the most trustworthy source on the web.
- Earned media vs paid content: paid and advertorial content captures 0.3% of all AI citations, a 280x gap to earned media that has held across all three editions since July 2025.
- Recency: press releases appear 3.5x more often in industry-trend queries than in best-of queries, and half of all journalism citations come from articles published in the last 12 months. Recency and trend-context drive the citation pool.

That is where the citation density actually lives. It is not where most B2B brands are spending their measurement budget.
Reason #4: AI engines cite the same brands but different sources
A separate finding is documented across multiple 2026 studies: AI engines cite the same brands but very different sources. Pairwise top-100 brand overlap across engines runs 36 to 55 percent. Pairwise top-100 source overlap runs 16 to 59 percent. Authority sources and user-generated content split even more aggressively.
That means a brand can be cited inside ChatGPT primarily through Wikipedia and a Reuters piece, and cited inside Claude primarily through PubMed Central and a Stack Overflow thread. If your GEO program only audits ChatGPT, you do not know which earned-media surfaces are actually carrying you on the other engines.
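The overlap math itself is trivial once you can export each engine's top-cited list. A sketch, using one common definition (intersection size over list length); the domain lists below are hypothetical:

```python
def pairwise_overlap(top_a: list[str], top_b: list[str]) -> float:
    """Overlap between two top-N lists, as a share of N.

    One common definition: two top-100 lists sharing 42 entries
    score 0.42. Run this on brands and on source domains separately.
    """
    n = min(len(top_a), len(top_b))
    return len(set(top_a) & set(top_b)) / n if n else 0.0

# Hypothetical exports: top-cited source domains per engine.
chatgpt_sources = ["wikipedia.org", "reuters.com", "forbes.com", "g2.com"]
claude_sources = ["ncbi.nlm.nih.gov", "stackoverflow.com", "g2.com", "reuters.com"]
print(f"Source overlap: {pairwise_overlap(chatgpt_sources, claude_sources):.0%}")
```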
Reason #5: The model layer is consolidating to multiple winners, not one
Anthropic now sits at a roughly $900 billion pre-IPO valuation per industry reporting. Perplexity hit a roughly $20 billion valuation. Google reorganized search around Gemini. xAI's Grok is embedded in X. The two-horse race assumption baked into a lot of GEO programs through 2024 has not survived 2026. Per our AI search market share 2026 analysis, the field is now multi-polar and the share is moving quarter by quarter.
A GEO plan that tracks one model is a GEO plan that fails the day the market shifts. Microsoft saw the same risk inside its own deal and bought hedges. You can do the same.
If your GEO measurement is 70 to 90 percent ChatGPT, you are running concentration risk that Microsoft modelled before you did.
We run multi-LLM AI visibility programs for B2B SaaS portfolios. ChatGPT, Claude, Perplexity, Gemini, and Copilot tracked together, with per-engine content workstreams. Six to eight week ramp.
Book a Discovery Call

The prescription: five steps to de-risk in 90 days
The fix is operational, not strategic. The strategy is already obvious. The execution sequence is what most teams get wrong.
Step 1: Audit citation share across all five major AI surfaces
Pull citation share for your brand and your top three competitors across ChatGPT, Claude, Perplexity, Gemini, and Copilot. If your platform only covers three of the five, run manual prompts on the other two for your top 20 buyer queries. Most B2B portfolios we audit show a 25 to 40 percent share on ChatGPT and a 4 to 12 percent share on at least one of the other four. That gap is the starting line. We described the broader audit pattern in how to run an AI visibility audit.
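A minimal sketch of the share math, using one simple definition (your citations over all tracked-brand citations on that engine); the tallies are hypothetical placeholders for your manual-prompt results:

```python
ENGINES = ["chatgpt", "claude", "perplexity", "gemini", "copilot"]

# Hypothetical manual-audit tallies: citations per brand across 20
# buyer queries, run once per engine.
tallies = {
    "chatgpt":    {"yourbrand": 7, "competitor_a": 9, "competitor_b": 4},
    "claude":     {"yourbrand": 2, "competitor_a": 6, "competitor_b": 5},
    "perplexity": {"yourbrand": 1, "competitor_a": 4, "competitor_b": 6},
    "gemini":     {"yourbrand": 3, "competitor_a": 5, "competitor_b": 2},
    "copilot":    {"yourbrand": 2, "competitor_a": 7, "competitor_b": 3},
}

def citation_share(tally: dict[str, int], brand: str) -> float:
    """Brand's share of all tracked-brand citations on one engine."""
    total = sum(tally.values())
    return tally.get(brand, 0) / total if total else 0.0

for engine in ENGINES:
    share = citation_share(tallies[engine], "yourbrand")
    print(f"{engine:<11} {share:.0%}")
```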
Step 2: Identify which source domains carry your brand on each engine
For each engine, pull the top 10 source domains cited alongside your brand when buyers ask category questions. The list will not match across engines. ChatGPT will lean Wikipedia, Reuters, and major trade publications. Claude will pull mid-authority specialists, PubMed Central in regulated categories, and Stack Overflow in technical ones. Perplexity will lean Reddit and YouTube. Gemini will lean Reddit and recent news. Build a 5-column matrix. The empty cells are your content-and-earned-media gaps.
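The matrix does not need tooling. A sketch as a plain data structure, with hypothetical domains; the empty sets are the gaps this step exists to find:

```python
# Top source domains observed citing your brand, per engine.
# Empty sets are the content-and-earned-media gaps to fill.
carrier_matrix: dict[str, set[str]] = {
    "chatgpt":    {"wikipedia.org", "reuters.com"},
    "claude":     {"ncbi.nlm.nih.gov"},
    "perplexity": set(),          # gap: no Reddit/YouTube presence yet
    "gemini":     {"reddit.com"},
    "copilot":    set(),          # gap: no Bing-surfaced coverage yet
}

gaps = [engine for engine, domains in carrier_matrix.items() if not domains]
print("Engines with no carrying domains:", ", ".join(gaps))
```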
Step 3: Add Claude in Microsoft 365 to your tracked surface list
Claude in Microsoft 365 became generally available on May 5. Inside Word, Outlook, Excel, and PowerPoint, your prospect can now ask category questions, draft RFPs, and run competitive analyses with a Claude that is part of their procurement-approved stack. We covered the buyer-side mechanics in Claude for Excel is live: will it cite you?. Almost no GEO platform tracked this surface in the first ten days of GA. Internal-app surfaces are now part of the citation map; treat this one as a tracked surface from day one, as in the sketch below.
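One way to encode that: a tracked-surface registry that records the host app alongside the model, so internal-app surfaces are first-class entries rather than afterthoughts. The structure below is our sketch, not any platform's schema:

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass
class TrackedSurface:
    name: str
    model: str
    host_app: str | None = None  # None means the public web surface

SURFACES = [
    TrackedSurface("chatgpt_web", model="gpt"),
    TrackedSurface("claude_web", model="claude"),
    TrackedSurface("claude_m365", model="claude", host_app="microsoft_365"),
    TrackedSurface("copilot_m365", model="copilot", host_app="microsoft_365"),
    TrackedSurface("perplexity_web", model="perplexity"),
    TrackedSurface("gemini_web", model="gemini"),
]

internal = [s.name for s in SURFACES if s.host_app]
print("Internal-app surfaces to track:", ", ".join(internal))
```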
Step 4: Build per-engine content workstreams, not one universal pipeline
The shared mistake is to publish one long-form piece and assume it covers every engine. It does not. ChatGPT extracts crisp Wikipedia-style passages with a single direct answer. Claude pulls long mid-authority pages with strong internal structure. Perplexity rewards Reddit and YouTube depth. Gemini favors recent news and Reddit threads.
Practical sequence. Pick your three highest-value buyer queries. For each, plan three formats. One Wikipedia-shaped reference page. One mid-authority deep dive. One earned-media surface (a Reddit AMA, a YouTube product walkthrough, a guest piece on a trade publication). Ship within 60 days. Measure citation lift on each engine 14 days post-publish, then again at 45 days.
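A sketch of the 14-day and 45-day lift readings; the ship date, baselines, and day-14 shares are all hypothetical:

```python
from datetime import date, timedelta

def lift(baseline: float, current: float) -> float:
    """Relative citation-share lift versus the pre-publish baseline."""
    return (current - baseline) / baseline if baseline else float("inf")

publish = date(2026, 6, 1)  # hypothetical ship date
for days in (14, 45):
    print("checkpoint:", publish + timedelta(days=days))

# Hypothetical per-engine shares: pre-publish baseline vs the day-14 reading.
baseline = {"chatgpt": 0.25, "claude": 0.06, "perplexity": 0.03}
day_14 = {"chatgpt": 0.27, "claude": 0.09, "perplexity": 0.05}

for engine in baseline:
    print(f"{engine:<11} day-14 lift: {lift(baseline[engine], day_14[engine]):+.0%}")
```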
Step 5: Set monthly citation-share targets per engine, not a single share number
Most B2B brands report one citation-share number. That number hides the concentration. Replace it with five numbers, one per engine, tracked monthly. Set a per-engine target. A reasonable starting target for a mid-funded B2B SaaS brand is 25 percent ChatGPT, 18 percent Claude, 15 percent Perplexity, 12 percent Gemini, 10 percent Copilot. The numbers will move. The discipline of reporting five is what changes behavior.
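A sketch of the five-number monthly report, using the starting targets above and hypothetical month-end actuals:

```python
TARGETS = {"chatgpt": 0.25, "claude": 0.18, "perplexity": 0.15,
           "gemini": 0.12, "copilot": 0.10}

# Hypothetical month-end actuals from the Step 1 audit, re-run monthly.
actuals = {"chatgpt": 0.31, "claude": 0.07, "perplexity": 0.04,
           "gemini": 0.09, "copilot": 0.02}

for engine, target in TARGETS.items():
    gap = actuals[engine] - target
    flag = "ok" if gap >= 0 else "BEHIND"
    print(f"{engine:<11} actual {actuals[engine]:.0%}  target {target:.0%}  {flag}")
```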
How much divergence are we actually talking about?
The cross-engine numbers are not hypothetical. Three 2026 datasets anchor them.
| Metric | ChatGPT | Claude | Gemini | Source |
|---|---|---|---|---|
| Sources cited per answer | ~5 | ~13 | ~8 | Otterly URL Citation Study, May 2026 |
| Top-100 source overlap with peers | 16-59% pairwise | 16-59% pairwise | 16-59% pairwise | Cross-engine 2026 studies |
| Unique queries per run | 91% | ~70% | ~60% | Profound 1.5B prompt dataset |
| Top cited domain | Wikipedia | PubMed Central | | Multi-engine 2026 audits |
Read the table once. Then read it again as a buyer journey. Your prospect runs a category query inside ChatGPT, sees Wikipedia plus a trade publication, and forms an initial shortlist. The same prospect runs the same query inside Claude two days later through Microsoft 365 at work, sees a different specialist source pool, and refines the shortlist. The brands that survive both screens are the brands that appear inside both pools. The brands that only optimized for ChatGPT do not.
If you only audit one engine, you cannot tell whether a citation gap is a content gap, an earned-media gap, or a measurement gap.
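Once the Step 1 and Step 2 data exists, that triage can be made mechanical. The decision rules below are our simplification, not a standard:

```python
def classify_gap(tracked: bool, has_engine_shaped_content: bool,
                 carried_by_earned_media: bool) -> str:
    """Rough triage for a low citation share on one engine."""
    if not tracked:
        return "measurement gap: you cannot see the engine at all"
    if not has_engine_shaped_content:
        return "content gap: no page shaped for this engine's retrieval"
    if not carried_by_earned_media:
        return "earned-media gap: no third-party domains carry you"
    return "competitive gap: you are present but outranked"

print(classify_gap(tracked=True, has_engine_shaped_content=True,
                   carried_by_earned_media=False))
```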
What this changes for the 2026 GEO budget
Three changes to plan for in the next budget cycle.
Multi-engine measurement is now a baseline, not a premium feature
If your current platform covers ChatGPT only, your renewal conversation should start with multi-engine coverage as a non-negotiable. Most of the tier-1 GEO platforms now cover at least four engines. Some cover seven or eight. The differences are no longer in number of engines tracked. They are in retrieval-pool depth, real-user prompt corpora, and integration with content operations.
Earned-media spend should be engine-routed
If Claude pulls disproportionately from mid-authority specialists, your earned-media outreach should weight specialist publications inside your category. If Perplexity pulls Reddit and YouTube, your community and video strategy is no longer optional. The old PR playbook of "land Forbes and Reuters" still helps ChatGPT. It does almost nothing for Perplexity. Each engine has a different earned-media center of gravity.
Internal-app surfaces are part of the budget
Claude in Microsoft 365, ChatGPT Workspace Agents, Copilot in Outlook and Word, Gemini in Gmail and Drive. Each of these is a citation surface your buyer touches more often at work than they touch the public web. None of them sit inside a typical 2025 GEO contract. Most should sit inside a 2026 contract.
FAQ
Does this mean I should stop optimizing for ChatGPT?
No. ChatGPT is still the largest single discovery surface for most B2B categories. The argument is for additive coverage, not replacement. Keep the ChatGPT workstream. Add Claude, Perplexity, Gemini, and Copilot workstreams alongside it. The right ratio depends on your buyer mix. Run the Step 1 audit before you change the mix.
How fast can I tell whether de-risking is working?
Plan for a 14-day signal and a 45-day confirmation. Citation share moves on a roughly two-week cadence on most engines, with Gemini moving slightly faster and Claude slightly slower because of its broader retrieval pool. Set checkpoints at 14, 45, and 90 days. The 90-day reading is the one that matters for budget defense.
Is single-vendor concentration on Claude any safer than on ChatGPT?
No. Any single-engine concentration carries the same structural risk. Claude's retrieval pool will shift with each model update. Gemini's already did when Gemini 3 rolled out. ChatGPT's citation pool shrank by 21 percent in one update window earlier this year. Single-engine concentration is the risk, not the choice of engine.
What about Copilot? My buyers all live in Microsoft 365
Copilot citations route through Bing's retrieval layer, with Claude now also available inside the same Microsoft 365 surface. If your buyers live in Microsoft 365, you actually have two citation pools to track inside that one surface. Treat them as separate engines. The retrieval pools are not identical even when the host app is.
How does this interact with the Wave 3 GEO tooling shift?
The Wave 3 autonomous-publishing tools we covered in where GEO tools are headed make multi-engine coverage more important, not less. If a background agent is going to draft counter-responses on schedule, you need clean per-engine signals to trigger the right agent at the right time. Concentration risk and tooling generation are the same conversation viewed from two angles.
The bottom line
Microsoft modelled single-vendor concentration risk before most B2B marketing teams modelled their first GEO program. The trial testimony made the calculus public this week. The fix is not complicated. Audit five engines. Map source domains per engine. Track per-engine share. Route earned-media spend by engine. Add internal-app surfaces to the contract.
Most teams will read this, agree, and then default back to a ChatGPT-only weekly review by next Tuesday. The teams that actually rebuild the measurement and the content workstreams in the next 90 days will compound through the rest of 2026 while their competitors keep optimizing for one engine.
Five-engine AI visibility coverage with per-engine content workstreams. Built and operated for B2B SaaS portfolios.
We run AI visibility programs across ChatGPT, Claude, Perplexity, Gemini, and Copilot, with internal-app surfaces tracked alongside the public web. Six to eight week ramp, weekly per-engine reporting, agency-of-record retainer.
Book a Discovery Call

Continue the brief
Claude for Excel Is Live. Will It Cite You?
Anthropic shipped Claude inside Excel, Word, and PowerPoint. Customer-internal documents are now a citation surface most B2B SaaS teams ignore.
Why Claude Cites Older Content Than ChatGPT
Only 36% of Claude's journalism citations come from the past 12 months, versus 56% for ChatGPT. That recency gap is the cleanest evergreen wedge B2B has.
Anthropic's $1.5B Services JV Is a Claude GEO Event
Anthropic, Blackstone, Goldman Sachs and Hellman & Friedman just spun up a $1.5B services firm aimed at PE-owned mid-market. Here is the GEO read.
