A pattern is showing up across the latest AI search research that should concern any brand running a GEO strategy: AI models are getting more thorough and more selective at the same time.
Position Digital's updated AI SEO statistics document what changed with GPT-5.4. The model now generates 10 or more sub-queries per user prompt, an increase from the roughly 8 fan-outs documented in earlier GPT-5 models. It is doing more research before forming an answer.
But here is the part that cuts against the intuitive read of "more research means more citations": GPT-5.4 cites 20% fewer unique domains per response compared to its predecessors. The model is casting a wider net during retrieval and then tightening considerably at the citation stage. Brands that had been appearing in responses on the strength of a single on-topic page are finding themselves pushed out by a model that increasingly pulls from fewer, higher-confidence sources per answer.
That is a structural shift, not an algorithm update to optimize around. Understanding why it happens is the starting point for responding to it.
GPT-5.4 citation behavior, April 2026: more research, fewer sources, and a narrowing citation pool. Source: Position Digital, 100+ AI SEO Statistics (updated April 2026).

| Metric | Figure | What it measures |
|---|---|---|
| Sub-queries per prompt | 10+ | Fan-outs per user query in GPT-5.4 |
| Unique domains cited | –20% | Fewer domains per response vs. prior models |
| Pages abandoned | 63% | ChatGPT agents leave pages with no extraction |
| LLM crawl vs. Googlebot | 3.6× | AI crawlers now out-visit Google's own spider |

Citation likelihood by content structure:

| Content structure | Citation outcome |
|---|---|
| What it is + who uses it + how to choose + pricing | 10+ citations per URL |
| 5–7 statistics in opening section | +20% citation likelihood |
| Single topic, moderate structure | Occasional citation |
| Generic, late-placed content or poor crawlability | 0 citations |
89% of Fortune 500 companies have not configured robots.txt with AI-specific directives. Most are getting crawled 3.6× more often by AI bots while optimizing only for Googlebot.
What changed in GPT-5.4's citation behavior
The widening fan-out and narrowing citation pool are two sides of the same underlying dynamic.
When GPT-5.4 fans out into 10+ sub-queries, it is being more thorough about checking whether multiple sources agree on the facts it wants to include. That cross-referencing behavior means the model needs fewer unique sources per response: if three sub-queries return the same five domains as the most credible answers, those five domains get cited and others do not.
The practical implication is a winner-takes-more dynamic in AI citations. Brands that are already well-established in the citation pool benefit because their content appears across more sub-queries, reinforcing the model's confidence. Brands in the citation tier just below that threshold face a narrower path in. The 20% domain reduction means roughly one in five previously cited domains fell off the list in the move from earlier GPT-5 models to GPT-5.4.
Position Digital also documented a related behavior: GPT-5.4 increasingly uses site: operators to pull content directly from brand domains rather than relying on third-party mentions. When a model is confident enough in a brand's authority, it goes directly to the source. When it is not, it goes elsewhere. This makes the brand authority signals documented in The Digital Bloom's citation research more important, not less, under the new model.
AI crawlers have overtaken Googlebot in visit frequency
The same Position Digital research documents a finding that has significant technical implications: AI crawlers now visit sites 3.6 times more frequently than Googlebot. AI bots have overtaken Google's own spider in raw crawl volume.
This matters because most technical optimization still focuses on Googlebot. Site speed benchmarks are calibrated for Googlebot. Robots.txt configurations were written for Googlebot. CDN caching strategies account for Googlebot behavior. The infrastructure assumption baked into most sites is that Google is the primary crawler to serve well.
That assumption is now wrong by a significant margin.
AI crawlers include GPTBot (OpenAI), ClaudeBot (Anthropic), OAI-SearchBot (OpenAI's search-specific crawler), PerplexityBot, and the Googlebot crawling that feeds AI Overviews, among others. These bots behave differently from Googlebot in ways that matter for citation outcomes: they are more sensitive to rendering issues, more likely to abandon pages that require JavaScript execution before serving content, and more likely to retry pages that were previously inaccessible.
ProGEO.ai's research from March 2026 found that 92.8% of Fortune 500 companies have a robots.txt file, but only 11% have added AI-specific directives. The configuration gap is not just a mid-market problem. The largest companies in the world have not updated their crawler management for AI bots that are now visiting their sites 3.6 times more often than Google.
For most B2B SaaS companies, checking robots.txt for GPTBot, ClaudeBot, OAI-SearchBot, and PerplexityBot rules takes about ten minutes. The absence of those names from a robots.txt file does not automatically mean the bots are allowed: overly broad Disallow: / rules and wildcard bot blocks are the most common source of unintentional AI crawler exclusions, and the most frequent explanation for why well-written content never appears in AI responses.
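If you want to script that check rather than eyeball the file, a minimal sketch using Python's built-in robotparser is below. The user agent tokens match the crawlers named above; the domain and test URL are placeholders to swap for your own site.

```python
from urllib.robotparser import RobotFileParser

# Placeholder domain and a representative content URL; swap in your own.
SITE = "https://www.example.com"
TEST_URL = f"{SITE}/blog/sample-post/"

# AI crawler user agent tokens to check against robots.txt groups.
AI_BOTS = ["GPTBot", "ClaudeBot", "OAI-SearchBot", "PerplexityBot"]

parser = RobotFileParser()
parser.set_url(f"{SITE}/robots.txt")
parser.read()  # fetches and parses the live robots.txt

for bot in AI_BOTS:
    verdict = "allowed" if parser.can_fetch(bot, TEST_URL) else "BLOCKED"
    print(f"{bot:>15}: {verdict} for {TEST_URL}")

# Bots with no named group inherit the wildcard (*) rules, so a blanket
# Disallow: / under User-agent: * silently excludes every AI crawler.
wildcard = "allowed" if parser.can_fetch("*", TEST_URL) else "BLOCKED"
print(f"{'* (wildcard)':>15}: {wildcard} for {TEST_URL}")
```

If a named bot comes back blocked while the wildcard group is allowed, there is an explicit rule targeting it; if everything is blocked, a blanket Disallow is usually the culprit.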
Why 63% of pages get abandoned before any content is extracted
Even when AI crawlers reach a page, extraction is not guaranteed. Position Digital's data shows that ChatGPT agents leave pages without extracting any content 63% of the time, meaning only about 37% of agent visits result in meaningful extraction.
Three causes account for most abandonments.
JavaScript-rendered content. Pages that require JavaScript execution before displaying main content serve empty HTML to AI crawlers that do not run JavaScript. The crawler visits, sees an empty shell, and moves on. The content exists; the crawler just cannot see it. This is the most common rendering issue affecting AI crawlability, and it is particularly common on single-page applications and React-based sites that rely on client-side rendering.
Slow server response time for non-Googlebot user agents. Many CDN configurations are optimized for Googlebot and human browsers, with different cache policies or origin routing for unknown user agents. AI bots requesting a page may receive significantly slower responses than Googlebot would. Pages that load in under a second for Googlebot may take 3-5 seconds for a less-recognized AI user agent, putting them at the top of the abandonment distribution.
Content blocked by access controls or paywalls. Any content behind authentication, subscription gates, or IP-based access restrictions is invisible to AI crawlers by definition. This is an obvious case, but the less obvious version involves geographic restrictions, bot detection systems, or CDN security rules that aggressively block non-whitelisted user agents. If your security tooling treats AI crawlers as suspicious traffic, they will be blocked before they reach your content.
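A rough way to test the first two failure modes against your own pages is to request them with each AI crawler's user agent string and compare status codes, response times, and whether the main content shows up in the raw, unrendered HTML. The sketch below uses the requests library; the URL, the simplified user agent strings, and the CONTENT_MARKER phrase are placeholder assumptions, and the test will not catch IP-based bot detection, which only shows up in server logs or the security tooling itself.

```python
import time
import requests

# Placeholder page plus a phrase that should appear in the static HTML
# if the main content is server-rendered; adapt both to your site.
URL = "https://www.example.com/blog/sample-post/"
CONTENT_MARKER = "citation likelihood"

# Simplified user agent strings; the real crawlers send longer header values.
USER_AGENTS = {
    "Googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1)",
    "GPTBot": "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "ClaudeBot": "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
    "OAI-SearchBot": "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)",
    "PerplexityBot": "Mozilla/5.0 (compatible; PerplexityBot/1.0)",
}

for name, ua in USER_AGENTS.items():
    start = time.monotonic()
    resp = requests.get(URL, headers={"User-Agent": ua}, timeout=10)
    elapsed = time.monotonic() - start
    # No JavaScript runs here, so this mirrors what a non-rendering crawler sees.
    found = CONTENT_MARKER.lower() in resp.text.lower()
    print(f"{name:>14}: HTTP {resp.status_code}, {elapsed:.2f}s, "
          f"content {'found' if found else 'MISSING'} in static HTML")
```

Large gaps in response time between Googlebot and the AI user agents point to the second cause; a content marker missing for every user agent points to the first.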
Not sure whether AI crawlers can actually read your site?
We run a full AI crawlability audit across GPTBot, ClaudeBot, OAI-SearchBot, and PerplexityBot, identify where content is inaccessible, and implement the fixes that open your site to the AI search traffic it should be receiving.
Get Your AI Visibility Audit
What earns a citation in the GPT-5.4 era
The narrowing citation pool means the content characteristics that predict citations matter more than before, not less. Position Digital documented two specific patterns that separate high-citation content from the rest.
Lead with statistics
Early-discovery content with 5 to 7 statistics in the opening section earns a 20% higher citation likelihood. That is not an artifact of sample size; it reflects how GPT-5.4's sub-query generation works. When a fan-out produces a sub-query about a specific claim, pages that include verifiable, attributed statistics early in the document answer that sub-query immediately. Pages that bury data in the middle of a long introduction do not.
The practical implication is that every major section of a page should open with a specific, sourced claim before moving into explanation. The "passages beat pages" principle has been established across multiple studies: Conductor's 2026 AEO/GEO Benchmarks Report found that 44.2% of AI citations come from content in the first 30% of a page. Combining that front-loading principle with specific statistics in those leading positions addresses both the extraction layer and the citation probability layer at once.
Cover multiple angles on one page
The top 4.8% of URLs by citation count share a specific structural characteristic: they combine the "what is it," "who uses it," "how to choose," and "pricing" angles on a single page. Each of these angles corresponds to a distinct type of sub-query a user might generate when asking GPT-5.4 about a category or product. A page that covers all four angles is a strong match for more sub-queries than a page that covers only one.
This is a direct counter-argument to the common advice to create narrow, highly focused individual pages for each topic. That approach was well-suited to classic SEO, where a tightly focused page had higher topical relevance for a specific keyword. In the AI search optimization context, a multi-angle page that addresses the full question space around a topic has a higher probability of appearing across multiple fan-out sub-queries, which increases its overall citation rate.
The brands already achieving this naturally are the ones that wrote full-coverage product and category pages before GEO existed as a discipline. Their content covers more ground. The brands that fragmented everything into thin, focused pages to chase keyword rankings may need to consolidate.
| Content structure | Citation impact | Why |
|---|---|---|
| 5-7 statistics in opening section | +20% citation likelihood | Matches data-seeking sub-queries immediately |
| Multi-angle page (what/who/how/pricing) | Top 4.8% citation tier | Covers more fan-out sub-queries per page |
| Content in first 30% of page | 44.2% of all citations | Extraction prefers early, accessible positions |
| Comparison content with 3 tables | +25.7% citations | Structured data is easier to extract and cite |
| Answers 40-60 words in length | Strong extraction signal | Matches AI response passage length |
What the Fortune 500 configuration failure means for your site
The ProGEO.ai research finding deserves more attention than it typically receives. If 89% of Fortune 500 companies have not configured their robots.txt for AI crawlers, the implication for mid-market B2B SaaS companies is that the configuration failure rate is almost certainly higher than 89%. The largest companies in the world, with dedicated technical SEO teams, have not solved this. Companies without that staffing level are unlikely to have either.
The specific gap ProGEO.ai documented is the absence of AI-specific directives in robots.txt. This does not mean these companies have blocked AI crawlers. It means they have not explicitly permitted them either, and the robots.txt configurations that exist were written before GPTBot, ClaudeBot, and PerplexityBot were regular visitors.
The gap between having a robots.txt file (92.8%) and having AI directives in it (11%) is not a small oversight. It reflects that robots.txt management has not been updated for a crawl environment where AI bots have become the dominant source of crawl traffic. Bing Webmaster Tools' AI Performance data is the fastest way to check whether Bing's AI crawling is reaching your pages. For other AI bots, the audit involves checking server logs for known AI user agent strings alongside robots.txt analysis.
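For the server-log half of that audit, a short script can tally requests and response codes by AI user agent. This sketch assumes a combined-format access log at a placeholder path; adjust the path and the regular expression to match your server's log format.

```python
import re
from collections import Counter, defaultdict

LOG_PATH = "/var/log/nginx/access.log"  # placeholder path; point at your own log

# Substrings that identify known AI crawlers in the user agent field.
AI_BOTS = ["GPTBot", "ClaudeBot", "OAI-SearchBot", "PerplexityBot", "Googlebot"]

# Combined log format ends with: ... HTTP/1.1" 200 1234 "referer" "user agent"
LINE_RE = re.compile(r'HTTP/[^"]*"\s+(?P<status>\d{3})\s.*"(?P<ua>[^"]*)"\s*$')

visits = Counter()
statuses = defaultdict(Counter)

with open(LOG_PATH) as log:
    for line in log:
        match = LINE_RE.search(line)
        if not match:
            continue
        ua = match.group("ua").lower()
        for bot in AI_BOTS:
            if bot.lower() in ua:
                visits[bot] += 1
                statuses[bot][match.group("status")] += 1
                break

for bot, count in visits.most_common():
    codes = ", ".join(f"{code}: {n}" for code, n in statuses[bot].most_common())
    print(f"{bot:>14}: {count} requests ({codes})")
```

A bot that never appears in the log is either blocked upstream (robots.txt, CDN, or security rules) or not yet crawling the site at all; either way, its absence is worth investigating alongside any 403 or 429 responses.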
What comes next: GPT-5.5 and the citation tightening trend
GPT-5.4 narrowed its pool of cited domains by 20% versus prior models. The question is whether that trend continues with GPT-5.5, codenamed "Spud" internally at OpenAI, which is likely to arrive in late April or May 2026 based on OpenAI's typical pretraining-to-release schedule.
The directional answer from the available data is yes, the trend toward citation concentration is likely to continue. The pattern across GPT model generations has been increasing sophistication in source evaluation alongside decreasing willingness to distribute citations broadly. A more capable model is a more selective one, because it has better tools for identifying which sources are genuinely authoritative versus which ones happen to contain relevant text.
For brands not yet in the established citation tier for their category, the window for establishing that position is narrowing with each model release. Citation drift data from Scrunch and Stacker already shows 40-60% monthly churn in cited domains across major platforms. A model that cites 20% fewer domains on top of that churn means fewer opportunities to earn a sustained citation position through incremental content improvements alone.
The brands that will hold citation positions through GPT-5.5 and subsequent models are the ones that have addressed the technical access layer, built multi-angle pages with strong data density, and established the off-site brand presence that predicts which brands AI models default to when forming answers. Content quality is necessary but insufficient without the other two.
FAQ
Why does GPT-5.4 cite fewer domains if it runs more sub-queries?
The two behaviors work together rather than against each other. GPT-5.4 uses more sub-queries during retrieval, which means it consults more candidate pages before forming an answer. But the cross-referencing behavior that produces more sub-queries also identifies which sources appear consistently across queries. The model cites the sources that show up reliably across multiple sub-queries rather than sources that appear for only one. The result is more thorough research and a more concentrated set of cited sources. Position Digital's April 2026 analysis documented the net effect as a 20% reduction in unique domains cited per response compared to earlier GPT-5 models.
How do I check whether AI crawlers can access my site?
Start with your robots.txt file and look for rules that apply to GPTBot, ClaudeBot, OAI-SearchBot, and PerplexityBot by user agent name. Missing entries do not confirm they are allowed: check for wildcard Disallow rules that could inadvertently block unrecognized bots. Next, check server logs for visits from known AI user agent strings and note the HTTP response codes. Then test your pages with JavaScript disabled: if the main content disappears, AI crawlers that do not render JavaScript cannot see it. The FAQ schema and crawlability guide covers the technical audit in more detail.
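For the JavaScript-disabled test in that answer, you do not need a headless browser to get a first signal: fetch the raw HTML and count the visible words a non-rendering crawler would actually see. This is a minimal sketch; the URL and the 200-word threshold are assumptions to tune against pages you already know are server-rendered.

```python
from html.parser import HTMLParser
import requests

URL = "https://www.example.com/blog/sample-post/"  # placeholder URL

class VisibleTextParser(HTMLParser):
    """Counts text a non-rendering crawler sees, skipping script/style/noscript."""
    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style", "noscript"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style", "noscript") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.words += len(data.split())

html = requests.get(URL, timeout=10).text  # raw HTML, no JavaScript executed
text_parser = VisibleTextParser()
text_parser.feed(html)

# A few hundred visible words or fewer on a long-form page usually signals a
# client-rendered shell; the exact threshold is an assumption to calibrate.
print(f"Visible words without JavaScript: {text_parser.words}")
print("Likely client-rendered shell" if text_parser.words < 200
      else "Main content appears to be server-rendered")
```

If the word count collapses compared with what you see in a browser, the page depends on client-side rendering and the fix belongs in the rendering layer, not the content.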
What is the practical difference between fan-out queries and citation behavior?
Fan-out queries are the internal sub-queries that AI models generate from a single user prompt. A user asking "what is the best GEO tool for B2B SaaS" might produce 10 or more internal sub-queries about specific tools, pricing, comparison criteria, and user reviews. Citation behavior refers to which sources appear in the final response. More fan-outs mean the model evaluates more candidate pages before writing the response. Fewer cited domains means the model is synthesizing from a narrower pool of trusted sources. The user sees one coherent answer. The citation selection happens entirely before that.
Does being cited by GPT-5.4 predict citations from other AI models?
Not directly. Different platforms use different retrieval and citation systems. A page cited by ChatGPT may or may not appear in Perplexity or Google AI Overviews for similar queries. That said, the content characteristics that earn GPT-5.4 citations, such as data-rich opening sections, multi-angle coverage, and clean crawlability for AI user agents, tend to improve performance across all AI platforms. These are not ChatGPT-specific optimizations. They reflect how AI systems generally evaluate content for extraction. A complete GEO strategy tracks citation presence across multiple platforms rather than treating GPT-5.4 behavior as representative of the full AI search picture.
How should brands prepare before GPT-5.5 launches?
The two most effective actions are a technical crawlability audit and a content structure review. On the technical side, confirm that GPTBot, ClaudeBot, OAI-SearchBot, and PerplexityBot can reach your pages and that main content is available in static HTML. On the content side, identify your highest-value pages and restructure them to lead with statistics, cover multiple audience questions in one place, and include clearly formatted answer passages. Establishing a citation baseline before the GPT-5.5 launch provides the pre/post comparison data that shows exactly what changed when the new model rolls out.
The direction the data points
Each new GPT model release produces a narrower citation pool than the one before it. The 20% domain reduction with GPT-5.4 follows the same trajectory as prior model generations. A model that visits more pages and cites fewer of them is making sharper distinctions about authority and relevance, not random ones.
The technical layer is the hidden bottleneck. AI crawlers visiting sites 3.6 times more than Googlebot while most brands have optimized for Googlebot means the infrastructure assumption is wrong. Pages that cannot be crawled, rendered, or extracted produce zero citations regardless of content quality.
Getting the technical access right, building content that covers multiple question angles with early data density, and establishing the off-site brand presence that AI models use as a credibility signal are what determine citation position as models get more selective. Content quality alone only helps on the roughly 37% of crawler visits that already get past the abandonment filter; it leaves the access problem intact.
GPT-5.5 is coming. Your citation baseline should be set before it arrives.
We audit your full AI crawlability, content structure, and citation presence across ChatGPT, Perplexity, and Google AI Overviews, then build the program that holds your position through model updates. Most clients see measurable citation movement within 60 days.
Book a Discovery Call