In March 2026, Otterly.ai ran a controlled experiment that almost no one in the GEO industry has discussed at length. They built a fictional agency from scratch, gave it the smallest possible website, and then tested whether off-page citation signals alone could move it up the rankings inside ChatGPT, Perplexity, Google AI Overviews, and three other platforms.
The site had 7 pages. No schema. No backlinks. No domain authority score. No keyword history.
Fourteen days later, this brand sat at rank #7 against 10+ named competitors, captured top-3 visibility inside ChatGPT, hit position 2 inside Google AI Overviews, and produced 90 brand mentions across nine tracked prompts. The competitors it outperformed had real domain authority, deeper indexed page counts, and existing UK keyword footprints.
The experiment is the cleanest test of off-page citation strategy that has been published this year. It deserves a closer read than it has received.
What Otterly built and why it matters
The fictional agency was named "Agency X." The market context was "GEO agencies in London," chosen because it is location-specific, has named incumbent competitors, and shows up in real AI buyer queries. The setup was deliberately stripped to isolate one variable.
Otterly.ai 14-day GEO experiment, Agency X
A fictional brand with zero domain authority and 16 off-page placements
Source: Otterly.ai, "From Zero to Rank 7 in AI Search in 14 Days" (March 2026)

Starting conditions: 16 off-page placements built
Directory listings: Sortlist, Clutch, Goodfirms, DesignRush
Listicle placements: SEO and marketing news sites
LinkedIn Pulse articles: GEO and AI search topics, 500–800 words
Reddit participation: existing agency discussion threads

14-day results, by platform
ChatGPT: Top 3
Google AI Overviews: Position 2
Perplexity: Top 5
Google AI Mode: Position 8
Microsoft Copilot: Poor
Gemini: Inconsistent

Overall outcome after 14 days
Total brand mentions: 90
Share of voice: 10%
Brand coverage of prompts: 12%
Average position when cited: 1.71
Final rank vs. 10+ named competitors: #7
Mentions from 2 location/year prompts: 74%
74% of all mentions came from two prompts. Both included a city and a year ("top GEO agencies 2026 in London", "top GEO SaaS agencies of London 2026"). Broader generic queries returned near-zero visibility despite identical off-page work.
Propagation lag: Brand coverage stayed flat for the first 7 days, then accelerated through days 8–14. Off-page citations take roughly a week to register inside AI retrieval pipelines before visibility moves.
The 9 tracked prompts were a mix of broad and location-specific. Otterly monitored visibility across ChatGPT, Perplexity, Google AI Overviews, Google AI Mode, Microsoft Copilot, and Gemini, all through their own platform's citation tracking. The hypothesis was straightforward: off-page authority signals are more impactful than on-page signals for AI search visibility. The experiment was set up to give on-page nothing to work with, so any movement had to come from external mentions.
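For teams replicating this kind of tracking without Otterly's platform, the three reported metrics are straightforward to compute from a log of prompt runs. The sketch below is a minimal Python version under stated assumptions: the PromptRun schema and sample data are invented for illustration, not Otterly's format, and share of voice uses one common definition (the brand's mentions over all brand mentions observed).

```python
from dataclasses import dataclass

@dataclass
class PromptRun:
    prompt: str            # the tracked buyer prompt
    platform: str          # e.g. "chatgpt", "perplexity"
    brands_cited: list     # brands named in the answer, in cited order

# Hypothetical log of tracked prompt runs; Otterly's real schema is not public.
runs = [
    PromptRun("top GEO agencies 2026 in London", "chatgpt",
              ["Agency X", "Incumbent A", "Incumbent B"]),
    PromptRun("best GEO agency", "chatgpt",
              ["Incumbent A", "Incumbent C"]),
]

def visibility_metrics(runs, brand):
    total_mentions = sum(len(r.brands_cited) for r in runs)
    brand_mentions = sum(r.brands_cited.count(brand) for r in runs)
    runs_with_brand = [r for r in runs if brand in r.brands_cited]
    # Share of voice: this brand's mentions as a fraction of all brand mentions.
    sov = brand_mentions / total_mentions if total_mentions else 0.0
    # Prompt coverage: fraction of tracked runs where the brand appears at all.
    coverage = len(runs_with_brand) / len(runs) if runs else 0.0
    # Average position when cited (1-indexed), over runs that cite the brand.
    positions = [r.brands_cited.index(brand) + 1 for r in runs_with_brand]
    avg_pos = sum(positions) / len(positions) if positions else None
    return {"mentions": brand_mentions, "share_of_voice": sov,
            "prompt_coverage": coverage, "avg_position": avg_pos}

print(visibility_metrics(runs, "Agency X"))
```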
The 14-day window matters too. Most GEO discussions assume long timelines, the kind backlink-building requires inside traditional SEO. The Otterly window proved that AI search visibility responds far faster than backlink propagation usually does.
The 16 placements, broken down
Agency X built its citation footprint through four channels.
Directory listings on agency platforms. Sortlist, Clutch, Goodfirms, and DesignRush. Each listing was filled out with consistent service descriptions, location data, and category tags. Free placements on platforms that AI models index regularly. Four placements.
Listicle placements on SEO and marketing news sites. Outreach to publications that already run "top X agency" listicle content. The team offered ready-made listicle assets in exchange for inclusion. Six placements built this way.
LinkedIn Pulse articles. Three articles on GEO and AI search topics, each running 500 to 800 words. LinkedIn ranks as the second most cited domain across AI search engines according to Otterly's broader citation analysis, so its long-form content is one of the highest-yield placements available.
Reddit thread participation. Three contributions to existing agency-discussion threads in relevant subreddits. Not promotional spam. Substantive answers to existing questions where the brand was named in context. Reddit citations carry weight because Reddit dominates as a community-source signal in ChatGPT.
Sixteen placements total. No paid links. No backlink schemes. No technical schema work. The site itself remained at 7 pages throughout.
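For teams planning a replication, the mix can be encoded as a simple checklist before outreach starts. The structure below is purely illustrative: the channel names and counts come from the experiment writeup, while the dictionary layout and notes are assumptions, not a tool Otterly describes.

```python
# Hypothetical placement plan mirroring the Agency X channel split.
# Channel names and counts come from the experiment; the rest is illustrative.
placement_plan = {
    "directories": {
        "targets": ["Sortlist", "Clutch", "Goodfirms", "DesignRush"],
        "count": 4,
        "notes": "consistent service descriptions, location data, category tags",
    },
    "listicles": {
        "targets": ["SEO/marketing news sites running 'top X agency' posts"],
        "count": 6,
        "notes": "outreach offering ready-made listicle assets",
    },
    "linkedin_pulse": {
        "targets": ["GEO and AI search topics"],
        "count": 3,
        "notes": "500-800 words each",
    },
    "reddit": {
        "targets": ["existing agency-discussion threads"],
        "count": 3,
        "notes": "substantive answers, brand named in context",
    },
}

total = sum(channel["count"] for channel in placement_plan.values())
assert total == 16  # the full Agency X citation footprint
```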
The platform breakdown is the most useful finding
The aggregate "rank #7 of 10+ competitors" number is the headline. The platform-specific breakdown is where the strategic value sits.
ChatGPT and Google AI Overviews responded most strongly to the off-page signals. ChatGPT placed Agency X in the top 3 across the location-specific prompts. Google AI Overviews positioned the brand at #2. Both surfaces clearly weighted external citation density and credibility highly enough to lift a brand with no on-page authority.
Perplexity was more mixed. The brand reached top-5 on relevant prompts but did not consistently dominate. Perplexity's citation pipeline runs across multiple underlying models including GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro, and Kimi K2.5/K2.6, and each model treats sources differently. The variability is structural.
Google AI Mode performed weakest at position 8. This is significant. Google AI Mode and Google AI Overviews share Gemini infrastructure but run separate retrieval pipelines. AI Mode appears to weight on-page SEO and domain authority more heavily than off-page citations, which is exactly why a brand with zero domain authority struggled there.
Microsoft Copilot delivered poor results across the board. Gemini was inconsistent. These two platforms responded to different signals than the experiment was testing for. For a brand prioritizing Copilot visibility, Bing Webmaster Tools citation data and Copilot-specific source patterns suggest a different placement mix.
The implication for content teams is direct. ChatGPT and Google AI Overviews can be moved through external citation work alone. The rest of the surface set requires complementary strategy.
The Otterly experiment proved off-page citation works in 14 days. Most B2B brands have not tried it.
We build the placement mix that lifts your brand inside ChatGPT and Google AI Overviews specifically. Directory authority, listicle inclusion, LinkedIn long-form, Reddit presence, all sequenced and tracked for citation impact.
Book a Discovery Call
Why 74% of mentions came from two prompts
The most uncomfortable finding in the experiment is the prompt-level distribution. Of the 9 tracked prompts, two delivered 74% of all brand mentions. Both were location-tagged and year-tagged: "top GEO agencies 2026 in London" and "top GEO SaaS agencies of London 2026." The seven broader prompts collectively returned near-zero visibility despite identical off-page activity supporting the brand.
This pattern is consistent with what other GEO experiments have surfaced. AI search rewards prompt-precise content far more than keyword-broad presence. A brand that maps its content to the literal questions buyers are asking, with the qualifiers buyers actually use, captures most of the visibility. Generic content with broader phrasing returns very little.
For content teams, this is the case for tight prompt research before any content is written. Profound's Prompt Research Reports launched April 30 using a corpus of 1.5 billion real user prompts to identify what buyers are actually asking, rather than what teams guess they are asking. The principle the Otterly experiment validates is the same. Track real prompts, not assumed ones.
The Otterly team called the relevant content type "Exact Match Blogs," articles whose titles and structure match a specific high-intent prompt with both the qualifier and the year present. The format is narrow on purpose. It captures the exact query rather than the broad topic.
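Generating candidate titles for that format is mechanical once the qualifiers are known. The helper below is a hypothetical sketch: the phrasing templates are invented examples in the spirit of the two winning prompts, not a list Otterly published.

```python
from itertools import product

def exact_match_titles(services, locations, year):
    """Expand service/location/year combinations into prompt-precise titles.

    Templates are illustrative; the point is that the qualifier and the year
    appear verbatim, matching how buyers phrase high-intent prompts.
    """
    templates = [
        "Top {service} agencies {year} in {location}",
        "Top {service} agencies of {location} {year}",
        "Best {service} agency in {location} ({year})",
    ]
    for service, location in product(services, locations):
        for t in templates:
            yield t.format(service=service, location=location, year=year)

for title in exact_match_titles(["GEO", "GEO SaaS"], ["London"], 2026):
    print(title)
```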
The 7-day propagation lag
Brand coverage stayed flat for the first seven days. Then it accelerated sharply across days 8 through 14. The growth pattern was not linear. Citations did not produce visibility immediately.
This matters for measurement. Teams that run a citation-building sprint and check AI visibility after 3 days will see no movement and conclude the work failed. The Otterly data suggests AI retrieval pipelines need roughly a week to reflect new citations in their visibility surfaces. The propagation lag is structural, related to crawl windows, model context refresh cycles, and citation graph rebuilds.
The implication is simple. Run citation campaigns in 14-day or longer windows. Measure at the end. Do not measure at day 3 or day 5.
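One way to enforce that discipline is a guard that refuses to call a verdict before the window closes, then compares week-one and week-two coverage. The sketch below is illustrative only: the daily series is invented to match the flat-then-accelerating shape Otterly reports, and the doubling heuristic is an assumption, not their methodology.

```python
from statistics import mean

# Invented daily brand-mention counts shaped like the reported pattern:
# roughly flat for days 1-7, accelerating through days 8-14.
daily_mentions = [0, 0, 1, 0, 1, 1, 1, 3, 5, 8, 11, 15, 20, 24]

def campaign_verdict(daily, min_days=14):
    if len(daily) < min_days:
        # Too early: the ~7-day propagation lag makes early reads meaningless.
        return "inconclusive: wait for the full measurement window"
    week1, week2 = mean(daily[:7]), mean(daily[7:14])
    if week2 > 2 * week1:
        return "propagation kicked in: citations are registering"
    return "no acceleration yet: re-check placements and prompt targeting"

print(campaign_verdict(daily_mentions[:5]))   # a day-5 read -> inconclusive
print(campaign_verdict(daily_mentions))       # a day-14 read -> accelerating
```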
Cross-validation with the Momentum case study
The Otterly experiment is not the only data point pointing in this direction. Momentum, a B2B SaaS GTM platform, ran a different version of the same playbook with on-page content as the lever. They optimized 100 articles aligned to specific prompts their buyers were running in AI tools. Result: 10 times the AI search visibility, doubled AI search sessions, and rankings above Salesforce and Zapier on dozens of queries.
Different lever, same direction. AI citation responds to specificity. Specificity beats scale. A focused intervention against the prompts buyers actually run produces outsized results compared to broad content programs against assumed search demand. The structural side of the same playbook is documented in Evertune's 33,000-URL ChatGPT citation analysis, which found the median cited page is 941 words with 15 external links, not a 3,000-word pillar page.
The pairing matters. Momentum did the on-page work. Otterly's Agency X did the off-page work. Both reached the top of their categories on the prompts they targeted. Both did it in under 30 days. The full playbook for a B2B brand sits at the intersection of the two: prompt-specific on-page content plus prompt-specific off-page citations.
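The Evertune medians translate directly into a quick page audit. The sketch below is a rough pass using requests and BeautifulSoup; the counting conventions (full-page text for words, links pointing off the page's own domain) are assumptions, and the thresholds should be read as directional benchmarks from the Evertune analysis, not pass/fail rules.

```python
from urllib.parse import urlparse

import requests
from bs4 import BeautifulSoup

# Directional benchmarks from Evertune's 33,000-URL cited-page analysis.
MEDIAN_WORDS = 941
MEDIAN_EXTERNAL_LINKS = 15

def audit_page(url):
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    own_domain = urlparse(url).netloc

    # Crude word count over the whole rendered text, nav and footer included.
    words = len(soup.get_text(" ", strip=True).split())
    # Links whose host differs from the page's own domain count as external.
    external_links = sum(
        1 for a in soup.find_all("a", href=True)
        if urlparse(a["href"]).netloc not in ("", own_domain)
    )
    return {
        "words": words,
        "external_links": external_links,
        "words_vs_median": words - MEDIAN_WORDS,
        "links_vs_median": external_links - MEDIAN_EXTERNAL_LINKS,
    }

print(audit_page("https://example.com/blog/post"))  # hypothetical URL
```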
Where this experiment does not generalize
The Otterly experiment has limits worth flagging.
First, the location qualifier. "GEO agencies in London" is a high-specificity local prompt. The dynamics for global B2B SaaS prompts are different. A query like "best CRM for B2B SaaS" has hundreds of competitors and decades of content history backing the incumbents. Sixteen placements will not move that needle in 14 days.
Second, the prompt selection. The two winning prompts were location-specific and year-specific. Most B2B buying queries are not. The "Exact Match Blog" content type works inside narrow query formats and degrades when prompts get broader.
Third, the platform mix. ChatGPT and Google AI Overviews responded. Microsoft Copilot, Gemini, and Google AI Mode did not, at least not at this placement volume. For brands prioritizing those surfaces, the placement count required is higher and the mix is different.
Fourth, the absence of a control variable for content quality. The 16 placements were created through outreach and platform participation, not random distribution. The team likely picked high-quality outlets where their listicle content fit. The signal includes both placement count and placement quality. Replication studies have not yet isolated the two.
These limits do not invalidate the experiment. They define where the playbook applies cleanly: prompt-precise, location-specific, ChatGPT-and-AI-Overviews-focused B2B contexts. Which is exactly where most challenger brands compete.
What the experiment validates structurally
Three things hold up across the data.
Citation density on credible third-party sites moves AI visibility faster than on-page changes alone. The fictional brand had no on-page authority signals at all and still reached top-3 in ChatGPT. The lift came from somewhere external to the site.
Prompt specificity dominates content distribution. The 74% concentration in two prompts shows that AI search runs on phrase-precise matching with intent qualifiers. Broad content does not earn equivalent visibility.
The lag between placement and visibility is real. Seven days of flat data preceded the acceleration. Teams measuring too early will misread a working campaign as a failed one.
These three findings compound. A brand that builds 16 high-quality citations against narrow, prompt-precise queries and waits long enough to measure correctly will see results that look impossible compared to traditional SEO timelines. The Otterly experiment is the proof.
Off-page citation work is now a 14-day intervention. Is it on your GEO roadmap?
We map your priority prompts, identify the placement gaps, and build the directory, listicle, LinkedIn, and Reddit citation footprint that lifts your brand inside ChatGPT and Google AI Overviews. Tracked, sequenced, and reported every two weeks.
Book a Discovery Call
FAQ
How many citations does a brand need to rank in ChatGPT?
The Otterly experiment built 16 placements across directories, listicles, LinkedIn Pulse, and Reddit, then reached top-3 in ChatGPT for two location-specific prompts within 14 days. The number is a starting baseline for narrow B2B prompts. Broader queries with more competition require more placements. The mix matters more than the raw count.
Does on-page SEO still matter for AI search?
Yes, but it is not the only lever. The Otterly experiment kept on-page deliberately minimal and still reached top-3 inside ChatGPT through off-page work alone. Other surfaces, including Google AI Mode, weight on-page domain authority more heavily. A balanced approach combining prompt-precise on-page content with prompt-precise off-page citations is the strongest path. The Momentum on-page case study and the Otterly off-page experiment together describe both halves.
Why did Google AI Mode perform worse than Google AI Overviews?
Google AI Mode and Google AI Overviews use the same Gemini model family but operate on separate retrieval pipelines, as Otterly's earlier analysis confirmed. AI Mode weights on-page SEO and domain authority more heavily than off-page citations. AI Overviews responds more to external citation density. A site with zero domain authority is structurally penalized inside AI Mode regardless of off-page work.
How long does off-page citation work take to register inside AI search?
The Otterly data shows roughly seven days of flat coverage, then a sharp acceleration through days 8 to 14. Teams should plan for a minimum 14-day measurement window after a citation-building sprint. Earlier measurement risks reading the propagation lag as failure.
Should B2B SaaS brands prioritize directory listings, LinkedIn, or Reddit?
The Otterly mix used all three plus listicles. For B2B SaaS specifically, LinkedIn ranks second across AI citation surfaces and is one of the highest-impact channels for thought leadership. Reddit carries community-source weight inside ChatGPT. Directories provide consistent structured data that AI models index reliably. The combination outperforms any single channel.
What to do with this
The Otterly experiment is one of the cleanest off-page GEO data points published in 2026. It validates a workable playbook for challenger brands competing inside narrow B2B prompts. The path is prompt research first, then citation placement second, with at least 14 days before measurement.
The brands that act on this in the next two quarters get a window. The brands that wait for the playbook to become standard advice will be competing against everyone running it.