
Does AEO Actually 5x Your AI Traffic?

Subia Peerzada

Founder, Cite Solutions · June 5, 2026

Every GEO vendor has a number. "5x more AI traffic." "300% more ChatGPT referrals." It goes on a slide, and the slide closes deals.

A new study says most of that number is not you. It is ChatGPT getting bigger.

A log-based natural experiment published in June 2026 (arXiv 2606.04362) pulled apart a "5.7x more AI traffic" result and found the real lift from optimization was about 1.8x. The rest was platform growth that the site would have captured by doing nothing.

The headline multiple is mostly tide, not rowing.

This is not an argument against AEO. It is an argument for measuring it honestly. Below is what the study did, why the big numbers overstate the effect, and how to find your own real number.

What the natural experiment actually measured

The study tracked one high-traffic domain (glasp.co, hundreds of thousands of YouTube Q&A pages) over 90 days using first-party analytics and server logs. The optimization work was concentrated in January 2026. Treated pages were compared against the untreated remainder of the same domain, which nets platform-wide growth out of the result.

That on-domain control is the whole point. If both groups sit on the same site during the same months, anything they share (ChatGPT's overall growth, seasonality, model updates) cancels out. What is left is the optimization.

Here is the decomposition.

Log-based natural experiment, arXiv 2606.04362, June 2026

One domain. Hundreds of thousands of pages. 90 days of server logs.

Treated pages compared against the untreated remainder of the same domain, so platform-wide growth nets out of the measurement.

Decomposing the "5.7x more AI traffic" claim

  • Raw ChatGPT referral growth (all pages): 5.7x. The headline number a vendor quotes.
  • Untreated control pages (same domain): 3.5x. Pure platform tailwind, no optimization.
  • Isolated AEO treatment effect: 1.82x. The actual causal lift (95% CI 1.31 to 2.54).

A conservative placebo test returned p=0.16: suggestive, not conclusive. Most of the 5.7x was the platform growing, not the optimization working.

The headline multiple is mostly tide, not rowing. When a vendor shows you "5x more AI traffic," ask how much of that is just ChatGPT getting bigger. Without an untreated control, you cannot tell the two apart.

The raw growth across all pages was 5.7x. The untreated control pages grew 3.5x with no optimization at all. The isolated treatment effect was 1.82x, with a 95% confidence interval of 1.31 to 2.54. A conservative placebo test returned p=0.16, which the authors call suggestive, not conclusive.

Raw growth measures the tide. Treated-minus-control measures the rowing.

Their stated conclusion: "headline AEO multiples substantially overstate causal effect." Most of the 5.7x was the platform, not the work.
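To see how much of the headline is tide, it helps to put the three study figures on a log scale, where multiplicative growth factors add. The sketch below is a back-of-envelope illustration, not a calculation from the paper. It assumes the treatment effect multiplies the platform tailwind, and the variable names are ours.

```python
import math

headline = 5.7   # raw ChatGPT referral growth, all pages (from the study)
control = 3.5    # growth on untreated control pages (from the study)
effect = 1.82    # isolated treatment effect (from the study)

# On a log scale, growth multiples add, so a "share of growth" is meaningful.
platform_share = math.log(control) / math.log(headline)

# Assumption (ours, not the paper's): the effect multiplies the tailwind,
# so treated pages would have grown roughly control * effect.
implied_treated = control * effect

print(f"platform share of headline log-growth: {platform_share:.0%}")
print(f"implied treated-page growth: {implied_treated:.2f}x")
```

Under those assumptions, roughly 72% of the headline's log-growth shows up on pages nobody touched.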

Why the 5.7x number is mostly platform growth

The gap between 5.7x and 1.8x is not a rounding error. It is the difference between a number with a control and a number without one. Five reasons the headline version is almost always inflated.

Reason #1: Most of the lift happened with zero optimization

The control pages grew 3.5x while nobody touched them. That is the baseline ChatGPT referral growth this domain's pages caught in early 2026 just by existing. If you quote 5.7x as your result, you are taking credit for the 3.5x the platform handed you for free.

Reason #2: The control sat on the same domain, so the tide cancels out

The cleanest way to separate your work from the platform is an untreated group that shares everything except the treatment. Same domain, same period, same crawl budget, same model updates. When the control is on a different site, or there is no control at all, the platform growth stays baked into your number and inflates it.

Reason #3: The isolated effect was 1.82x, and even that is only suggestive

The honest read of this paper is not "AEO delivers 1.82x." It is "the best estimate is 1.82x, the confidence interval is wide, and the placebo test (p=0.16) does not clear the usual bar for significance." One domain, one experiment, January interventions only. The direction is real. The precise multiple is not settled.

Reason #4: Vendors quote the raw number because it sells

A 1.8x lift with a caveat does not fit on a pitch slide. A 5.7x lift does. The incentive in the GEO market runs toward the biggest defensible number, and "biggest defensible" almost always means "no control subtracted." The math is not wrong. The attribution is.

Reason #5: "Nx more AI traffic" is unfalsifiable without a control

If a vendor cannot tell you what the untreated version of your site would have done, the multiple they are quoting cannot be checked. It could be 90% platform growth or 10% platform growth, and the slide looks identical either way. A number you cannot falsify is a marketing claim, not a measurement.

A multiple without a control is not a result. It is a coincidence with good timing.

What vendors measure versus what you should measure

The fix is not complicated. It is a denominator change. Most reporting compares your AI traffic today against your AI traffic before. The honest version compares your treated pages against your untreated pages over the same window.

What most AEO reports measure:

  • AI referrals this month versus last month
  • Total ChatGPT clicks before and after the project
  • A single raw multiple with no comparison group
  • Growth that silently includes platform tailwind

What you should measure instead:

  • Treated pages versus an untreated control on the same domain
  • The delta between the two groups, not absolute growth
  • A confidence interval, not a single point number
  • A placebo check before you trust the result

Each list is a self-contained way of reading the same traffic. The first always produces a bigger number. The second produces a number you can defend.
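The "confidence interval, not a single point number" item is the one teams skip most often. One way to get an interval is a percentile bootstrap over pages, sketched below. This is a minimal illustration, not the method the arXiv authors used. The function names and the page counts are made up for the example.

```python
import random

def lift(treated, control):
    """Ratio of treated growth to control growth.

    Each argument is a list of (before, after) referral counts, one per page.
    """
    t_growth = sum(a for _, a in treated) / sum(b for b, _ in treated)
    c_growth = sum(a for _, a in control) / sum(b for b, _ in control)
    return t_growth / c_growth

def bootstrap_ci(treated, control, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling pages with replacement."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_boot):
        t = rng.choices(treated, k=len(treated))
        c = rng.choices(control, k=len(control))
        draws.append(lift(t, c))
    draws.sort()
    lo = draws[int(n_boot * alpha / 2)]
    hi = draws[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi

# Made-up (before, after) counts per page, for illustration only.
treated_pages = [(10, 62), (8, 40), (12, 80), (9, 51), (11, 77), (10, 55)]
control_pages = [(10, 36), (9, 30), (12, 41), (8, 29), (11, 40), (10, 33)]

point = lift(treated_pages, control_pages)
low, high = bootstrap_ci(treated_pages, control_pages)
print(f"lift {point:.2f}x, 95% CI {low:.2f} to {high:.2f}")
```

Six pages per group is far too few for a real study. The point is the shape of the report: a lift with a range around it, not a lone multiple.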

How to measure your real causal lift

You do not need an academic setup to do this. You need a control group and the discipline to report the delta. Four steps.

Step 1: Pick an untreated control set on your own domain

Split your candidate pages into two matched groups before you change anything. Optimize one group. Leave the other untouched for the measurement window. Match them on page type, starting traffic, and topic so the only systematic difference is the treatment. This is the on-domain control that made the arXiv study credible.
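A matched split can be sketched in a few lines. The version below groups pages by type and topic, sorts each group by starting traffic, and flips a coin within each adjacent pair. It is one reasonable approach under our assumptions, not the procedure from the arXiv study, and the field names (`url`, `page_type`, `topic`, `baseline_visits`) are hypothetical.

```python
import random
from collections import defaultdict

def split_pages(pages, seed=42):
    """Assign pages to 'treated' or 'control' within matched strata.

    Pages are grouped by (page_type, topic), sorted by baseline traffic,
    then paired off so each pair contributes one page to each arm.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for page in pages:
        strata[(page["page_type"], page["topic"])].append(page)

    assignment = {}
    for group in strata.values():
        group.sort(key=lambda p: p["baseline_visits"], reverse=True)
        # Walk adjacent pairs of similar-traffic pages; coin-flip who is treated.
        for i in range(0, len(group) - 1, 2):
            pair = [group[i], group[i + 1]]
            rng.shuffle(pair)
            assignment[pair[0]["url"]] = "treated"
            assignment[pair[1]["url"]] = "control"
        # An unpaired leftover page stays out of the experiment.
    return assignment

# Hypothetical pages, for illustration only.
pages = [
    {"url": "/a", "page_type": "guide", "topic": "pricing", "baseline_visits": 120},
    {"url": "/b", "page_type": "guide", "topic": "pricing", "baseline_visits": 115},
    {"url": "/c", "page_type": "guide", "topic": "pricing", "baseline_visits": 40},
    {"url": "/d", "page_type": "guide", "topic": "pricing", "baseline_visits": 38},
    {"url": "/e", "page_type": "faq", "topic": "security", "baseline_visits": 80},
    {"url": "/f", "page_type": "faq", "topic": "security", "baseline_visits": 75},
    {"url": "/g", "page_type": "faq", "topic": "security", "baseline_visits": 10},
]

arms = split_pages(pages)
print(arms)
```

Fixing the seed and saving the assignment before any work starts is what stops the groups from being quietly reshuffled after the results come in.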

Step 2: Baseline both groups before you optimize

Record AI referrals and citation counts for both groups for at least four weeks before the work starts. Without a clean pre-period, you cannot run a placebo check later, and you cannot tell whether the groups were already drifting apart. Our GEO measurement guide covers the GA4 and log setup for this.
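A quick drift check on the baseline can look like the sketch below. It takes weekly AI referral totals for each group and reports how far the treated-to-control ratio wanders before any work starts. The numbers are invented, and how much drift is too much is a judgment call we are not settling here.

```python
def pre_period_ratios(weekly):
    """Treated-to-control referral ratio for each pre-period week.

    `weekly` is a list of (treated_referrals, control_referrals) tuples,
    one per week, oldest first.
    """
    return [t / c for t, c in weekly]

def max_drift(ratios):
    """Largest relative departure of any week's ratio from the average."""
    mean = sum(ratios) / len(ratios)
    return max(abs(r - mean) / mean for r in ratios)

# Four made-up baseline weeks, for illustration only.
baseline = [(100, 98), (110, 112), (125, 121), (131, 133)]

ratios = pre_period_ratios(baseline)
drift = max_drift(ratios)
print([round(r, 3) for r in ratios], f"max drift {drift:.1%}")
```

If the ratio is already trending in one direction during the baseline, the groups were not matched well enough, and any later "lift" inherits that trend.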

Step 3: Report the treated-minus-control delta, not raw growth

After the window, calculate growth for each group, then subtract. If treated pages grew 4x and control pages grew 3x, your causal lift is the gap between them (4x over 3x, or about 1.33x as a multiple), not the 4x. This single subtraction is what separates a measurement from a marketing claim. Pair it with the citation absorption method so you are tracking citation share, not just clicks.
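One detail matters when you subtract growth multiples: do it on a log scale, which is the same as dividing treated growth by control growth. That keeps your number in the same form as the study's 1.82x. The sketch below works the 4x-versus-3x example. The function names and the raw counts are ours.

```python
import math

def growth(before, after):
    """Growth multiple for one group over the window."""
    return after / before

def causal_lift(treated_before, treated_after, control_before, control_after):
    """Treated growth relative to control growth (a ratio of ratios)."""
    return growth(treated_before, treated_after) / growth(control_before, control_after)

# The example above: treated pages grew 4x, control pages grew 3x.
lift = causal_lift(100, 400, 100, 300)

# The same gap, written as a subtraction of logs.
log_delta = math.log(4) - math.log(3)

print(f"lift: {lift:.2f}x")
print(f"exp(log delta): {math.exp(log_delta):.2f}x")
```

Both lines print the same multiple, about 1.33x. Subtracting the raw multiples instead (4 minus 3) gives a figure that cannot be compared with a result like 1.82x.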

Step 4: Run a placebo check before you trust the number

Pretend the treatment happened a month earlier than it did, then re-run the comparison on that fake date. If you see a "lift" on a date when nothing changed, your method is picking up noise, and your real result is suspect. The arXiv authors ran exactly this test and reported p=0.16, which is why they called their finding suggestive rather than proven.
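A bare-bones placebo check can be sketched as follows. It computes the lift at the real treatment date, then repeats the calculation on baseline weeks only, with a fake date four weeks in. This is a simplified illustration with invented weekly counts. It does not produce a p-value the way the arXiv authors' test did.

```python
def lift_at(series_treated, series_control, cut):
    """Lift if the treatment had happened at index `cut`.

    Each series is a list of weekly referral counts. Growth is the mean
    after the cut divided by the mean before it; lift is treated growth
    divided by control growth.
    """
    def growth(series):
        before, after = series[:cut], series[cut:]
        return (sum(after) / len(after)) / (sum(before) / len(before))
    return growth(series_treated) / growth(series_control)

# Made-up weekly counts: 8 baseline weeks, then 8 weeks after the real change.
treated = [50, 52, 55, 57, 60, 63, 66, 69, 130, 140, 150, 160, 170, 180, 190, 200]
control = [50, 53, 54, 58, 60, 62, 67, 68, 72, 76, 79, 84, 88, 92, 97, 101]

real_cut = 8
real_lift = lift_at(treated, control, real_cut)

# Placebo: use only pre-treatment weeks and pretend the change landed at week 4.
placebo_lift = lift_at(treated[:real_cut], control[:real_cut], 4)

print(f"real lift {real_lift:.2f}x, placebo lift {placebo_lift:.2f}x")
```

A placebo lift close to 1.0x is what you want to see. If the fake date shows a lift of its own, the method is reading noise or a pre-existing trend.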

What this study does not say

The data has limits, and overselling the 1.8x figure would repeat the exact mistake the paper warns against.

It does not say AEO delivers only 1.8x for everyone. It is one domain, one experiment, with interventions concentrated in a single month. The confidence interval runs from 1.31 to 2.54, and the placebo test did not reach significance. Treat 1.8x as a directional anchor, not a universal law.

It does not say AEO is not worth doing. A 1.8x causal lift is a strong return for a content intervention, and it compounds as the cited page keeps earning references. Princeton's GEO research found that structural moves like adding statistics and citing sources can lift visibility by up to 40%, so the levers are real. The point is that your real number is smaller than the raw number, not that the real number is bad.

It does not say platform growth will last forever. The 3.5x tailwind reflects a specific moment when ChatGPT referrals were growing fast across the whole web. AI referral volume is still small in absolute terms, roughly 1% of organic-equivalent traffic for B2B SaaS sites per Conductor's 2026 benchmark, which is part of why the multiple looks large off a small base. As that growth flattens, a larger share of your result will come from your own work, which makes a clean control group more valuable, not less.

FAQ

Does AEO actually increase traffic?

Yes, but by less than the headline numbers suggest. A June 2026 natural experiment isolated the causal lift from optimization at about 1.8x, after netting out the 3.5x that untreated pages gained from platform growth alone. AEO works. The honest multiple is just smaller than "5x" once you remove the tailwind.

Why do GEO vendors quote 5x or higher?

Because raw growth includes platform tailwind they did not create. When ChatGPT referrals are growing across the entire web, any page can show a large multiple without any optimization. Vendors quote the raw number because it is bigger and fits a pitch slide. The defensible number subtracts an untreated control group, and it is usually much lower.

What is an on-domain control group?

A set of your own pages, on the same site, that you deliberately leave unoptimized during the measurement window. Because they share the same domain, period, and crawl conditions as your treated pages, platform-wide growth affects both equally and cancels out when you compare them. The difference that remains is the real effect of your work.

How long should an AEO measurement window be?

Plan for at least four weeks of pre-period baseline and eight to twelve weeks of measurement. AI citation patterns drift heavily month to month, so a short window will mistake normal citation drift for a treatment effect. A longer window with a control group is the only way to see signal through that noise.

Is a 1.8x lift good?

For a content and structure intervention, yes. A 1.8x causal lift means your optimized pages earned nearly double the AI referrals of comparable pages you left alone, on top of whatever the platform was already giving you. The issue was never that AEO underperforms. It is that raw multiples hide how much of the growth you actually caused.

The bottom line

The next time a vendor shows you "5x more AI traffic," ask one question: where is the control group? If there is not one, the number includes platform growth they did not produce, and the real lift is some smaller fraction nobody measured.

Run the split yourself. Pick an untreated set on your own domain, baseline both groups, subtract, and check the result against a placebo date. You will end up with a number smaller than the headline and far more useful, because it is the one you can defend when someone asks how you know. For the broader case that AI referrals convert harder than they appear, see our breakdown of why AI search traffic converts better than SEO.
