You have decided to outsource GEO. Smart, if you do not have someone in-house who owns it. The problem starts on the first sales call. Every agency on your shortlist says they do GEO, every deck has the same ChatGPT screenshot, and you have no way to tell the operators from the resellers who renamed their SEO retainer last quarter.
That gap is now measured. The 2026 State of Generative Engine Optimization in B2B Marketing, a June 2026 study of 225 B2B marketing and revenue leaders by GNW Consulting and Demand Metric, found that 88% of SEO agencies claim to offer GEO services while 37% of those offerings are loosely defined. So more than a third of the GEO offerings pitched to you come without a clear definition of what actually gets delivered.
The short version of how to vet a GEO agency: make them prove a baseline on your own brand before you sign, demand the named buyer prompts they will track you against, and put the deliverable in the contract as something you can inspect every week. The rest of this guide is the long version, with the red flags and the exact questions.
GEO agency vetting scorecard
| Dimension | Operator | Reseller red flag |
|---|---|---|
| Baseline proof | Runs a live citation-share read on your brand before the contract starts. | Begins work with no baseline, so no result can be proven later. |
| Prompt set | Hands you the exact buyer prompts you will be scored against. | Promises to optimize for AI search in general, with no named prompts. |
| Engine coverage | Tracks ChatGPT, Claude, Perplexity, Google AI Overviews, and Copilot. | Shows one ChatGPT screenshot and calls it visibility. |
| Deliverable | Produces a weekly decision: which page gets rebuilt next. | Sends a monthly PDF that nobody acts on. |
| Attribution | Separates measured citation change from self-reported survey lift. | Claims a 5x traffic jump with no method attached. |
| Exit terms | Sets a 90-day checkpoint with a named metric and a clean exit clause. | Locks you into 12 months with no definition of success. |
Why vetting a GEO agency got harder in 2026
GEO is roughly 18 months old as a paid service category. Nobody has a 10-year track record, so the usual proxy for trust, a wall of client logos, tells you almost nothing. A logo means the agency signed a contract, not that citation share moved. You have to vet the method instead of the history.
The category is 18 months old. Nobody has a long track record, so you vet the method, not the logos.
The supply side is also crowded with repackaging. When 88% of agencies claim GEO, most of that number is keyword-SEO work with three new words on the invoice. The market is also fragmenting across engines: First Page Sage's June 2026 estimate put ChatGPT at 53.1% of AI chatbot usage, down from around 62%, with Claude, Gemini, and Copilot taking the rest. An agency still selling "we get you into ChatGPT" is optimizing for a shrinking slice. Here are the red flags that separate a real operator from a reseller.
Red flag #1: They cannot show a baseline read on your brand
If an agency wants to start work before measuring where you stand today, walk. A baseline is a citation-share read across the major AI engines for your top buyer prompts, taken before any work begins. Without it, every result they claim in month three is unprovable, because there is no before to compare the after to.
A baseline you never took is a result you can never prove.
Red flag #2: The scope is "optimize your content for AI search"
That sentence is not a scope. It names no prompts, no engines, no deliverable, and no number. It is the most common phrasing among the 37% of loosely defined offerings, because it sounds like work while committing to nothing. A real scope names the prompts you will be measured on and the artifact you will receive.
Red flag #3: Every example is a single ChatGPT screenshot
A screenshot of your brand appearing in one ChatGPT answer is a moment, not a trend. It does not show frequency, position, sentiment, or whether the same prompt cites you next week. Citation patterns drift week to week, so one screenshot tells you nothing about whether you are reliably in the answer set.
Red flag #4: They only measure ChatGPT
ChatGPT is the largest AI surface, but it is now barely half of usage and falling. An agency that only checks ChatGPT is blind to Claude, Perplexity, Google AI Overviews, and Copilot, where your buyers also research. Single-engine optimization was a defensible shortcut in early 2025. In mid-2026 it is a coverage gap. We laid out the split in the 2026 AI search market share breakdown.
Red flag #5: The only deliverable is a monthly report
A monthly PDF is a record of activity, not a function. The question is what decision the work produces. If the engagement ends each month with a report and no ranked list of what to rebuild next, you have bought monitoring you could have licensed from a tool for a fraction of the retainer.
If the deliverable is a PDF, you bought a report. If it is a decision, you bought a function.
Red flag #6: They promise a traffic multiple with no method
"We will 5x your AI traffic" is a sales line, not a forecast. AI referral traffic is small and compounding, and most reported lift is self-reported survey data rather than measured causal change. An agency that quotes a clean multiple without explaining how it isolates GEO from everything else moving your traffic is selling confidence, not measurement. We unpacked the honest version in does AEO actually 5x your traffic.
Before you book the second call, run the pitch through a simple contrast. The two sides sound almost identical until you ask what gets delivered and how it gets measured.
A reseller pitches:
- "We optimize your site for AI search."
- "We will get you into ChatGPT."
- "You will get a monthly visibility report."
- "Trust us, we have done this for big brands."
An operator pitches:
- "Here is a baseline read on your brand across five engines, taken this week."
- "Here are the 30 buyer prompts we will track you against."
- "Each week you get a ranked list of which page to rebuild next."
- "Here is how we separate measured citation change from noise."
Step 1: Make them read your brand's citation share before you sign
Ask any shortlisted agency to run a baseline read on your brand during the sales process, not after the contract. A real operator can pull your current citation share for ten buyer prompts across the major engines in a day. The quality of that read, whether it shows specific prompts, the responses, and which sources won, tells you more than any case study.
If they cannot or will not produce a baseline before you pay, that is your answer. The vendors who treat measurement as the starting point are the ones who can prove results later.
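To make "citation share" concrete, here is a minimal sketch of how a baseline read could be tallied. It assumes the agency records one row per engine and prompt, with a flag for whether your brand was cited. The record layout, the second prompt, and every value below are illustrative assumptions, not real data or any vendor's actual method.

```python
from collections import defaultdict

# Hypothetical baseline observations: one record per (engine, prompt) run.
# "cited" is True when the brand appeared in that answer. All values are made up.
observations = [
    {"engine": "ChatGPT", "prompt": "best invoicing software for freelancers", "cited": True},
    {"engine": "ChatGPT", "prompt": "invoicing tools with time tracking", "cited": False},
    {"engine": "Claude", "prompt": "best invoicing software for freelancers", "cited": False},
    {"engine": "Claude", "prompt": "invoicing tools with time tracking", "cited": False},
    {"engine": "Perplexity", "prompt": "best invoicing software for freelancers", "cited": True},
    {"engine": "Perplexity", "prompt": "invoicing tools with time tracking", "cited": True},
]


def citation_share(records):
    """Return {engine: fraction of recorded runs in which the brand was cited}."""
    runs = defaultdict(int)
    hits = defaultdict(int)
    for rec in records:
        runs[rec["engine"]] += 1
        if rec["cited"]:
            hits[rec["engine"]] += 1
    return {engine: hits[engine] / runs[engine] for engine in runs}


baseline = citation_share(observations)
for engine, share in sorted(baseline.items()):
    print(f"{engine}: {share:.0%}")
```

A real read would cover more prompts and repeat each one several times, since a single run per prompt is exactly the screenshot problem in table form. The point is that the baseline is a dated number per engine you can store and compare against later.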
Step 2: Get the exact buyer prompts they will track you against
Prompts are the new keywords. The agency should hand you a named set of buyer prompts, the actual questions your customers type into ChatGPT and Perplexity, that they will measure you on every week. Vague phrases like "AI search visibility" cannot be tracked. A specific prompt like "best invoicing software for freelancers" can.
Ask to see the prompt list and how they built it. A good answer clusters prompts by funnel stage and buyer intent. A weak answer is a generic keyword list with a question mark added. The named prompt set becomes the spine of the entire engagement.
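A named prompt set can be as simple as a small structured list. The sketch below shows one possible shape, assuming each prompt is tagged with a funnel stage and a buyer intent. The `BuyerPrompt` class, the stage and intent labels, and the example prompts (other than the invoicing one above) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BuyerPrompt:
    text: str
    funnel_stage: str  # assumed labels: "awareness", "consideration", "decision"
    intent: str        # assumed labels: "learn", "compare", "buy"


# Illustrative prompt set; a real one comes from customer research.
prompt_set = [
    BuyerPrompt("what is automated invoicing", "awareness", "learn"),
    BuyerPrompt("best invoicing software for freelancers", "consideration", "compare"),
    BuyerPrompt("freelancer invoicing tools compared", "consideration", "compare"),
    BuyerPrompt("invoicing software pricing for a solo business", "decision", "buy"),
]


def cluster_by_stage(prompts):
    """Group prompt texts by funnel stage, preserving list order within a stage."""
    clusters = defaultdict(list)
    for p in prompts:
        clusters[p.funnel_stage].append(p.text)
    return dict(clusters)


clusters = cluster_by_stage(prompt_set)
for stage, texts in clusters.items():
    print(stage, len(texts))
```

If the list an agency hands you cannot be expressed this way, with every prompt written out and tagged, it is probably the generic keyword list with a question mark added.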
Step 3: Confirm coverage across all five major AI surfaces
Your buyers do not all use the same assistant, so your agency cannot either. Confirm that tracking spans ChatGPT, Claude, Perplexity, Google AI Overviews, and Copilot in one view, not five disconnected screenshots. The point of multi-engine coverage is to catch the surface where a competitor is winning and you are absent.
This is also where you test whether the agency understands engine differences. Claude tends to cite older, more established sources than ChatGPT. Perplexity leans on different domains again. An operator can explain those differences. A reseller treats all five as one ChatGPT-shaped target.
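The "competitor is winning and you are absent" check is easy to express once results from all five engines sit in one structure. This is a rough sketch under assumed names: `YourBrand` and `RivalCo` are placeholders, and the nested dictionary layout is one possible design, not a standard format.

```python
ENGINES = ["ChatGPT", "Claude", "Perplexity", "Google AI Overviews", "Copilot"]

# Hypothetical results for one tracking run: prompt -> engine -> brands cited.
# "YourBrand" and "RivalCo" are placeholder names, not real data.
results = {
    "best invoicing software for freelancers": {
        "ChatGPT": {"YourBrand", "RivalCo"},
        "Claude": {"RivalCo"},
        "Perplexity": {"YourBrand"},
        "Google AI Overviews": set(),
        "Copilot": {"RivalCo"},
    },
}


def coverage_gaps(results, brand, engines=ENGINES):
    """List (prompt, engine, rivals) where rivals are cited and the brand is not.

    An engine with no recorded result is reported with rivals=None, because an
    unchecked engine is a tracking gap in its own right.
    """
    gaps = []
    for prompt, by_engine in results.items():
        for engine in engines:
            if engine not in by_engine:
                gaps.append((prompt, engine, None))
                continue
            cited = by_engine[engine]
            if cited and brand not in cited:
                gaps.append((prompt, engine, sorted(cited)))
    return gaps


gaps = coverage_gaps(results, "YourBrand")
for prompt, engine, rivals in gaps:
    print(f"{engine}: {rivals} cited for '{prompt}'")
```

In this made-up run the gaps are Claude and Copilot, which a ChatGPT-only check would never surface.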
Get a baseline read before you hire anyone
Cite runs a one-week diagnostic that benchmarks your citation share across ChatGPT, Claude, Perplexity, Google AI Overviews, and Copilot, names the buyer prompts you are losing, and hands you a ranked rebuild list. Use it to vet us, or to vet whoever else is pitching you.
Step 4: Put the deliverable in the SOW as an inspectable artifact
The single best defense against a loosely defined offering is a deliverable written into the statement of work in concrete terms. Not "ongoing optimization." Write down the artifact: a weekly citation-share figure, a ranked list of lost prompts, and a rebuild log showing what changed and why.
If you can inspect the artifact, you can hold someone accountable to it. If the SOW only contains adjectives, you have signed up for the 37% of offerings that cannot be measured. The deliverable definition is also your filter when comparing two agencies that quote the same price.
Step 5: Check that the work produces a weekly decision, not a monthly report
A mature GEO program runs on a weekly heartbeat: review the numbers, decide what to rebuild, ship the change. Ask the agency what happens each week and who decides. If the rhythm is monthly and the output is a deck, the program has no pulse between reports.
The decision is the product. A weekly cadence catches citation drift before it compounds, where a monthly one lets a lost prompt sit for three weeks before anyone notices. We mapped what good measurement looks like in our guide on how to measure GEO and AI visibility.
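One way to picture the weekly decision is a ranked rebuild list derived from week-over-week citation losses. The sketch below assumes each tracked prompt maps to one target page and that you count how many engines cited you last week versus this week. The scoring rule (sum of lost citations per page), the page paths, and the numbers are all assumptions for illustration.

```python
from collections import Counter

# Hypothetical weekly rows:
# (prompt, target_page, engines_citing_last_week, engines_citing_this_week)
weekly = [
    ("best invoicing software for freelancers", "/freelancer-invoicing", 4, 2),
    ("invoicing tools with time tracking", "/time-tracking", 3, 3),
    ("freelancer invoicing tools compared", "/freelancer-invoicing", 2, 1),
    ("how to invoice international clients", "/international", 1, 0),
]


def rank_rebuilds(rows):
    """Rank pages by total citations lost since last week, largest loss first."""
    lost = Counter()
    for _prompt, page, before, after in rows:
        drop = before - after
        if drop > 0:
            lost[page] += drop
    return lost.most_common()


rebuild_list = rank_rebuilds(weekly)
for page, drop in rebuild_list:
    print(f"{page}: lost {drop} citations")
```

A real program would weight prompts by business value rather than counting them equally, but even this crude version ends the week with a named page at the top of a list, which a monthly PDF does not.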
Step 6: Pin down how they attribute results
This is the question that exposes the resellers. Ask how the agency tells the difference between a citation change they caused and one that happened anyway. A credible answer is honest about the limits: AI referral traffic is small, attribution is hard, and most lift figures are directional rather than proven.
An honest attribution answer admits what it cannot prove yet.
Be suspicious of anyone who quotes a precise revenue figure from GEO this early. The defensible metric is citation share against a named prompt set over time, tracked the way you would track share of voice in AI search. If an agency claims clean revenue attribution from AI answers, ask to see the method, then watch them improvise.
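One basic check for separating a citation change from noise is a two-proportion z-test on cited runs before and after a change. The sketch below is a standard statistical test applied under simplifying assumptions, not a description of any agency's method. It treats each run as independent, which repeated prompts are not strictly, and the counts are invented. It can tell you whether a shift is larger than sampling noise. It cannot tell you who caused it.

```python
import math


def two_proportion_z(cited_before, runs_before, cited_after, runs_after):
    """Return (z, two_sided_p) for the change in citation share between two periods."""
    p_before = cited_before / runs_before
    p_after = cited_after / runs_after
    pooled = (cited_before + cited_after) / (runs_before + runs_after)
    se = math.sqrt(pooled * (1 - pooled) * (1 / runs_before + 1 / runs_after))
    if se == 0:
        # Share is 0% or 100% in both periods: no measurable change.
        return 0.0, 1.0
    z = (p_after - p_before) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 30 prompts x 5 engines = 150 runs per period.
z_big, p_big = two_proportion_z(18, 150, 33, 150)      # 12% -> 22%
z_small, p_small = two_proportion_z(18, 150, 21, 150)  # 12% -> 14%

print(f"12% -> 22%: z={z_big:.2f}, p={p_big:.3f}")
print(f"12% -> 14%: z={z_small:.2f}, p={p_small:.3f}")
```

With these invented counts, the ten-point move clears the conventional 5% threshold and the two-point move does not. An agency that reports the second kind of change as a win is reporting drift.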
Step 7: Set a 90-day exit checkpoint with a named metric
Never sign a 12-month lock-in with no definition of success. Set a 90-day checkpoint with one named metric agreed up front, usually citation share across your priority prompts, and a clean exit if the number has not moved. Ninety days is enough time for content changes to show up in AI answers and short enough to limit your exposure to a bad fit.
The exit clause does two things. It protects your budget, and it tells you whether the agency believes its own pitch. An operator confident in the method will agree to a measurable checkpoint. A reseller will fight for the lock-in.
What a real GEO engagement should cost
GEO pricing in 2026 mostly follows two shapes. A project engagement (a one-time audit, baseline, and rebuild plan) typically runs a few thousand dollars and answers "where do we stand and what do we fix." A retainer (ongoing measurement, weekly decisions, and content rebuilds) is priced like a managed marketing service, scaled to how many prompts and pages you are defending.
What you are paying for is a function with an owner, not a tool subscription. A dashboard reports the number for a license fee. It does not decide which page to rebuild next week or defend the line item in your next budget review. If an agency's price is close to a tool's price, you are probably buying a tool with a human forwarding the export. The teams allocating real budget to GEO, above roughly 5% of marketing spend per the Conductor 2026 State of AEO/GEO report, are buying the owned function, not the export.
One more cost to weigh: the build-versus-buy decision. If you have someone who can own GEO as their primary number, you may not need an agency at all. We covered that side in "Who Owns GEO at Your B2B Company?" If nobody internally can take the weekly decision, a managed engagement is the cheaper path to a real program.
FAQ
How do I know if a GEO agency is legit?
Ask for three things before you sign: a baseline citation-share read on your own brand across the major AI engines, the named buyer prompts they will track you against, and the weekly deliverable written into the SOW. A legitimate agency produces all three in the sales process. One of the 37% of loosely defined offerings answers "we optimize your content for AI search" and shows a ChatGPT screenshot.
What questions should I ask a GEO agency?
Ask what baseline they will measure before starting, which prompts and which engines they track, what artifact you receive each week, how they attribute a result to their work, and what the 90-day exit checkpoint metric is. The strength of the answers, specific prompts and a named deliverable versus adjectives, separates an operator from a reseller.
How much does a GEO agency cost in 2026?
A one-time audit and rebuild plan typically runs a few thousand dollars. An ongoing retainer is priced like a managed marketing service, scaled to the number of prompts and pages you are defending. The honest test is whether the price buys a function with a weekly decision and an owner, or just a report you could license from a tool for less.
Is a GEO agency different from an SEO agency?
The work overlaps but the metrics differ. SEO targets keyword rankings and clicks; GEO targets citation share and recommendation rate inside AI answers. An SEO agency that simply renamed its retainer will still measure rankings and clicks. A real GEO agency measures whether you appear in the synthesized answer across multiple engines. The 88% who now claim GEO include many of the former.
Should I hire a GEO agency or build the capability in-house?
If one person can own AI citation share as their primary number and run a weekly review, you can build it in-house. Fewer than 15% of B2B companies have that owner today, so for most teams a managed engagement is the faster path. Either way, the requirement is the same: a baseline, a named prompt set, a weekly decision, and one number reported to leadership.
Make us prove it before you commit
Cite acts as your GEO function: a measured baseline across every major AI engine, a named prompt set, weekly rebuild decisions, and one share-of-voice number for your leadership. Start with the diagnostic and judge us on the read.
The bottom line
Vetting a GEO agency comes down to one move: refuse to buy adjectives. Make every shortlisted vendor show a baseline on your brand, name the prompts they will track, and write the weekly deliverable into the contract. The agencies that can do all three are the operators. The ones who answer with "we optimize for AI search" and a screenshot are the 37%.
The category is young and the supply is noisy, but the test is simple. If the engagement produces a decision every week instead of a report every month, you hired a function. If it does not, you hired a label. Run the three checks before your next call, and the shortlist sorts itself.
