Methodology · v1.0
How The CITE Index measures AI search visibility, daily, across India's most-funded consumer categories.
Everything below is the operating manual for the study. It is deliberately short and deliberately specific. If your question isn't answered here, email research@cite.solutions and we will either answer it or fix this page.
01 · Prompts
50 unaided buyer prompts, per vertical
For each of the 10 verticals we track, we run a fixed set of 50 prompts. Every prompt is written from the buyer's side of the conversation — the kind of question a real Indian shopper would type into ChatGPT before a purchase. Branded prompts are forbidden: no prompt names a brand inside the question, because the moment you do, you've handed the model the answer and corrupted the share-of-voice math.
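The no-brands rule can be enforced mechanically before a prompt set ships. Below is a minimal sketch, assuming the vertical's tracked brand names are available as a plain list. `BRANDS`, `brand_pattern`, and `is_unaided` are hypothetical names, not the study's pipeline. Matching is whole-word and case-insensitive, so a brand called "Boat" is caught in "Is boAt good?" but not in "boating".

```python
import re

# Hypothetical tracked-brand list for one vertical (illustrative names only).
BRANDS = ["Acme Audio", "Zenith", "Boltware"]


def brand_pattern(brands: list[str]) -> re.Pattern:
    """Compile one case-insensitive, whole-word pattern that matches any tracked brand."""
    alternatives = "|".join(re.escape(brand) for brand in brands)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


def is_unaided(prompt: str, pattern: re.Pattern) -> bool:
    """True when the prompt names none of the tracked brands."""
    return pattern.search(prompt) is None


pattern = brand_pattern(BRANDS)
prompts = [
    "Which wireless earbuds under 2,000 rupees have the best battery life?",
    "Is Zenith better than other brands for gym earphones?",
]
# Prompts that would be rejected before they ever reach an engine.
rejected = [p for p in prompts if not is_unaided(p, pattern)]
print(rejected)
```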
Prompts are versioned per vertical. When we add or replace one, the change is published with the edition that introduced it, and the old prompt remains in the archive so historical comparisons stay honest. The full prompt set for every vertical is visible on each vertical's daily dashboard.
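One way to keep replaced prompts in the archive is to treat each prompt slot as an append-only history keyed by edition. This is a sketch under stated assumptions: editions are identified by ISO dates, and `PromptSlot` and `PromptVersion` are hypothetical names. `as_of` answers the question a historical comparison needs, namely which wording was live on a given day.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class PromptVersion:
    text: str
    edition: str  # ISO date of the edition that introduced this wording, e.g. "2026-01-01"


@dataclass
class PromptSlot:
    """Every wording a prompt slot has had; old versions are archived, never deleted."""
    versions: list[PromptVersion] = field(default_factory=list)

    def publish(self, text: str, edition: str) -> None:
        # ISO dates sort lexicographically, so string comparison keeps editions ordered.
        if self.versions and edition <= self.versions[-1].edition:
            raise ValueError("a replacement must ship with a later edition")
        self.versions.append(PromptVersion(text, edition))

    def current(self) -> PromptVersion:
        return self.versions[-1]

    def as_of(self, edition: str) -> Optional[PromptVersion]:
        """The wording that was live on a given edition, for like-for-like history."""
        live = [v for v in self.versions if v.edition <= edition]
        return live[-1] if live else None


slot = PromptSlot()
slot.publish("Best budget earbuds for running?", "2026-01-01")
slot.publish("Which budget earbuds stay put during a run?", "2026-02-01")
print(slot.current().text)
print(slot.as_of("2026-01-15").text)
```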
02 · Engines
Three engines, every night, identical inputs
ChatGPT
DataForSEO LLM scraper · web search on
/v3/ai_optimization/chat_gpt/llm_scraper/live/advanced
Gemini
DataForSEO LLM scraper
/v3/ai_optimization/gemini/llm_scraper/live/advanced
AI Mode
Google AI Mode SERP · DataForSEO SERP API
/v3/serp/google/ai_mode/live/advanced
Same prompts. Same location (India, code 2356). Same language (English). The underlying model on each surface is whatever the engine's default is on the day the prompts run — we don't pin model versions, because the goal is to measure what a real Indian buyer would actually see, not what a synthetic benchmark sees. When an engine updates its model under us, that's a real signal too, and the daily editor's note flags it.
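The "identical inputs" rule amounts to a cross product of prompts and engines with location and language held constant. The sketch below only builds request bodies and makes no network calls. The endpoint paths and location code 2356 come from this page. The body field names (`keyword`, `location_code`, `language_code`), the `"en"` language code, and `build_tasks` are assumptions to check against DataForSEO's documentation. The ChatGPT web-search setting is not shown because its parameter is not specified here.

```python
# Endpoint paths as listed in this methodology.
ENDPOINTS = {
    "chatgpt": "/v3/ai_optimization/chat_gpt/llm_scraper/live/advanced",
    "gemini": "/v3/ai_optimization/gemini/llm_scraper/live/advanced",
    "ai_mode": "/v3/serp/google/ai_mode/live/advanced",
}
LOCATION_CODE = 2356  # India
LANGUAGE_CODE = "en"  # English (assumed code)


def build_tasks(prompts: list[str]) -> list[tuple[str, str, dict]]:
    """Pair every prompt with every engine, holding location and language fixed."""
    tasks = []
    for engine, path in ENDPOINTS.items():
        for prompt in prompts:
            # Field names are assumptions; confirm them per endpoint in DataForSEO's docs.
            body = {
                "keyword": prompt,
                "location_code": LOCATION_CODE,
                "language_code": LANGUAGE_CODE,
            }
            tasks.append((engine, path, body))
    return tasks


tasks = build_tasks(["Best air purifier for a Delhi winter?"])
for engine, path, body in tasks:
    print(engine, path, body["location_code"])
```

With 50 prompts per vertical, this yields 150 identical-input requests per vertical per night.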
Every run goes out between 21:00 and 21:30 IST. The timing is deliberate: late enough that the day's news cycle is in, early enough that each edition publishes before the next morning's newsletter sweeps.
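The run window is defined in IST, which matters when the scheduler itself runs on UTC. Below is a minimal check, assuming timezone-aware timestamps. `in_run_window` is a hypothetical helper. India observes no daylight saving, so a fixed +05:30 offset is enough and no timezone database is needed.

```python
from datetime import datetime, time, timedelta, timezone

# IST is a fixed UTC+05:30 offset with no daylight saving, so a fixed-offset tzinfo is safe.
IST = timezone(timedelta(hours=5, minutes=30), "IST")
WINDOW_START = time(21, 0)
WINDOW_END = time(21, 30)


def in_run_window(moment: datetime) -> bool:
    """True if a timezone-aware datetime falls between 21:00 and 21:30 IST, inclusive."""
    local = moment.astimezone(IST).time()
    return WINDOW_START <= local <= WINDOW_END


# 21:00 IST is 15:30 UTC, so a UTC scheduler would open the window at 15:30.
print(in_run_window(datetime(2026, 1, 5, 15, 45, tzinfo=timezone.utc)))
print(in_run_window(datetime(2026, 1, 5, 16, 5, tzinfo=timezone.utc)))
```

On a UTC clock the window is 15:30 to 16:00 all year round.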
03 · Metrics
Three primary metrics, one rule
Share of voice (SOV)
brand_mentions ÷ total_brand_mentions_in_vertical × 100
The share of all brand mentions across the 50 prompts × 3 engines (150 answers per vertical) that name this brand. Self-mentions on the brand's own domain are excluded, so a brand can't earn SOV by being cited in its own footer.
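The SOV formula, together with the self-mention exclusion, can be sketched as a count over mention records. The `Mention` record, the `own_domains` mapping, and the helper names are hypothetical. Treating a self-mention as one that traces back to the brand's own registrable domain is an assumption about how the exclusion is applied.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mention:
    engine: str
    prompt_id: str
    brand: str
    source_domain: Optional[str]  # registrable domain the mention traces to, if any


def is_self_mention(m: Mention, own_domains: dict[str, str]) -> bool:
    """A mention that traces back to the brand's own domain (assumed definition)."""
    own = own_domains.get(m.brand)
    return own is not None and m.source_domain == own


def share_of_voice(mentions: list[Mention], own_domains: dict[str, str]) -> dict[str, float]:
    """SOV = brand_mentions / total_brand_mentions_in_vertical * 100, self-mentions excluded."""
    counts = Counter(m.brand for m in mentions if not is_self_mention(m, own_domains))
    total = sum(counts.values())
    return {brand: 100 * n / total for brand, n in counts.items()} if total else {}


mentions = [
    Mention("chatgpt", "p01", "Acme", "reviewsite.co.in"),
    Mention("chatgpt", "p02", "Acme", "acme.in"),  # self-mention: dropped
    Mention("gemini", "p01", "Acme", None),
    Mention("ai_mode", "p03", "Acme", "example.com"),
    Mention("gemini", "p02", "Zenith", None),
]
sov = share_of_voice(mentions, own_domains={"Acme": "acme.in"})
print(sov)
```

Acme keeps three of its four mentions and Zenith keeps its one, so the vertical total is 4 and the split is 75% to 25%.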
Source pool
distinct domains cited, weighted by appearance per engine
Every URL the engines cite is parsed to its registrable domain. We surface the top domains a buyer is being routed to before they ever reach a brand site — the upstream that shapes brand perception.
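Below is a sketch of the source pool, assuming citations arrive as (engine, URL) pairs. Two details are assumptions rather than the study's stated method. First, registrable domains normally come from the Public Suffix List; the tiny hard-coded suffix set here only exists to handle common Indian suffixes such as `co.in`. Second, "weighted by appearance per engine" is read as turning each engine's citation counts into shares and then averaging those shares across engines, so an engine that cites more links doesn't dominate the pool. All names are hypothetical.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit

# Tiny stand-in for the Public Suffix List (illustrative only); use the full list in practice.
MULTI_LABEL_SUFFIXES = {"co.in", "net.in", "org.in", "gov.in", "ac.in", "co.uk"}


def registrable_domain(url: str) -> str:
    """Reduce a URL to its registrable domain, e.g. https://www.x.co.in/a -> x.co.in."""
    labels = (urlsplit(url).hostname or "").lower().split(".")
    keep = 3 if ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES else 2
    return ".".join(labels[-keep:])


def source_pool(citations: list[tuple[str, str]]) -> list[tuple[str, float]]:
    """Rank cited domains. Assumed weighting: each engine's citations become shares
    summing to 1, then the shares are averaged so no single engine dominates."""
    per_engine: dict[str, Counter] = defaultdict(Counter)
    for engine, url in citations:
        per_engine[engine][registrable_domain(url)] += 1
    pool: Counter = Counter()
    for counts in per_engine.values():
        total = sum(counts.values())
        for domain, n in counts.items():
            pool[domain] += n / total / len(per_engine)
    return pool.most_common()


citations = [
    ("chatgpt", "https://www.reviewsite.co.in/earbuds"),
    ("chatgpt", "https://reviewsite.co.in/best-tws"),
    ("chatgpt", "https://forum.example.com/t/123"),
    ("chatgpt", "https://www.example.com/guide"),
    ("gemini", "https://m.reviewsite.co.in/earbuds"),
]
print(source_pool(citations))
```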
Sentiment
positive / neutral / cautious / negative — per brand mention
Each mention is classified by a deterministic rules-and-lexicon pass, plus a Claude sentinel for the cases that pass finds ambiguous. The four buckets are stacked to show composition, not averaged, because a brand whose mentions are 60% positive and 40% cautious sends a different signal than one whose mentions are 100% neutral.
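Below is a toy version of the two-stage classifier, with a demonstration of why the buckets are stacked. The cue lexicon, the `SCORES` weights, and all function names are hypothetical; the study's actual rules and lexicon are not published on this page. The Claude review step is represented only by `lexicon_pass` returning `None` for conflicting cues. The last lines show the problem with averaging: a brand split evenly between positive and negative mentions averages to the same score as one that is entirely neutral, while the two compositions are plainly different.

```python
import re
from collections import Counter
from typing import Optional

BUCKETS = ("positive", "neutral", "cautious", "negative")

# Toy cue lexicon for illustration; the study's real rules and lexicon are not shown here.
LEXICON = {
    "positive": {"best", "excellent", "reliable"},
    "cautious": {"however", "but", "check"},
    "negative": {"avoid", "poor", "complaints"},
}


def lexicon_pass(text: str) -> Optional[str]:
    """Deterministic first pass: one bucket's cues -> that bucket; no cues -> neutral;
    conflicting cues -> None, meaning the mention goes to the LLM reviewer."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    hits = [bucket for bucket, cues in LEXICON.items() if words & cues]
    if not hits:
        return "neutral"
    return hits[0] if len(hits) == 1 else None


def composition(labels: list[str]) -> dict[str, float]:
    """Stacked view: percent of a brand's mentions in each bucket."""
    counts = Counter(labels)
    return {bucket: 100 * counts[bucket] / len(labels) for bucket in BUCKETS}


# Averaging onto a score scale (hypothetical weights) hides composition.
SCORES = {"positive": 1.0, "neutral": 0.0, "cautious": -0.5, "negative": -1.0}


def average_score(labels: list[str]) -> float:
    return sum(SCORES[label] for label in labels) / len(labels)


split = ["positive", "negative"]
flat = ["neutral", "neutral"]
print(average_score(split), average_score(flat))  # identical averages
print(composition(split))
print(composition(flat))
```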
The one rule above all of them: a metric only enters the dashboard when we can show our work for it. If we can't cite the prompt + the answer + the URL that produced the number, the number doesn't ship.
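The ship rule can be expressed as a gate on candidate metrics. Below is a minimal sketch, assuming each candidate metric carries the receipts that produced it. `Receipt`, `has_full_provenance`, and `publishable` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Receipt:
    prompt: str
    answer: str         # literal engine answer text
    url: Optional[str]  # cited URL behind the number, if any


def has_full_provenance(receipts: list[Receipt]) -> bool:
    """The ship rule: at least one receipt, and every receipt has prompt, answer, and URL."""
    return bool(receipts) and all(r.prompt and r.answer and r.url for r in receipts)


def publishable(candidates: dict[str, tuple[float, list[Receipt]]]) -> dict[str, float]:
    """Keep only the metrics whose provenance can be shown in full."""
    return {
        name: value
        for name, (value, receipts) in candidates.items()
        if has_full_provenance(receipts)
    }


good = Receipt("Best budget earbuds?", "Acme and Zenith are popular picks.", "https://reviewsite.co.in/tws")
no_url = Receipt("Best budget earbuds?", "Zenith is a common pick.", None)
candidates = {
    "sov:Acme": (75.0, [good]),
    "sov:Zenith": (25.0, [good, no_url]),
    "sentiment:Boltware": (60.0, []),
}
print(publishable(candidates))
```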
04 · Movement
Day-over-day deltas surface the story
Every brand's SOV, rank, and source-pool position is compared with the prior edition. Any movement of more than 2 percentage points triggers an editorial review, and that review becomes the daily editor's note ("what changed, and what it means") published with each edition.
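Below is a sketch of the review trigger for SOV, assuming each edition is available as a brand-to-SOV mapping in percent. The methodology does not give separate thresholds for rank or source-pool position, so the sketch covers SOV only. `flag_for_review` is a hypothetical helper, and treating a brand absent from one edition as 0% SOV there is an assumption. "Over 2 percentage points" is read strictly, so a move of exactly 2.0 points is not flagged.

```python
REVIEW_THRESHOLD_PP = 2.0  # "over 2 percentage points"


def flag_for_review(today: dict[str, float], prior: dict[str, float]) -> dict[str, float]:
    """Brands whose SOV moved by more than the threshold since the prior edition.

    A brand missing from one edition is treated as 0% SOV there (an assumption).
    """
    flagged = {}
    for brand in sorted(today.keys() | prior.keys()):
        delta = today.get(brand, 0.0) - prior.get(brand, 0.0)
        if abs(delta) > REVIEW_THRESHOLD_PP:
            flagged[brand] = delta
    return flagged


prior = {"Acme": 28.0, "Zenith": 21.5, "Boltware": 12.5}
today = {"Acme": 31.5, "Zenith": 20.0, "Boltware": 10.0, "Nova": 4.0}
print(flag_for_review(today, prior))
```

Zenith's 1.5-point dip stays below the line; Acme, Boltware, and the newly appearing Nova go to review.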
When a movement crosses a bigger threshold (a leader change, an all-time high or low, an engine flipping its #1 brand, or an inversion in sentiment composition), we publish a permanent Finding at a dedicated URL. Findings are dated, citable, and never edited after publication.
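The Finding triggers are discrete events rather than magnitudes, so they can be checked as a list of rules. This is a sketch with hypothetical names (`Edition`, `leader`, `finding_triggers`). It covers leader changes, engine #1 flips, and all-time highs and lows. The sentiment-inversion trigger is left out because this page doesn't define when a composition counts as inverted.

```python
from dataclasses import dataclass, field


@dataclass
class Edition:
    sov: dict[str, float]                                        # brand -> SOV across all engines
    top_by_engine: dict[str, str] = field(default_factory=dict)  # engine -> #1 brand


def leader(sov: dict[str, float]) -> str:
    return max(sov, key=sov.get)  # ties resolve to the first brand listed


def finding_triggers(today: Edition, prior: Edition, history: list[Edition]) -> list[str]:
    """Threshold events that would warrant a permanent Finding (sentiment inversion omitted)."""
    events = []
    if leader(today.sov) != leader(prior.sov):
        events.append(f"leader change: {leader(prior.sov)} -> {leader(today.sov)}")
    for engine, brand in today.top_by_engine.items():
        before = prior.top_by_engine.get(engine)
        if before is not None and before != brand:
            events.append(f"{engine} flipped #1: {before} -> {brand}")
    for brand, value in today.sov.items():
        past = [e.sov[brand] for e in history if brand in e.sov]
        if past and value > max(past):
            events.append(f"all-time high: {brand}")
        elif past and value < min(past):
            events.append(f"all-time low: {brand}")
    return events


older = Edition({"Acme": 29.0, "Zenith": 25.0})
prior = Edition({"Acme": 30.0, "Zenith": 28.0}, {"chatgpt": "Acme", "gemini": "Acme"})
today = Edition({"Acme": 27.0, "Zenith": 31.0}, {"chatgpt": "Zenith", "gemini": "Acme"})
for event in finding_triggers(today, prior, history=[older, prior]):
    print(event)
```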
05 · Data & provenance
Everything we publish links back to its source
The raw prompt receipts — the literal answer text from each engine for each prompt — are stored alongside the metrics that derive from them. Every published claim on a daily edition links back to either the engine answer or the cited URL that produced it. Nothing on the dashboard is a synthesis we can't replay from raw data.
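Replayability means a published number can be recomputed from the stored answer text alone. Below is a minimal audit sketch, assuming brand counts are derived by whole-word matching with at most one count per brand per answer. Both that counting rule and the helper names (`replay_counts`, `replay_mismatches`) are hypothetical.

```python
import re
from collections import Counter


def replay_counts(answers: list[str], brands: list[str]) -> Counter:
    """Re-derive brand mention counts from stored raw answer text.

    Assumed counting rule: a brand counts at most once per answer.
    """
    counts: Counter = Counter()
    for answer in answers:
        for brand in brands:
            if re.search(rf"\b{re.escape(brand)}\b", answer, re.IGNORECASE):
                counts[brand] += 1
    return counts


def replay_mismatches(
    published: dict[str, int], answers: list[str], brands: list[str]
) -> dict[str, tuple[int, int]]:
    """Brands whose published count differs from the replay, as (published, replayed)."""
    replayed = replay_counts(answers, brands)
    return {
        brand: (published.get(brand, 0), replayed[brand])
        for brand in brands
        if published.get(brand, 0) != replayed[brand]
    }


answers = [
    "Acme and Zenith both work well; Acme is cheaper.",
    "Try Zenith if you want better mics.",
    "Boltware is fine for calls.",
]
published = {"Acme": 1, "Zenith": 2, "Boltware": 2}
print(replay_mismatches(published, answers, ["Acme", "Zenith", "Boltware"]))
```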
The data is stored in Postgres (Supabase). The ingest pipeline is open-source and the schema is published with the codebase. If you find an error, file an issue and we'll publish a correction in the next edition with a permanent note appended to the affected day.
06 · Limits
What this study is, and what it is not
This is not a survey. We're not asking buyers what they think. We're measuring what generative engines say when a buyer asks them. Those two things correlate, but they're not the same.
This is also not a brand-equity index. A brand can score high on SOV in ChatGPT and still be losing real customers — because ChatGPT's answer set is shaped by training data and cited sources, not by who is winning shelf space this quarter. The CITE Index measures the AI search layer specifically. Read it alongside, not instead of, your other instrumentation.
Licence & citation
Free to quote. Free to remix. Just link back.
CC BY 4.0
All editions, findings, and methodology pages are published under Creative Commons Attribution 4.0. Quote freely. The attribution requirement is a link back to the specific edition or finding URL you cited, plus the phrase "The CITE Index · cite.solutions."
Suggested citation
Cite Solutions Research. (2026). The CITE Index · India Edition. cite.solutions/state-of-ai-india.