Clinical AI has entered its adoption-claim era. Companies now report large numbers of queries, consultations, clinician users, and time saved. These figures are useful because they show real demand: clinicians are clearly trying these tools and, in many cases, returning to them regularly. But in medicine, usage volume is not the same as clinical reliability. A tool can be widely tried and heavily queried, and still require careful evaluation of its sources, its workflow fit, and how its answers are verified before a clinician should rely on it for patient care.
iatroX has processed hundreds of thousands of clinician queries and interactions. Heidi Evidence reports nearly 2 million Evidence queries from clinicians across its markets since launch, and Heidi says its wider platform supports over 2.4 million consultations every week. OpenEvidence reported approximately 18 million verified-physician consultations in December 2025 alone. These figures are strong evidence that clinicians want AI-supported clinical knowledge tools, but they should be read as adoption signals rather than as complete measures of accuracy, trust, or clinical impact.
What a Query Count Can Tell You
Query counts are genuine evidence of demand. When hundreds of thousands or millions of clinical questions are processed through an AI platform, that demonstrates a real, recurring need among clinicians for faster clinical information retrieval. The behaviour is not hypothetical; it is happening at scale, across multiple platforms, in multiple countries.
High query volumes can also indicate that the tool is fast enough, accessible enough, and useful enough that clinicians return after the first use. A tool that generates millions of queries is not merely benefiting from a one-time novelty effect — it is earning repeat interactions, which suggests functional value.
For the market as a whole, aggregate query volumes across platforms (iatroX, Heidi Evidence, OpenEvidence, ChatGPT health queries, UpToDate searches) confirm that clinical information retrieval is becoming a significant software category — not a niche experiment.
What a Query Count Cannot Tell You
It does not tell you how many unique clinicians are using the tool. A lifetime query count of 2 million could represent 200,000 clinicians asking 10 questions each, or 20,000 clinicians asking 100 questions each, or 2,000 power users asking 1,000 questions each. The distribution matters — widespread but shallow adoption is different from narrow but deep adoption, and both are different from the claim that millions of clinicians rely on the tool daily.
It does not tell you whether the answers are accurate. Query volume measures input (questions asked), not output quality (answers given correctly). A tool could process millions of queries while providing subtly incorrect, outdated, or jurisdiction-inappropriate answers to a meaningful percentage of them. Usage does not equal accuracy.
It does not tell you whether clinicians verified the answers. A query count does not distinguish between a clinician who received an answer, checked the citation, verified the source, and applied the information — and a clinician who received an answer and acted on it without verification. The safety implications of these two behaviours are very different.
It does not tell you whether clinicians keep returning. Lifetime query counts accumulate over time. They do not tell you what this month's active usage looks like, whether retention is improving or declining, or whether the initial adoption wave has been sustained. A tool could report impressive lifetime numbers while experiencing declining monthly engagement.
It does not tell you whether the answers fit the clinician's jurisdiction. A clinician in the UK receiving a US-guideline-aligned answer has had their query counted — but the clinical value of that interaction may be negative if they act on guidance that does not apply to UK practice.
The Difference Between Trying a Tool and Relying on It
Adoption has stages. Awareness: the clinician hears about the tool. Trial: the clinician uses it once or twice. Evaluation: the clinician tests it against their existing workflow and reference sources. Integration: the clinician incorporates it into daily practice. Reliance: the clinician trusts it enough to use it for consequential clinical decisions.
Query counts primarily measure trial and early evaluation. They do not necessarily measure integration or reliance. A clinician who tried a tool three times in January and never returned still contributed three queries to the lifetime total. A clinician who uses the tool 15 times per day, every working day, for six months contributes close to two thousand queries, but is a fundamentally different type of user.
For clinical AI tools, the metrics that matter most are the ones that indicate reliance: daily active users (not lifetime registered users), repeat usage patterns (not one-time trials), source verification behaviour (do clinicians click through to citations?), and workflow integration (does the tool fit into the clinical day, or is it used only in idle moments?).
Why Sources Matter More in Clinical AI Than in General Search
When a consumer searches Google for "best Italian restaurants near me," the stakes of an inaccurate result are low — a mediocre meal, a wasted evening. When a clinician uses an AI tool to check a drug dose, verify a contraindication, or confirm a management pathway, the stakes are directly clinical. An incorrect answer may lead to a prescribing error, a missed diagnosis, or a suboptimal management plan that affects a real patient.
This asymmetry means that source quality in clinical AI matters far more than in general search. The relevant questions are: What sources does the tool draw from? Are they authoritative for the clinician's jurisdiction? How current are they? Are the citations specific enough to verify in seconds? Does the tool distinguish between strong and weak evidence?
A tool that generates 2 million queries from a broad, unspecified evidence base is a different proposition from a tool that generates hundreds of thousands of queries grounded in specific national guidelines with verifiable citations. Both numbers are useful adoption signals. But the clinical reliability of the output depends on the sources underneath — not on the volume of questions processed.
Jurisdiction Matters: UK Guidance, US Evidence and Local Pathways
A clinical AI tool built around PubMed and US medical literature will produce different outputs for UK clinical queries than a tool built around NICE, CKS, BNF, and SIGN. The drug names may differ. The screening thresholds may differ. The first-line treatment choices may differ. The referral pathways certainly differ.
For UK clinicians, jurisdictional relevance is not a secondary consideration — it is a primary one. A tool that returns a well-cited answer from the wrong guideline framework has provided an accurate answer to the wrong question. This is not a trivial distinction when the answer informs a prescribing decision or a referral.
Query counts do not capture this dimension. A tool could process millions of queries from UK clinicians while drawing primarily from US evidence — and every one of those queries would count toward the lifetime total, regardless of whether the answer was UK-relevant.
Seven Better Questions Clinicians Should Ask
When evaluating any clinical AI tool — beyond the headline adoption numbers — ask these questions.
How many clinicians use this tool daily, not just how many have ever used it? Daily active users indicate current relevance; lifetime registered users indicate historical interest.
What sources underpin the answers? Peer-reviewed literature? National guidelines? Local Trust policies? Is the source base disclosed and verifiable?
Are citations specific enough to check in seconds? A link to the exact NICE guideline paragraph is useful. A vague reference to "medical literature" is not.
Does the tool fit my healthcare system? Does it cite UK guidelines for UK queries? Or does it default to US or international recommendations?
Can I see the tool's limitations? Does it acknowledge uncertainty? Decline to answer when evidence is insufficient? Flag when a query falls outside its scope?
What is the retention pattern? Are clinicians returning daily, weekly, or not at all after the first use? Retention indicates genuine workflow value; a one-off trial does not.
Does the tool support my whole clinical knowledge workflow? Clinical questions, calculators, exam preparation, brainstorming, and CPD — or only one narrow function?
Where iatroX Fits
iatroX is built around a different centre of gravity: not raw query volume, but the clinical knowledge workflow. It brings together guideline-grounded answers via Ask iatroX, clinical brainstorming, 80+ calculators with editorial content and guideline references, 15+ adaptive exam Q-banks, and CPD-style learning.
That means adoption should be measured not only in questions asked, but also in how often clinicians use the platform to clarify a decision, revise a topic, calculate a score, document learning, or return to a source-linked answer. The hundreds of thousands of interactions processed reflect this breadth, spanning clinical queries, exam practice, calculator usage, brainstorming, and guideline retrieval on the same platform.
Core clinical information-retrieval and brainstorming workflows are accessible. Exam-preparation products may include paid components depending on the exam and region.
Bottom Line
Large query counts are real evidence of real demand. Clinicians clearly want fast, cited, AI-assisted clinical information retrieval. But query counts are adoption signals, not trust signals. The more meaningful questions — source quality, jurisdictional relevance, citation verifiability, retention patterns, and workflow fit — require looking beyond the headline numbers.
Doctors do not need a chatbot that sounds confident. They need answers they can check, from sources they trust, in a tool they return to every day.
