ChatGPT vs. iatroX vs. OpenEvidence: which free AI is actually safe for clinical decisions?

Introduction

We are past the point of asking whether doctors should use AI. You probably already do. You might use ChatGPT to draft a difficult email to a colleague or to polish a discharge summary. But the moment you type a clinical question, such as "dose of gentamicin for 60kg female with eGFR 35", you enter a safety minefield.

In 2025, the market has split into three distinct categories: the Generalist, the Academic, and the Clinical Co-Pilot. This article compares the three heavyweights—ChatGPT, OpenEvidence, and iatroX—to answer one question: which one is actually safe for making decisions about patients?

1. The "generalist" problem: ChatGPT & Grok

The Pro: Incredible fluency. ChatGPT is a master of language. If you need to rewrite a patient letter to sound "more empathetic" or draft a complaint response, it is unbeatable. It understands tone and context better than any dedicated medical tool.

The Con: "Hallucination Roulette." ChatGPT is a probabilistic engine. It predicts the next likely word, not the next true fact. It doesn't "know" the NICE guideline for hypertension; it knows what the internet generally says about hypertension. This means it can confidently invent a drug dose or cite a guideline that doesn't exist.

The Verdict: Use for communication, not calculation.
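The "next likely word, not the next true fact" point can be made concrete with a toy sampler. This is a minimal sketch, not how ChatGPT is actually implemented: the candidate tokens and their probabilities are invented for illustration, and the names NEXT_TOKEN_PROBS, sample_next_token and sample_many are hypothetical. It only shows that sampling by likelihood will sometimes return a less likely candidate, and that nothing in the process checks which candidate is true.

```python
import random

# Hypothetical next-token probabilities for a prompt such as
# "The usual adult dose is ...". These numbers are invented for
# illustration; they come from no real model and no clinical source.
NEXT_TOKEN_PROBS = {
    "dose_A": 0.55,
    "dose_B": 0.30,
    "dose_C": 0.15,
}


def sample_next_token(probs, rng):
    """Pick one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def sample_many(probs, n, seed=0):
    """Sample n times and count how often each token is chosen."""
    rng = random.Random(seed)
    counts = {t: 0 for t in probs}
    for _ in range(n):
        counts[sample_next_token(probs, rng)] += 1
    return counts


if __name__ == "__main__":
    # The most likely token wins most often, but the others still appear.
    print(sample_many(NEXT_TOKEN_PROBS, 1000))
```

Whichever of the three candidates happens to be correct, the sampler has no way of knowing: it ranks by likelihood alone, which is why a fluent answer can still contain an invented dose.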

2. The "academic" heavyweight: OpenEvidence

The Pro: It’s the "Google Scholar" of AI. OpenEvidence is rigorous. It draws only on peer-reviewed papers and provides answers that are deeply grounded in the medical literature. It rarely fabricates answers because it is constrained to high-quality data.

The Con: It can be too dense for the ward. When you ask for a first-line antibiotic, you often get a literature review of three different trials rather than a simple "Amoxicillin 500mg TDS." Crucially for UK clinicians, it is often US-centric, prioritising FDA approvals and American guidelines over NICE or the BNF.

The Verdict: Use for deep research and complex, rare cases where standard guidelines don't apply.

3. The "clinical co-pilot": iatroX

The Pro: The "Grounded" Middle Way. iatroX is designed to sit between the fluency of ChatGPT and the rigidity of OpenEvidence.

UK-First: Unlike ChatGPT, it doesn't just "know" medicine; it retrieves specific UK guidance (NICE/CKS/SmPC).
Ward-Ready: Unlike OpenEvidence, it answers in "Bullet Points" designed for a 2-minute corridor decision, not a 20-minute library session.

The Killer Feature: Automatic CPD. iatroX is the only tool in this list that automatically logs your query as a CPD entry. It turns your daily curiosity into evidence for your appraisal, saving you hours of admin time.

The Verdict: The daily driver for the ward.
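The two ideas in this section, retrieving named guidance before answering and logging the query as a CPD entry, can be sketched as follows. This is an illustration only and not iatroX's actual implementation, which this article does not describe. Every name here (Source, CpdEntry, INDEX, retrieve, answer_and_log) is hypothetical, the index titles are placeholders rather than real guideline titles, and the keyword match stands in for whatever retrieval method a real tool would use.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Source:
    publisher: str   # e.g. "NICE", "CKS" or "SmPC"
    title: str       # placeholder title, not a real document
    keywords: set    # lowercase words that trigger a match


@dataclass
class CpdEntry:
    query: str
    sources: list    # human-readable source labels
    logged_at: str   # ISO 8601 timestamp (UTC)


# Hypothetical index with placeholder entries, for illustration only.
INDEX = [
    Source("NICE", "Placeholder hypertension guideline",
           {"hypertension", "blood", "pressure"}),
    Source("CKS", "Placeholder UTI in pregnancy summary",
           {"uti", "pregnancy", "pregnant"}),
]


def retrieve(query, index):
    """Return every source sharing at least one keyword with the query."""
    words = set(query.lower().split())
    return [s for s in index if words & s.keywords]


def answer_and_log(query, index, cpd_log):
    """Return matching sources and log the query as a CPD entry.

    If nothing matches, return None and log nothing: the sketch
    declines to answer rather than guessing without a source.
    """
    hits = retrieve(query, index)
    if not hits:
        return None
    labels = [f"{s.publisher}: {s.title}" for s in hits]
    timestamp = datetime.now(timezone.utc).isoformat()
    cpd_log.append(CpdEntry(query, labels, timestamp))
    return hits
```

The design choice worth noting is the early return: an answer is produced only when a named source backs it, and the CPD entry records which source that was.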

Summary table: the safety traffic light

Task	ChatGPT / Grok	OpenEvidence	iatroX
Writing Letters / Emails	🟢 Best	🔴 Too Dry	🟡 Good (but structured)
Checking Doses (BNF)	🔴 Unsafe (Hallucination Risk)	🟡 Safe (but US units possible)	🟢 Safe (Links to BNF)
Researching Rare Diseases	🟡 Good for ideas	🟢 Best (Deep Lit Search)	🟡 Good (Guideline focus)
UK Guideline Check (NICE)	🔴 Unreliable	🟡 Variable (US bias)	🟢 Best (Native Integration)
Logging CPD	🔴 No	🔴 No	🟢 Automatic

Conclusion

If you want to write a poem about cardiology, use ChatGPT. If you want to know the latest trial data on a rare lymphoma, use OpenEvidence. If you want to know what to prescribe for a UTI in a pregnant patient at 3 AM in a UK hospital, and have the check logged as CPD, use iatroX.