ChatGPT for Doctors: What It Gets Wrong (and What Purpose-Built Medical AI Gets Right)

Doctors use ChatGPT. GMC research confirms it. Reddit is full of clinicians sharing their experiences. And the appeal is obvious: you type a question in plain English and get an articulate, detailed, apparently knowledgeable response in seconds.

The problem is that "apparently knowledgeable" is not the same as "clinically reliable." ChatGPT's failures in medicine are not occasional glitches — they are structural features of how the technology works. Understanding them is the difference between using AI safely and trusting AI dangerously.

What ChatGPT Gets Wrong

Drug dosages. ChatGPT generates plausible-sounding dosages that may be incorrect. It does not check the BNF. It does not verify against any pharmacopoeia. A hallucinated dose — methotrexate 25mg daily instead of weekly, for example — can directly harm a patient.

References. Published research shows that ChatGPT fabricates journal citations with alarming frequency. Over 45% of AI-generated references in one study had fabricated DOIs, author names, or publication dates. The references look real. They are fiction.

Jurisdictional accuracy. ChatGPT does not reliably distinguish between UK and US clinical practice. A UK GP asking about statin prescribing may receive an answer based on US ACC/AHA guidelines rather than NICE. The model does not flag the mismatch.

Clinical reasoning under ambiguity. Research in Nature Communications Medicine showed that when clinical vignettes contained a single planted error, leading LLMs repeated or elaborated on the error in up to 83% of cases. ChatGPT does not detect clinical inconsistencies — it amplifies them.

Confidence calibration. ChatGPT sounds equally confident whether it is correct or wrong. There is no uncertainty signal. This is perhaps the most dangerous feature for clinical use — the fluency of the language does not correlate with the accuracy of the content.

What Purpose-Built Medical AI Gets Right

The difference is architectural, not cosmetic.

iatroX uses retrieval-augmented generation over a curated corpus of NICE, CKS, SIGN, and BNF guidelines. When you ask a clinical question, the system retrieves relevant content from verified sources, synthesises an answer, and shows you where it came from. The citation links to the actual guideline. You can click through and verify in seconds.

This is fundamentally different from ChatGPT's approach, which generates text from statistical patterns learned during training, without retrieving from or checking against any specific source.
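The retrieval step can be sketched in a few lines. This is a minimal illustration, not iatroX's actual implementation: the two-entry corpus, the guideline identifiers, and the keyword-overlap scoring are all hypothetical stand-ins (real systems use vector search over full guideline documents and pass the retrieved text to a language model for synthesis):

```python
import re
from dataclasses import dataclass

@dataclass
class GuidelineChunk:
    source: str  # hypothetical citation label, e.g. a guideline reference
    text: str

# Tiny illustrative corpus; a real system indexes full guideline documents.
CORPUS = [
    GuidelineChunk("NICE CG181", "Offer atorvastatin 20 mg for primary prevention of CVD."),
    GuidelineChunk("BNF", "Methotrexate is taken once weekly, not daily."),
]

def tokens(s: str) -> set[str]:
    """Lowercase word tokens, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))

def retrieve(question: str, corpus: list[GuidelineChunk]) -> GuidelineChunk:
    """Rank chunks by keyword overlap with the question (a crude stand-in for vector search)."""
    q = tokens(question)
    return max(corpus, key=lambda c: len(q & tokens(c.text)))

def answer(question: str) -> dict:
    chunk = retrieve(question, CORPUS)
    # A real RAG system would synthesise an answer from chunk.text via an LLM;
    # the key property is that the citation travels with the answer.
    return {"answer": chunk.text, "source": chunk.source}
```

The point of the sketch is the final line: because the answer is assembled from a retrieved chunk, the source citation is attached by construction rather than generated from memory, which is exactly what a pure language model cannot guarantee.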

The result: iatroX's answers are grounded in the sources that govern UK clinical practice. ChatGPT's answers are grounded in whatever patterns its training data produced — which may include outdated textbooks, US guidelines, patient forum posts, and medical misinformation alongside legitimate clinical content.

iatroX is also UKCA-marked and MHRA-registered. ChatGPT is not a medical device and was never designed to be one.

The Practical Framework

Use ChatGPT for: Administrative writing, teaching preparation, communication drafting, brainstorming. Tasks where the output will be reviewed and edited before use, and where clinical accuracy is not directly at stake.

Use iatroX for: Clinical questions, guideline retrieval, prescribing verification, referral criteria, management pathways — any situation where the accuracy of the medical information directly affects patient care.

Use the BNF directly for: Every prescribing decision. No AI tool should be the sole basis for a dose or interaction check.

Conclusion

ChatGPT is a remarkable technology. It is not a clinical tool. The gap between "impressively articulate" and "clinically safe" is where patient harm occurs.

Purpose-built medical AI like iatroX closes that gap by grounding every answer in verified guidelines, showing every source, and operating within a regulatory framework designed for clinical use. Use the right tool for the right job. Your patients are counting on the difference.