Over a quarter of UK doctors have used some form of AI in their practice — and many of them are using ChatGPT. The GMC's commissioned research confirms it. A 2025 survey of 1,005 UK GPs found that 11% were encouraged by their employer to use generative AI tools at work, while only 3.5% were prohibited. The majority exist in an unregulated middle ground: using AI informally, without guidance, and without governance.
The question is not whether doctors should use ChatGPT. They already are. The question is what it gets right, what it gets dangerously wrong, and when purpose-built tools are the safer choice.
What ChatGPT Gets Right
Medical knowledge breadth. ChatGPT's training data includes an enormous volume of medical literature, textbooks, and clinical content. It can generate plausible differential diagnoses, explain pathophysiology clearly, summarise management pathways, and discuss clinical evidence with apparent fluency. For educational contexts — explaining a concept to a medical student, brainstorming differentials for a complex case — this breadth is genuinely useful.
Administrative tasks. Drafting referral letters, composing patient information leaflets, structuring teaching presentations, summarising meeting notes, writing reflective CPD entries. These non-clinical tasks are where ChatGPT excels without clinical risk. The output requires editing, but the first draft saves significant time.
Reasoning on common scenarios. For well-established clinical scenarios with clear, unambiguous evidence, ChatGPT's clinical reasoning can be impressive. It can walk through a diagnostic workup, explain why certain investigations are indicated, and present management options in a structured way.
What ChatGPT Gets Wrong
Drug dosages. ChatGPT generates plausible-sounding doses that may be incorrect. It does not check the BNF. It does not verify against any pharmacopoeia. A hallucinated dose — methotrexate 25mg daily instead of weekly, for example — can directly harm a patient. This is not a rare edge case; dosing errors in LLM outputs are well documented.
References. Published research shows ChatGPT fabricates journal citations with alarming frequency. Studies report that over 45% of AI-generated references contained fabricated DOIs, author names, or publication dates. The references look real: proper journal name, plausible title, realistic formatting. They are fiction. If you cite an AI-generated reference in a clinical document, you are citing something that does not exist.
UK vs US guidelines. ChatGPT does not reliably distinguish between UK and US clinical practice. A UK GP asking about statin prescribing may receive an answer based on ACC/AHA guidelines rather than NICE. The model does not flag the jurisdictional mismatch. For a clinician whose audit, appraisal, and medico-legal defence depend on following UK guidance, this is a serious limitation.
Confidence calibration. ChatGPT sounds equally confident whether it is right or wrong. There is no uncertainty signal: no "I'm not sure about this" qualifier when the model is generating from a weak training signal. The fluency of the language does not correlate with the accuracy of the content. This is perhaps the most dangerous feature for clinical use.
Error amplification. Research published in Nature Communications Medicine found that when clinical vignettes contained planted errors, leading LLMs repeated or elaborated on the error in up to 83% of cases. ChatGPT does not detect clinical inconsistencies — it amplifies them. If you provide incorrect information in your prompt, the AI will build a confident, articulate response on top of that incorrect foundation.
Why Purpose-Built Medical AI Is Architecturally Safer
The difference between ChatGPT and a tool like iatroX is not cosmetic — it is architectural.
ChatGPT generates text from statistical patterns learned during training. It does not retrieve from any specific source when answering your question. It predicts what text should follow your prompt based on patterns in its training data — which may include authoritative guidelines, but also outdated textbooks, US-centric protocols, patient forum posts, and medical misinformation.
iatroX uses retrieval-augmented generation (RAG) over a curated corpus of NICE, CKS, SIGN, and BNF guidelines. When you ask a question, the system retrieves relevant content from verified sources, synthesises an answer, and shows you exactly where it came from. The citation links to the actual guideline section. You can verify in one click.
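To make the architectural contrast concrete, the sketch below shows the retrieval-augmented pattern in miniature: retrieve passages from a cited corpus first, then constrain the answer to what was retrieved. The corpus snippets, the keyword scoring (standing in for a real embedding search), and the prompt format are illustrative assumptions for this sketch, not iatroX's actual implementation.

```python
# Minimal RAG sketch: retrieve cited passages, then build a prompt that
# restricts the answer to those passages. Placeholder snippets paraphrase
# UK guidance; they are not verbatim guideline text.
from collections import Counter

GUIDELINE_CORPUS = [
    {
        "source": "NICE (lipid modification)",
        "text": "Atorvastatin 20 mg is the usual starting dose for primary prevention of cardiovascular disease.",
    },
    {
        "source": "BNF (methotrexate)",
        "text": "Methotrexate is taken once weekly; inadvertent daily dosing is a well-recognised cause of serious harm.",
    },
    {
        "source": "NICE (hypertension)",
        "text": "A clinic blood pressure of 140/90 mmHg or higher should be confirmed with ambulatory monitoring before diagnosing hypertension.",
    },
]

def overlap_score(query: str, text: str) -> int:
    """Crude keyword-overlap score standing in for a real embedding/vector search."""
    query_words = Counter(query.lower().split())
    text_words = Counter(text.lower().split())
    return sum((query_words & text_words).values())

def retrieve(query: str, k: int = 2) -> list[dict]:
    """Return the k passages most relevant to the query, each carrying its source."""
    ranked = sorted(GUIDELINE_CORPUS, key=lambda p: overlap_score(query, p["text"]), reverse=True)
    return ranked[:k]

def build_grounded_prompt(query: str) -> str:
    """Assemble a prompt that restricts the model to the retrieved, cited passages."""
    passages = retrieve(query)
    context = "\n".join(f"[{p['source']}] {p['text']}" for p in passages)
    return (
        "Answer using ONLY the passages below and cite the source in brackets. "
        "If the passages do not answer the question, say so.\n\n"
        f"{context}\n\nQuestion: {query}"
    )

if __name__ == "__main__":
    print(build_grounded_prompt("What statin dose is recommended for primary prevention?"))
```

The point is the shape of the pipeline, not the toy scoring: the model can only answer from passages that carry a verifiable source, which is what makes one-click verification possible.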
iatroX is UKCA-marked and MHRA-registered. ChatGPT is not a medical device and was never designed to be one. The RCP's 2026 report on digital and AI explicitly notes that tools like ChatGPT "are not regulated for use in healthcare" and that clinicians should not just defer to an AI's output.
The Practical Framework for UK Clinicians
Use ChatGPT for: Administrative writing, teaching preparation, communication drafting, brainstorming. Tasks where the output will be reviewed and edited before use, and where clinical accuracy is not directly at stake.
Use iatroX for: Clinical questions, guideline retrieval, prescribing verification, referral criteria, management pathways. Any situation where the accuracy of the medical information directly affects patient care.
Use the BNF for: Every prescribing decision. No AI tool — whether general-purpose or purpose-built — should be the sole basis for a dose or interaction check.
Never use ChatGPT as: Your sole source for a clinical decision. Every clinical output must be independently verified against an authoritative source.
Conclusion
ChatGPT is a remarkable technology. It is not a clinical tool. The gap between "impressively articulate" and "clinically safe" is where patient harm occurs. Purpose-built medical AI like iatroX closes that gap by grounding every answer in verified guidelines, showing every source, and operating within a regulatory framework designed for clinical use.
Use the right tool for the right job. Your patients are counting on the difference.
