Executive summary
The clinical AI landscape is maturing at a rapid pace. In August 2025, in a move signalling major market consolidation, Doximity acquired Pathway Medical to bolster its clinical reference capabilities, integrating a knowledge graph that scored approximately 96% on a USMLE benchmark. The acquisition comes as OpenEvidence, a fast-growing medical search platform, continues to expand its footprint, and a new generation of specialised tools such as DxGPT and the UK-centric iatroX gains traction (Fierce Healthcare, OpenEvidence).
At the same time, powerful generic models like ChatGPT, Google's Gemini, and Perplexity are more accessible than ever. For the UK clinician, this raises a critical question: which type of tool provides the most accurate, trustworthy, and safe support for patient care? This article provides a head-to-head comparison across the key dimensions of accuracy, evidence sourcing, UK guideline alignment, and usability, offering a clear framework for choosing the right AI assistant for the right clinical task.
Background & recent developments
The market is no longer just about standalone apps; it's about integrated, intelligent platforms. The acquisition of Pathway Medical by Doximity is a clear indicator of this trend. The deal brings Pathway's highly structured dataset and clinical reference tools directly into the Doximity ecosystem, sitting alongside its AI scribe and administrative assistant, DoximityGPT (Fierce Healthcare, investors.doximity.com).
Meanwhile, OpenEvidence continues its rapid growth, backed by significant funding and partnerships with major medical publishers, cementing its position as a leading AI-powered evidence synthesis tool for clinicians. This is happening alongside the rise of more specialised tools like Glass Health (for differential diagnosis) and DxGPT (for rare diseases), and UK-focused platforms like iatroX, which are designed to align with NHS workflows and national guidelines.
What metrics matter when evaluating medical AI tools
To make a meaningful comparison, we need to define what "good" looks like for a clinical AI tool. The key metrics are:
- Clinical accuracy: How well does it perform on recognised clinical benchmarks (e.g., USMLE, UK postgraduate exams)?
- Provenance / evidence sourcing: Does it cite its sources? Are those sources high-quality, peer-reviewed literature and national guidelines, or just the general internet?
- Guideline alignment: How relevant is the information to UK practice? Does it reference NICE, CKS, or SIGN guidelines?
- Domain specificity: How well does it handle nuance in specific areas like rare diseases, paediatrics, or complex drug interactions?
- User trust & safety: Can the tool abstain from answering when it's uncertain? Does it show its reasoning? Is it compliant with UK regulations?
- Usability & workflow fit: How fast is it? Is it available on mobile? Does it reduce or add to a clinician's workload?
- Cost & access: Is it free, freemium, or subscription-based? Is it accessible to individual clinicians in the UK?
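If you want to compare candidate tools against these criteria systematically, a simple weighted scorecard can make the trade-offs explicit. The sketch below is purely illustrative: the criteria weights and example scores are hypothetical assumptions for demonstration, not measured benchmark results for any real tool.

```python
# Hypothetical weighted scorecard for comparing clinical AI tools.
# All weights and example scores are illustrative, not real benchmark data.

CRITERIA_WEIGHTS = {
    "clinical_accuracy": 0.25,
    "provenance": 0.20,
    "guideline_alignment": 0.20,
    "domain_specificity": 0.10,
    "trust_and_safety": 0.15,
    "usability": 0.05,
    "cost_and_access": 0.05,
}

def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-criterion scores (0-10) into a single weighted total."""
    # Sanity check: weights should sum to 1 (allow for float rounding).
    assert abs(sum(CRITERIA_WEIGHTS.values()) - 1.0) < 1e-9
    return sum(CRITERIA_WEIGHTS[c] * scores.get(c, 0.0) for c in CRITERIA_WEIGHTS)

# Example: made-up scores for a hypothetical UK-focused tool.
example = {
    "clinical_accuracy": 8, "provenance": 9, "guideline_alignment": 9,
    "domain_specificity": 6, "trust_and_safety": 8,
    "usability": 8, "cost_and_access": 10,
}
print(round(weighted_score(example), 2))
```

The weights here deliberately favour accuracy, provenance, and guideline alignment, reflecting the priorities discussed above; a different clinical role (e.g. exam revision vs point-of-care use) would justify a different weighting.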
Comparative tool profiles
| Tool | Key Features & Target Audience | What It Does Well (Proof / Claims) | Limitations / Areas to Check |
|---|---|---|---|
| DoximityGPT (with Pathway) | Clinician reference + admin workflow, integrated with Pathway’s knowledge graph. | Strong structured data; high benchmark scores (~96% on USMLE); fast; free access for many US physicians. | US-centric; UK guideline alignment is not guaranteed; requires gold-standard verification. |
| OpenEvidence | Medical search / decision support; evidence synthesis for clinicians. | Deep literature base; peer-reviewed sourcing; fast access to recent studies; partnerships with major publishers. | Potential cost; may vary in UK guideline relevance; can be complex for quick, simple decisions. |
| DxGPT | Differential diagnosis focused; rare disease support for academic/pilot use. | Provides ranked differentials to counter cognitive bias; strong in complex paediatric/rare disease cases. | High prompt sensitivity; not yet a regulated or widely integrated tool; safety disclaimers are crucial. |
| Glass Health (Glass AI) | Supports differential diagnosis and clinical plan drafting; reasoning process is a key feature. | Good for structuring clinical reasoning and drafting management plans; provides an explanatory trail. | Primarily subscription-based; may not be aligned with UK guidelines for every recommendation. |
| iatroX | UK-centric guideline alignment, free access, integrated Ask (Q&A), Quiz (exam prep), and Brainstorm (differentials). | Fast, cited answers in a UK context; built for UK exam prep; includes learning and reasoning support; free for all users. | Newer in the market; quality of differential generation needs continuous validation against real-world use. |
| Generic models (ChatGPT, Gemini, Perplexity) | Broad language understanding; general tasks, patient communication, writing assistance. | Very flexible; huge knowledge base; excellent for summarisation and language tasks with good prompt engineering. | High risk of hallucination; not always up-to-date; no guaranteed UK guideline alignment; not designed for clinical safety. |
Empirical evidence: what studies show
- A 2025 comparison study evaluating OpenEvidence, ChatGPT, and another tool in a dental implant context found that OpenEvidence performed best on many of the technical and patient-focused questions, highlighting the advantage of a medically trained model (PMC).
- Pathway (now part of Doximity) has publicly claimed that its models can achieve a performance of approximately 96% on the USMLE benchmark, demonstrating a high level of accuracy on standardised knowledge tests.
How generic tools compare: strength & risk trade-offs
The key trade-off with generic tools like ChatGPT, Gemini, and Perplexity is flexibility versus reliability.
- Strengths: They are incredibly flexible, have a familiar interface, and are often free or low-cost. They can be excellent for non-clinical tasks like drafting emails or summarising non-medical texts.
- Risks: For clinical questions, the risks are significant. They do not reliably cite their sources, their training data may be outdated, they have a well-documented tendency to "hallucinate" plausible but incorrect information, and their default framing is almost always US-centric.
What clinicians should prioritise when choosing a tool
- Does it cite its sources? This is non-negotiable. You must be able to verify the information.
- Is the knowledge base curated and updated? Is it drawing from trusted sources like national guidelines?
- Is it aligned with UK practice? Does it reference NICE and CKS?
- Is it transparent about uncertainty? A safe tool knows when to say "I don't know."
- How does it fit my workflow? Is it fast, mobile-friendly, and easy to use under pressure?
- Is it compliant? Does it meet UK data protection and regulatory standards?
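The first checks on this list lend themselves to a crude automated gate: refuse to rely on an answer unless it cites at least one source from a trusted allow-list. The snippet below is a minimal sketch of that idea; the allow-list contents and function names are illustrative assumptions, not part of any real tool's API.

```python
# Toy "provenance-first" gate: accept an AI-generated answer only if it
# cites at least one source domain from a trusted allow-list.
# The allow-list below is illustrative; a real deployment would curate
# and maintain its own list of authoritative sources.

APPROVED_SOURCES = {
    "nice.org.uk",       # NICE guidance
    "cks.nice.org.uk",   # Clinical Knowledge Summaries
    "sign.ac.uk",        # SIGN guidelines
    "bnf.nice.org.uk",   # British National Formulary
}

def passes_provenance_check(cited_domains: list[str]) -> bool:
    """Return True if any cited domain appears on the approved list."""
    return any(domain in APPROVED_SOURCES for domain in cited_domains)

# Usage: an answer citing NICE passes; an uncited or blog-sourced one fails.
assert passes_provenance_check(["nice.org.uk", "example.com"])
assert not passes_provenance_check(["randomblog.example"])
assert not passes_provenance_check([])
```

A gate like this is deliberately conservative: it cannot judge whether a citation is used correctly, so it complements, rather than replaces, the human verification step stressed throughout this article.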
Use cases: when to use a specialised tool vs a generic model
- For an urgent, point-of-care guideline lookup: Use a specialised tool like iatroX that is designed for this task.
- For UK-relevant exam revision: Use a UK-centric tool like iatroX or a platform with a specific UK question bank.
- For a complex rare disease differential: A specialised tool like DxGPT or Glass Health can be a useful brainstorming partner.
- For non-clinical tasks (drafting a presentation, summarising a non-medical article): A generic tool is often perfectly sufficient, provided you verify any facts.
Conclusion & recommendations
The evidence is clear: for high-stakes clinical work, specialised medical AI tools like DoximityGPT, OpenEvidence, and iatroX are increasingly outperforming generic models in accuracy, reliability, and safety. This is because they are built on a foundation of curated, trusted evidence and are designed with the specific needs of clinicians in mind.
While generic models are not useless—they have a role in non-clinical tasks—our key recommendation for UK clinicians is to adopt a "provenance-first" approach. Prioritise tools that provide clear, verifiable citations back to authoritative sources like national guidelines. This is the only way to harness the power of AI while upholding your professional duty to provide safe, effective, and evidence-based care.