RAG for guideline retrieval in UK clinical practice: iatroX, Automedica SmartGuideline, WHO SMART Guidelines, and the MHRA “AI Airlock”

Executive summary

For artificial intelligence to be truly trustworthy in a clinical setting, its answers must be accurate, transparent, and auditable. The most effective and safe architectural pattern for achieving this today is Retrieval-Augmented Generation (RAG). By grounding the outputs of large language models in a verified library of sources, RAG dramatically reduces the risk of factual "hallucinations" and provides clear citations, which are essential for clinical governance and continued learning (Nature, PMC).

The UK is developing a mature ecosystem to support this technology. The move towards machine-readable standards like the WHO SMART Guidelines promises to accelerate the creation of high-quality AI tools. Simultaneously, the MHRA’s AI Airlock provides a regulatory sandbox for innovative AI as a Medical Device (AIaMD) to be tested and refined in a real-world, supervised environment, paving the way for safe and effective deployment (World Health Organization, GOV.UK).

What RAG is—and why it matters for clinical guidelines

Retrieval-Augmented Generation is a "show your work" architecture for AI. Unlike a standard large language model that generates text based only on its internal training, RAG follows a clear, multi-step process: retrieve → re-rank → generate with citations. This simple but powerful design is what makes it uniquely suited for clinical guideline use.

The benefits are clear:

  • Source traceability: It provides verifiable proof of where the information came from, a non-negotiable requirement for clinical decision-making.
  • Version awareness: It can be designed to surface the publication or review date of a guideline, preventing the use of outdated advice.
  • Easy updates: When a guideline changes, developers only need to update the source document in the knowledge library, not retrain the entire AI model (PMC, Nature).
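The retrieve → re-rank → generate-with-citations loop described above can be sketched in a few lines. Everything here is illustrative: the corpus entries, the keyword-overlap scorer (a toy stand-in for BM25 or dense retrieval), and the "generation" step are assumptions for the sketch, not a production pipeline.

```python
from dataclasses import dataclass

@dataclass
class GuidelineChunk:
    doc_id: str        # e.g. a NICE guideline identifier
    review_date: str   # surfaced so out-of-date advice is visible
    text: str

# Toy corpus; a real system would ingest full guideline documents.
CORPUS = [
    GuidelineChunk("NG136", "2023-11-21", "Offer lifestyle advice for stage 1 hypertension."),
    GuidelineChunk("NG136", "2023-11-21", "Consider drug treatment for stage 2 hypertension."),
    GuidelineChunk("CG181", "2023-05-24", "Offer atorvastatin 20 mg for primary prevention of CVD."),
]

def _tokens(s: str) -> set[str]:
    return {t.strip(".,").lower() for t in s.split()}

def retrieve(query: str, corpus: list[GuidelineChunk], k: int = 2) -> list[GuidelineChunk]:
    """Rank chunks by keyword overlap - a toy stand-in for BM25/dense retrieval."""
    q = _tokens(query)
    scored = sorted(corpus, key=lambda c: len(q & _tokens(c.text)), reverse=True)
    return [c for c in scored[:k] if q & _tokens(c.text)]

def generate_with_citations(query: str, chunks: list[GuidelineChunk]) -> str:
    """Summarise only retrieved text and cite it; abstain when nothing matched."""
    if not chunks:
        return "No relevant guideline passage found; please consult the source directly."
    lines = [f"- {c.text} [{c.doc_id}, reviewed {c.review_date}]" for c in chunks]
    return "Based on the retrieved guidance:\n" + "\n".join(lines)

answer = generate_with_citations("hypertension treatment",
                                 retrieve("hypertension treatment", CORPUS))
```

Note how the citation string carries both the document identifier and the review date, so version awareness falls out of the data model rather than being bolted on afterwards.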

The machine-readable guideline opportunity

The quality of any RAG system depends on the quality of its knowledge base. The WHO SMART Guidelines initiative represents a global shift towards making clinical guidelines machine-readable, which is a massive opportunity for developers of RAG guideline retrieval tools. The framework defines multiple levels, with L3 (machine-readable content) and L4 (executable reference software) being key to accelerating the creation of more accurate and interoperable RAG pipelines (World Health Organization, The Lancet). As this standard is adopted for UK pathways, it will become easier to localise AI tools with Trust-specific terminologies and formularies.
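To illustrate why L3 machine-readable content matters, here is a sketch of turning a much-simplified, FHIR PlanDefinition-like JSON artifact into version-stamped chunks for a RAG index. The field names and content are assumptions for the sketch, not the full FHIR schema; the point is that structured sources can be ingested deterministically, with no PDF parsing.

```python
import json

# Simplified, PlanDefinition-like guideline artifact (illustrative only).
smart_guideline = json.loads("""
{
  "resourceType": "PlanDefinition",
  "id": "anc-contact-schedule",
  "version": "1.0.0",
  "date": "2024-01-15",
  "action": [
    {"title": "First antenatal contact", "description": "Schedule first contact before 12 weeks."},
    {"title": "Second antenatal contact", "description": "Schedule second contact at 20 weeks."}
  ]
}
""")

def to_chunks(plan: dict) -> list[dict]:
    """Flatten each action into a citable, version-stamped chunk for a RAG index."""
    return [
        {
            "source": f"{plan['resourceType']}/{plan['id']}",
            "version": plan["version"],
            "date": plan["date"],
            "text": f"{a['title']}: {a['description']}",
        }
        for a in plan.get("action", [])
    ]

chunks = to_chunks(smart_guideline)
```

Because each chunk keeps the source identifier, version, and date, a UK deployment could later swap in Trust-specific artifacts without changing the pipeline.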

Tools landscape for guideline retrieval

Several UK-focused tools are already employing RAG-style architectures to provide clinicians with better information.

iatroX (UK-centred reference & brainstorming)

iatroX is a UKCA-registered information tool that provides evidence-linked Q&A through its Ask iatroX feature and a structured thinking aid with its Brainstorm mode. These features are designed for educational and reference purposes, helping clinicians structure their thoughts and find relevant UK guideline information quickly, rather than providing live, patient-specific advice (iatrox.com).

Automedica – SmartGuideline

Automedica’s SmartGuideline tool is designed to provide "smart-search" of national guidelines with verified, auditable outputs. In a significant regulatory signal, Automedica was selected as part of the first cohort for the MHRA AI Airlock pilot. The MHRA describes its product as a "structured AI model to provide guidelines during a clinical encounter," marking it as a key innovator in the AIaMD space (GOV.UK, Automedica Ltd).

Related ecosystem signals

The MHRA’s AI Airlock itself is a key development. It is a regulatory sandbox that allows innovative AIaMD developers to test their products in a real-world NHS environment under the close supervision of regulators. This allows for faster learning and refinement of both the technology and the regulations that govern it (GOV.UK).

UK rulebook: deploying guideline-aware AI safely

The MHRA provides clear guidance on software as a medical device (SaMD) and AIaMD. Any RAG tool that provides patient-specific recommendations may be classified as a medical device and must adhere to these regulations. The MHRA AI Airlock provides a pathway for cutting-edge products such as Automedica's SmartGuideline to test and validate their systems in a controlled, real-world setting, helping to accelerate the safe adoption of new technology (GOV.UK, Digital Health).

Reference architecture for RAG-based guideline retrieval

A best-practice clinical RAG system includes:

  • Corpus: A version-controlled library of NICE guidelines/CKS, local Trust SOPs, and the BNF.
  • Pipeline: A sophisticated process of ingestion, versioning, and hybrid retrieval using both keyword (BM25) and semantic (dense embeddings) search. This is followed by a clinical re-ranking step before the final, cited answer is generated.
  • Outputs: The system should provide clinician-facing text with inline citations and document dates, alongside structured JSON for potential integration with clinical decision support systems. Crucially, it must be able to "abstain" from answering if sufficient evidence is not found.
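One common way to combine keyword and semantic rankings in a hybrid pipeline is Reciprocal Rank Fusion (RRF); the sketch below fuses two toy ranked lists and abstains when even the best fused score is weak. The rankings, document IDs, and the 0.02 confidence floor are assumptions for illustration; a real system would feed in BM25 and dense-embedding results and tune the threshold empirically.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with reciprocal rank fusion."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

def answer_or_abstain(fused: list[tuple[str, float]], min_score: float = 0.02):
    """Abstain if the best fused score is below a confidence floor."""
    if not fused or fused[0][1] < min_score:
        return None  # caller reports 'insufficient evidence' rather than guessing
    return fused[0][0]

keyword_ranking = ["NG136-s2", "NG136-s1", "CG181-s4"]   # e.g. from BM25
semantic_ranking = ["NG136-s1", "NG136-s2", "NG71-s3"]   # e.g. from dense embeddings
fused = rrf_fuse([keyword_ranking, semantic_ranking])
best = answer_or_abstain(fused)
```

Documents that appear high in both lists accumulate the largest fused scores, which is why RRF is a popular, tuning-light choice for the hybrid step before clinical re-ranking.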

Integration patterns (from pilot to point-of-care)

To be truly effective, these tools must be embedded in the clinical workflow. The goal is to deliver answers into EHR side-panels (e.g., in EMIS or SystmOne) via modern interoperability standards like FHIR CDS Hooks or SMART on FHIR. Any integration must be supported by robust governance, including a Data Protection Impact Assessment (DPIA), a clinical safety case, and diligent post-deployment monitoring.
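For the EHR side-panel delivery mentioned above, CDS Hooks services respond with a list of "cards". The sketch below builds one such response; the field names (`summary`, `detail`, `indicator`, `source`) follow the CDS Hooks card schema, while the guideline content and URL are purely illustrative.

```python
import json

def guideline_card(summary: str, detail: str, source_label: str, source_url: str) -> dict:
    """Build a single CDS Hooks card for display in an EHR side-panel."""
    return {
        "summary": summary,                 # short headline shown in the panel
        "detail": detail,                   # markdown body with inline citations
        "indicator": "info",                # one of: info | warning | critical
        "source": {"label": source_label, "url": source_url},
    }

response = {
    "cards": [
        guideline_card(
            summary="NICE NG136: hypertension management",
            detail="Consider drug treatment for stage 2 hypertension. [NG136, reviewed 2023-11-21]",
            source_label="NICE NG136",
            source_url="https://www.nice.org.uk/guidance/ng136",
        )
    ]
}
payload = json.dumps(response)  # returned as the HTTP response body to the EHR
```

The `source` attribution is what keeps the citation-first principle intact once the answer leaves the RAG tool and lands inside EMIS or SystmOne.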

Quality & safety evaluation (how to measure a RAG tool)

Evaluating a clinical RAG tool requires specific metrics:

  • Retrieval: Technical metrics like recall@k and nDCG on a guideline-specific test set.
  • Generation: Measuring faithfulness (is the answer grounded in the retrieved text?), citation accuracy, and the tool's refusal rate when asked questions outside its corpus (PMC).
  • Operational: Real-world KPIs like time-to-answer, guideline concordance rates, and user trust scores.
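The two retrieval metrics named above can be computed directly on a guideline-specific test set. This sketch uses binary relevance and a toy query; the document IDs are made up for illustration.

```python
import math

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant documents that appear in the top-k results."""
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def ndcg_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance nDCG: discounts relevant hits ranked lower down."""
    dcg = sum(1.0 / math.log2(i + 2) for i, d in enumerate(retrieved[:k]) if d in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0

retrieved = ["NG136-s2", "CG181-s4", "NG136-s1"]   # system's ranked output
relevant = {"NG136-s1", "NG136-s2"}                # gold standard for the query
r = recall_at_k(retrieved, relevant, k=3)  # both relevant chunks found -> 1.0
n = ndcg_at_k(retrieved, relevant, k=3)    # < 1.0: one hit is ranked third
```

Recall@k tells you whether the right passages were found at all; nDCG additionally penalises finding them low in the ranking, which matters when a clinician only reads the first card.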

Case-style vignettes

  • Primary care: A GP asks about the local antibiotic policy for UTIs. The RAG tool surfaces the Trust's guidance alongside the national NICE CKS advice, highlighting any differences.
  • Acute care: A surgeon queries peri-operative anticoagulation protocols. The tool presents the official Trust SOP and the relevant NICE guideline, with explicit version stamps displayed in the EHR's CDS panel.
  • Education: A trainee uses iatroX Brainstorm to structure their thoughts on a complex case, then verifies each step by clicking through to the cited guideline passages.

Risks & mitigations

  • Out-of-date content: The system must perform nightly rebuilds of its knowledge base and clearly display the source document's version and last-updated date in the user interface.
  • Hallucinations/over-reach: The architecture must enforce strict source-only summarisation, provide mandatory citations, and be programmed to abstain from answering if the query falls outside its trusted corpus.
  • Regulatory drift: Developers and providers must continuously track MHRA notices and learnings from the AI Airlock, and re-validate their systems after any significant model or corpus update.
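The out-of-date-content mitigation can be made concrete with a freshness check at the UI layer: every answer carries its source's last-review date, and anything older than a threshold is flagged. The 18-month threshold here is an assumption for the sketch; the right value is a local governance decision.

```python
from datetime import date

STALE_AFTER_DAYS = 548  # ~18 months; set per local governance policy (assumption)

def freshness_banner(last_reviewed: date, today: date) -> str:
    """Return the date line shown with an answer, flagging stale sources."""
    age_days = (today - last_reviewed).days
    if age_days > STALE_AFTER_DAYS:
        return f"Last reviewed {last_reviewed.isoformat()} - check for a newer version."
    return f"Last reviewed {last_reviewed.isoformat()}."

banner = freshness_banner(date(2022, 1, 10), today=date(2025, 1, 10))
```

Paired with nightly corpus rebuilds, this makes staleness visible to the clinician instead of silently serving superseded advice.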

Implementation checklist (for PCNs/Trusts & vendors)

  1. Confirm you have the rights to ingest and process all corpus materials (NICE, Trust documents).
  2. Choose a tool (iatroX for reference/education, SmartGuideline for in-encounter retrieval) and demand DTAC-style evidence packs where applicable.
  3. Run a 6–8-week pilot, measuring faithfulness and guideline concordance, and keeping a full audit trail.
  4. If the tool is intended to impact patient-specific decisions, prepare an AIaMD regulatory plan and consider the MHRA AI Airlock pathway where appropriate.

Conclusion & call-to-action

For clinical guideline retrieval, RAG with transparent citations is the most defensible and trustworthy architecture available today. Its safety and effectiveness are massively enhanced when paired with machine-readable sources like the WHO SMART Guidelines, version-aware pipelines, and robust UK governance.

The next step for healthcare providers is to begin piloting these citation-first RAG tools. Shortlist platforms like iatroX for educational and reference use cases and Automedica SmartGuideline for in-encounter retrieval. Plan a small, controlled pilot and measure faithfulness, guideline concordance, and time-to-answer before scaling. By tracking updates from the MHRA AI Airlock, the UK healthcare community can stay aligned with the regulatory frontier and safely embrace the future of AI-powered guidance.
