What Should a Clinician Check Before Saving an AI-Generated Consultation Note?


AI scribes can produce a fluent, well-structured consultation note in seconds. But fluency is not accuracy. A note can be grammatically polished, clinically plausible, and medically dangerous — all at the same time. NHS England's AVT guidance is explicit: healthcare professionals retain full responsibility for the accuracy of clinical records, and outputs must be checked and corrected before being added to patient records.

This checklist is designed for the 60-90 seconds between the AI generating a note and the clinician saving it to the permanent medico-legal record.

Why AI-Generated Notes Still Need Clinical Review

AI scribes process conversation and produce structured output. They do not understand clinical context the way a clinician does. They cannot distinguish between what was said casually and what was clinically significant. They cannot assess whether an examination was genuinely performed or merely mentioned in passing. They cannot judge whether the patient's "yes, I understand" reflected genuine comprehension or polite compliance. They cannot evaluate whether a management plan was the plan the patient agreed to or one the clinician was thinking aloud about before deciding differently.

The note may read professionally. The question is whether it is clinically complete, accurate, and safe to form part of the permanent record that will follow this patient for the rest of their life — informing every future clinical encounter, every prescribing decision, every insurance report, and every medico-legal review.

The 10-Point AI Consultation Note Checklist

1. Identity and context. Correct patient. Correct date. Correct consultation type (face-to-face, telephone, video). Correct clinician name. Chaperone documented if present. Consultation mode recorded. This sounds obvious, but in a busy clinic running back-to-back appointments an AI scribe generates notes for sequential patients within minutes — confirm the note matches the right consultation.

2. Presenting complaint. Does the note preserve the patient's actual concern — in their language, reflecting their priority? "I'm worried about this lump" is clinically and medico-legally different from "patient presents with a subcutaneous mass." The AI may reformulate patient language into medical terminology that changes emphasis, loses the patient's actual worry, or implies a clinical assessment that has not yet occurred.

3. Relevant positives and negatives. Are safety-critical negatives recorded? "No chest pain, no shortness of breath, no syncope" matters more than elegant prose. Red flags specifically asked about and found absent should be explicitly documented: if they are missing from the note, a future reader — or a medico-legal reviewer — will assume they were never assessed. This is one of the most important medico-legal elements of any clinical record.

4. Examination findings. Does the note describe an examination that actually occurred? AI scribes may infer examination findings from conversational context. If the clinician said "your chest sounds clear" during conversation, the scribe may generate "chest examination: clear air entry bilaterally, no wheeze, no crackles" — an examination that was never formally performed with a stethoscope. No invented examination. No implied normal if not actually assessed. Fabricated examination findings are a specific and documented risk of AI scribing.

5. Clinical uncertainty. The most critical verification point. "Possible asthma" is medico-legally different from "asthma." "Chest pain, likely musculoskeletal" is different from "angina." "Low mood" is different from "depression." "Suspected UTI, empirical treatment" is different from "confirmed UTI." AI scribes tend toward diagnostic certainty because confident labels sound more professional than hedged language. Verify that suspected, likely, possible, and excluded diagnoses are not blurred into confirmed diagnoses. A confirmed SNOMED code has permanent consequences for disease registers, recalls, insurance, and prescribing.

6. Medication and allergy accuracy. Drug names, doses, frequencies, durations, and routes should match what was actually prescribed — not what the AI inferred from discussion. If a medication was discussed but not prescribed (e.g., "we talked about starting X but decided to wait"), that distinction must be clear. Contraindications relevant to the patient's profile should not be contradicted by the documented plan.

7. Coding and problem list. If SNOMED CT suggestions are generated: Is the code confirmed, suspected, excluded, historical, or family history? Does it create QOF register implications? Could it affect insurance reports, safeguarding flags, employment assessments, DVLA notifications, or future prescribing decisions? A wrong code can have consequences that outlast the consultation by decades — affecting the patient's clinical record, screening invitations, recall schedules, and insurance applications long after the original clinician has moved on.

8. Management plan. Does the documented plan match the actual discussion with the patient? Are investigations, referrals, prescriptions, and follow-up actions accurately recorded? If the AI generated a plan that sounds reasonable but does not reflect what was actually agreed, it is the AI's plan — not the patient's plan. Only the patient's agreed plan belongs in the record.

9. Safety-netting. Is safety-netting specific, time-bound, and presentation-appropriate? "Return if symptoms worsen" is not adequate and would not withstand medico-legal scrutiny. "Return within 48 hours if headache becomes sudden-onset, if you develop neck stiffness, if you experience visual changes, or if you develop a fever" is specific, time-bound, and verifiable. AI scribes may generate generic safety-netting that lacks the specificity the clinical presentation demands — potentially missing red flags that are specific to the differential being considered.

10. Follow-up and responsibility. Who is doing what, by when? Is the patient waiting for results, a referral, a phone call? What happens if symptoms change before follow-up? Is responsibility clearly assigned? Ambiguous follow-up is one of the most common contributors to clinical safety incidents — and one of the easiest things for an AI scribe to get vaguely right but specifically wrong.

Common Errors to Look For

Over-polished but vague notes. Hallucinated examination findings. Missing red flags. False diagnostic certainty. Incorrect SNOMED codes. Missing safeguarding context. Missing patient preference documentation. Invented shared decision-making language ("risks and benefits discussed and patient consented" when that conversation did not occur in that form). Missing "no safeguarding concerns" where a safeguarding screen was conducted.

When to Use iatroX Before Saving

"What red flags should I safety-net for?" "What does NICE CKS suggest for initial management?" "Which calculator applies here?" "What should be included in this referral?" "Can I save this learning as CPD?"

Before the note becomes part of the permanent record, resolve any outstanding clinical question. Ask iatroX for cited guidance, calculators, and CPD-ready learning →
