We Built a Wordle for Doctors: What Thousands of Guesses Reveal About How Clinicians Think

TEMPLATE NOTE (remove before publishing): This piece is written as a complete narrative frame with the data points left as bracketed placeholders in the form [[INSERT: ...]]. Populate each with real aggregate, anonymised iatroX Rounds play data before publishing. Do not publish with placeholders or invented figures. Keep the interpretation honest and proportionate to what the data actually shows.

We built iatroX Rounds, a free daily diagnosis game for doctors and medical students: you read a clinical case and guess the diagnosis as clues are revealed. After [[INSERT: number of plays or guesses, eg "more than 100,000 guesses across X cases"]], the aggregate, anonymised data shows recurring patterns in how clinicians reason under uncertainty. The single most striking finding: [[INSERT: one-sentence headline finding]]. Here is what thousands of guesses reveal.

Key takeaways

The hardest case so far was [[INSERT: hardest case and average guesses]].
The most commonly confused pair of diagnoses was [[INSERT: confused pair]].
Players tended to [[INSERT: converge or scatter]] as more clues were revealed.
Average guesses varied by [[INSERT: specialty, topic or other axis]].
The patterns line up with what clinical reasoning research would predict about [[INSERT: anchoring, look-alikes or illness scripts]].

Which cases were the hardest?

The cases that took the most guesses were [[INSERT: top 3 to 5 hardest cases with average guesses each]]. What they have in common: [[INSERT: brief interpretation, eg they share early features with common conditions, or the discriminating clue appears late]]. The easiest cases, by contrast, were [[INSERT: easiest cases]], which tend to have a distinctive early feature that points clearly to the answer. The gap between the two is essentially the gap between conditions with a pathognomonic signal and those that can only be separated by subtle discrimination.
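For readers curious how a difficulty ranking like this can be produced, here is a minimal Python sketch. It assumes each play is logged as a (case_id, guesses_used) pair. The record shape, the function name rank_cases_by_difficulty and the min_plays cut-off are illustrative assumptions, not a description of the iatroX Rounds pipeline, and the values are toy numbers rather than real play data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical play records: (case_id, guesses_used).
# Toy values for illustration only, not real iatroX Rounds data.
plays = [
    ("case_a", 2), ("case_a", 3), ("case_a", 1),
    ("case_b", 5), ("case_b", 6),
    ("case_c", 4), ("case_c", 4), ("case_c", 3),
]


def rank_cases_by_difficulty(records, min_plays=1):
    """Return (case_id, mean_guesses, n_plays) tuples, hardest case first.

    Cases with fewer than min_plays plays are dropped, so that a case
    played only a handful of times does not top the ranking by chance.
    """
    by_case = defaultdict(list)
    for case_id, guesses in records:
        by_case[case_id].append(guesses)

    ranked = [
        (case_id, mean(guesses), len(guesses))
        for case_id, guesses in by_case.items()
        if len(guesses) >= min_plays
    ]
    # Higher mean guesses = harder case.
    return sorted(ranked, key=lambda row: row[1], reverse=True)


ranking = rank_cases_by_difficulty(plays)
for case_id, avg, n in ranking:
    print(f"{case_id}: {avg:.2f} guesses on average over {n} plays")
```

How unsolved plays are scored (excluded, or counted as the maximum plus one) changes the averages, so that choice is worth stating next to any published figure.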

Which diagnoses got confused with each other?

The most revealing data is in the wrong answers. The diagnoses players most often confused were [[INSERT: top confused pairs, eg "A often guessed when the answer was B"]]. These confusions are not random. They cluster around genuine clinical look-alikes, conditions that share an early picture and diverge only on a specific feature. That is exactly where real diagnostic error tends to happen, which is part of why a game like this is useful: it surfaces the mimics that catch people, in a low-stakes setting. We explore the clinical version of this in our piece on commonly misdiagnosed conditions.
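Counting confused pairs is simple to sketch. The Python below assumes every wrong guess is logged as a (correct_answer, wrong_guess) pair. That log format and both function names are assumptions made for illustration, and the diagnosis labels are deliberately abstract placeholders so that nothing here reads as a real finding.

```python
from collections import Counter

# Hypothetical wrong-guess log: (correct_answer, wrong_guess).
# Labels are placeholders, not real diagnoses or real play data.
wrong_guesses = [
    ("Diagnosis B", "Diagnosis A"),
    ("Diagnosis B", "Diagnosis A"),
    ("Diagnosis A", "Diagnosis B"),
    ("Diagnosis C", "Diagnosis A"),
    ("Diagnosis D", "Diagnosis C"),
]


def directed_confusions(log):
    """Count 'X guessed when the answer was Y' with direction preserved."""
    return Counter(log)


def undirected_confusions(log):
    """Count confusions between two diagnoses regardless of direction."""
    return Counter(frozenset(pair) for pair in log)


directed = directed_confusions(wrong_guesses)
undirected = undirected_confusions(wrong_guesses)

(answer, guess), count = directed.most_common(1)[0]
print(f"'{guess}' was guessed {count} times when the answer was '{answer}'")
```

The directed count keeps the asymmetry (A mistaken for B is not the same event as B mistaken for A), while the undirected count is the more natural basis for a "most confused pair" headline. Raw counts also favour diagnoses that appear in more cases, so dividing by the number of plays per case may be the fairer comparison.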

How did guesses change as clues were revealed?

Because each wrong guess reveals another clue, the data shows how reasoning updates with information. Players tended to [[INSERT: converge quickly, or hold an early wrong answer too long]], and the clue that most often unlocked the answer was typically [[INSERT: the kind of clue, eg an investigation result or a specific examination finding]]. When players anchored, it usually showed as [[INSERT: pattern, eg repeating near-identical guesses despite new clues]]. Where it appears, this is anchoring made visible: the tendency to hold an early impression even as the evidence shifts, a well-documented contributor to diagnostic error.
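One rough way to flag a pattern such as "repeating near-identical guesses despite new clues" is to compare each guess with the one before it. The sketch below uses string similarity from Python's standard difflib module as a stand-in. The 0.8 threshold, the function names and the guess strings are all assumptions for illustration, and string similarity is only a crude proxy: a real analysis would probably group guesses by clinical meaning rather than by spelling.

```python
from difflib import SequenceMatcher


def is_near_identical(a, b, threshold=0.8):
    """True if two guesses are textually near-identical.

    Guesses are compared case-insensitively with surrounding whitespace
    removed. The threshold is an arbitrary illustrative choice.
    """
    a, b = a.strip().lower(), b.strip().lower()
    return SequenceMatcher(None, a, b).ratio() >= threshold


def count_anchored_repeats(guesses):
    """Count consecutive guess pairs that barely changed.

    Each wrong guess reveals a new clue, so a near-identical follow-up
    guess means the new clue did not shift the player's answer.
    """
    return sum(
        is_near_identical(previous, current)
        for previous, current in zip(guesses, guesses[1:])
    )


# Hypothetical guess sequences for single plays (placeholder labels).
anchored_play = ["alpha", "Alpha ", "alphas", "omega"]
revising_play = ["alpha", "omega"]

print(count_anchored_repeats(anchored_play))
print(count_anchored_repeats(revising_play))
```

A play with one or more flagged repeats is a candidate for anchoring, not proof of it: a player may repeat a guess through a mis-tap, or because the new clue genuinely supported the original answer.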

Did reasoning differ by specialty or topic?

Average guesses varied across [[INSERT: axis, eg specialty, body system, or self-reported stage of training]]. The cases that were hardest for one group and easier for another were [[INSERT: examples]], which hints at how exposure shapes recognition: you recognise fastest what you have seen most. [[INSERT: any experience-related pattern, kept cautious given self-selected players]].

What does this suggest about how clinicians think?

Taken together, the patterns fit what clinical reasoning research would predict. Recognition is fast when a case matches a well-formed illness script, and slow or error-prone when conditions share an early picture and diverge on a single feature. Confusions cluster around genuine look-alikes, and anchoring shows up as reluctance to revise an early answer. None of this is surprising in isolation, but seeing it emerge from thousands of real guesses is a vivid illustration of pattern recognition and its failure modes. For the underlying skill, see our spot diagnosis guide and our explainer on illness scripts.

A note on methodology

These figures come from aggregate, anonymised play data from iatroX Rounds, not from a controlled study. Players are self-selected, the cases are a curated set rather than a representative sample of clinical practice, and a game is not a substitute for formal research. The patterns are best read as an interesting window onto reasoning, not as evidence about any individual or about clinical performance. We report them in that spirit.

Want to add your own guesses to the next dataset? Play today's iatroX Rounds.

Frequently asked questions

What is iatroX Rounds? A free daily diagnosis game for doctors and medical students. You read a clinical case and guess the diagnosis as clues are revealed, with a shareable grid at the end.

Where does this data come from? Aggregate, anonymised play data from iatroX Rounds. It is a window onto how players reason, not a controlled study, and it should be read with that caveat.

What was the hardest case? [[INSERT: hardest case and average guesses]]. Hard cases tend to share an early picture with common conditions and reveal the discriminating clue late.

What does this tell us about diagnostic error? The diagnoses players confuse cluster around genuine clinical look-alikes, and anchoring shows up as holding an early answer too long. Both are well-documented causes of real diagnostic error.

Is a diagnosis game useful for real clinical skill? As practice, yes. It exercises pattern recognition and surfaces the mimics that catch people, in a low-stakes setting. It is a revision and reasoning aid, not medical advice or a clinical tool.