How Computer-Adaptive Testing Works (and Why Most Medical Exams Are Not Adaptive)


A computer-adaptive test (CAT) chooses each question based on how you have answered so far, targeting items near your current ability so it can measure you precisely with fewer questions. It is a genuinely different machine from the fixed-form exams most doctors sit, and understanding it explains both why the AMC CAT feels the way it does and why UK royal colleges have mostly kept their exams linear. It also clears up a common confusion: an adaptive test and adaptive learning are not the same thing. Here is how it all fits together.

Key takeaways

  • A CAT updates an ability estimate after each answer and picks the next item to be maximally informative.
  • Item Response Theory underpins it: items have a calibrated difficulty, and your ability sits on the same scale.
  • The AMC CAT is the main adaptive exam most IMGs will sit; most UK medical exams are linear.
  • UK colleges stay linear for reasons of standard-setting, item security, equating, and defensibility.
  • An adaptive test measures your ability; adaptive learning improves it. Same family, different goal.

What a computer-adaptive test is

The core idea is a running estimate that sharpens as you go. A CAT starts with little information about you and often opens with a moderate item. After each answer it updates its estimate of your ability and selects the next question to be as informative as possible about where you sit. Answer well and the next item is typically harder; answer poorly and it is easier, not to reward or punish you but to zero in on your level efficiently. Because each item is chosen to reduce uncertainty, a CAT can reach a reliable measurement in far fewer questions than a fixed test. That is why the AMC CAT can pin down your ability within 150 items, where a linear exam would typically need more items to reach the same precision.
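
The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not how the AMC CAT or any real exam is implemented. It assumes a Rasch (one-parameter) IRT model, a simulated bank of item difficulties, an expected a posteriori (EAP) ability estimate with a standard normal prior, and a stopping rule on the standard error. The function names (run_cat, eap_estimate, p_correct), the bank and the thresholds are all hypothetical.

```python
import math
import random

def p_correct(theta: float, b: float) -> float:
    """Rasch (1PL) probability that ability theta answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def eap_estimate(responses):
    """EAP ability estimate and posterior SD (used here as the standard error),
    with a standard normal prior on a grid from -4 to 4.
    responses: list of (difficulty, correct) pairs."""
    grid = [k / 20 for k in range(-80, 81)]
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalised N(0, 1) prior
        for b, correct in responses:
            p = p_correct(t, b)
            w *= p if correct else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)

def run_cat(bank, answer, max_items=60, target_se=0.3):
    """Minimal CAT loop. Under the Rasch model an item is most informative when its
    difficulty equals the current estimate, so pick the unused item closest to theta,
    re-estimate, and stop once the SE drops below target_se or max_items is reached."""
    used, responses = set(), []
    theta, se = 0.0, 1.0  # start at the prior mean, i.e. with a moderate item
    for _ in range(min(max_items, len(bank))):
        idx = min((i for i in range(len(bank)) if i not in used),
                  key=lambda i: abs(bank[i] - theta))
        used.add(idx)
        responses.append((bank[idx], answer(bank[idx])))
        theta, se = eap_estimate(responses)
        if se < target_se:
            break
    return theta, se, len(responses)

# Usage: a simulated candidate whose true ability is 1.0 (an assumption for the demo).
rng = random.Random(42)
bank = [rng.uniform(-3.0, 3.0) for _ in range(500)]
true_theta = 1.0
theta_hat, se, n_items = run_cat(bank, lambda b: rng.random() < p_correct(true_theta, b))
print(f"estimate={theta_hat:.2f}  SE={se:.2f}  items used={n_items}")
```

Operational CATs add content balancing and item-exposure control on top of a loop like this, which the sketch leaves out.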

The psychometrics in plain English

Underneath sits Item Response Theory (IRT). In IRT, every question has been calibrated for its difficulty and for how well it distinguishes stronger candidates from weaker ones (its discrimination), and your ability is expressed on the same scale as that difficulty. The test maintains a current estimate of your ability with a margin of uncertainty around it, called the standard error, and that margin shrinks as you answer more items targeted near your level. When the estimate is precise enough, or a maximum number of items is reached, the test stops. A CAT feels relentless because it keeps handing you items close to your ability, ones you have roughly a 50% chance of getting right, and those are the most informative.
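
To make "most informative" concrete, here is a small sketch assuming a two-parameter logistic (2PL) model, in which each item has a discrimination a and a difficulty b. The item parameters are invented for illustration. It shows that an item's information peaks where the candidate's chance of success is 50%, and that the standard error shrinks roughly with the square root of the number of well-targeted items. (With a guessing parameter, as in the 3PL model, the peak shifts slightly above 50%.)

```python
import math

def p_2pl(theta, a, b):
    """2PL model: probability of a correct answer given ability theta,
    discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * p * (1 - p).
    It peaks where p = 0.5, i.e. where theta equals the item's difficulty."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def standard_error(theta, items):
    """Approximate standard error of the ability estimate:
    1 / sqrt(total test information at theta)."""
    total = sum(item_information(theta, a, b) for a, b in items)
    return 1.0 / math.sqrt(total)

# A hypothetical item with discrimination 1.2 and difficulty 0.5.
for theta in (-1.0, 0.5, 2.0):
    print(theta, round(p_2pl(theta, 1.2, 0.5), 2), round(item_information(theta, 1.2, 0.5), 3))

# The SE halves when the number of well-targeted items quadruples.
well_targeted = [(1.0, 0.0)] * 4
print(standard_error(0.0, well_targeted), standard_error(0.0, well_targeted * 4))  # 1.0 0.5
```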

Real adaptive exams

Adaptive delivery is well established in some assessments and absent in others. The best-known example in healthcare is the NCLEX nursing licensing exam, which is variable-length and adaptive, ending when it has measured the candidate with enough confidence. For doctors, the flagship is the AMC CAT for Australian registration, a genuine question-by-question CAT. Some exams are section-adaptive rather than item-adaptive, adjusting between sections rather than after each question, as the GRE does. The point is that true adaptivity is a deliberate design choice with real infrastructure behind it, not a default.
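
Two of the designs mentioned here can be sketched as simple decision rules. The first is a variable-length stopping rule in the spirit of the confidence-interval approach used by pass/fail CATs such as the NCLEX: keep testing until a roughly 95% interval around the ability estimate lies entirely above or below the passing standard. The second is section-level routing of the kind a section-adaptive test uses. Both are simplified, hypothetical sketches. The cut score and routing thresholds are assumptions, not the NCLEX's or the GRE's actual parameters.

```python
def ci_decision(theta, se, cut, z=1.96):
    """Return 'pass' or 'fail' once the ~95% confidence interval around the ability
    estimate lies wholly above or below the cut score; otherwise None (keep testing)."""
    lower, upper = theta - z * se, theta + z * se
    if lower > cut:
        return "pass"
    if upper < cut:
        return "fail"
    return None

def route_second_section(num_correct, section_length, easy_max=0.4, hard_min=0.7):
    """Section-adaptive routing: choose the next section's difficulty from the
    proportion correct on the first section. The thresholds are hypothetical."""
    proportion = num_correct / section_length
    if proportion < easy_max:
        return "easy"
    if proportion >= hard_min:
        return "hard"
    return "medium"

print(ci_decision(0.8, 0.2, cut=0.0))   # pass
print(ci_decision(0.1, 0.2, cut=0.0))   # None
print(ci_decision(-0.6, 0.2, cut=0.0))  # fail
print(route_second_section(18, 20), route_second_section(12, 20), route_second_section(5, 20))
```

An item-adaptive test re-runs a rule like ci_decision after every answer; a section-adaptive test makes a single routing decision at each section boundary.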

Why most UK medical exams are linear

UK royal college exams have mostly stayed linear, and the reasons are practical rather than accidental:

  • Standard-setting culture. UK exams lean on the Angoff method, where judges estimate how a just-competent candidate would perform on each item, which fits fixed forms cleanly.
  • Item exposure and security. Adaptive testing needs a large, continuously refreshed pool of calibrated items, and smaller colleges with smaller item banks face real exposure and security risks in serving items adaptively.
  • Equating tradition. Fixed forms with anchor items let colleges carry a stable standard across sittings using well-understood equating, which they trust.
  • Cohort size. Calibrating items for adaptive use reliably needs large numbers of responses, and many UK postgraduate exams have modest cohorts.
  • Defensibility. Fixed, reviewable forms are easier to defend on appeal, which matters for high-stakes professional exams.
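
The Angoff arithmetic itself is simple, which is part of its appeal for fixed forms. In the classic, unmodified version, each judge estimates the probability that a just-competent candidate answers each item correctly. The item estimates are averaged across judges and summed to give the pass mark. Because that pass mark is tied to one specific set of items, it maps naturally onto a form every candidate sits, whereas in a CAT each candidate sees different items. The ratings below are invented for illustration. Real panels typically add discussion rounds and performance data, which this sketch omits.

```python
from statistics import mean

def angoff_cut_score(ratings):
    """Classic Angoff: ratings[j][i] is judge j's estimate (0-1) that a just-competent
    candidate answers item i correctly. Returns the raw cut score in marks."""
    n_items = len(ratings[0])
    item_means = [mean(judge[i] for judge in ratings) for i in range(n_items)]
    return sum(item_means)

ratings = [
    [0.6, 0.8, 0.4, 0.7],  # judge 1 (hypothetical)
    [0.5, 0.9, 0.5, 0.6],  # judge 2 (hypothetical)
    [0.7, 0.7, 0.3, 0.8],  # judge 3 (hypothetical)
]
cut = angoff_cut_score(ratings)
print(f"cut score: {cut:.1f} of {len(ratings[0])} marks ({cut / len(ratings[0]):.1%})")
# cut score: 2.5 of 4 marks (62.5%)
```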

Notably, the psychometric family is converging even where delivery stays linear: the MRCGP AKT now uses IRT as its primary analysis while remaining a linear exam, as we cover in our piece on what the AKT's move to IRT means. IRT scoring and adaptive delivery are separable. A linear exam can be IRT-scored without being adaptive, and the Canadian MCCQE1 is another example of exactly that.
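
Here is a sketch of what IRT scoring of a linear exam can look like. Every candidate sits the same fixed form, and the responses are converted to an ability estimate afterwards, with no adaptive item selection. It assumes a 2PL model with invented item parameters and an EAP estimate with a standard normal prior; it is not the AKT's or the MCCQE1's actual scoring procedure. Under a 2PL model, two candidates with the same raw score can receive different estimates if they answered items of different discrimination correctly. Under a Rasch model, the raw score alone determines the estimate.

```python
import math

def eap_score(form, responses, lo=-4.0, hi=4.0, steps=161):
    """EAP ability estimate for a fixed (linear) form under a 2PL model with a
    standard normal prior. form: list of (a, b) item parameters;
    responses: list of 0/1 scores in the same order."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    num = den = 0.0
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalised N(0, 1) prior
        for (a, b), x in zip(form, responses):
            p = 1.0 / (1.0 + math.exp(-a * (t - b)))
            w *= p if x else 1.0 - p
        num += t * w
        den += w
    return num / den

form = [(2.0, 0.0), (2.0, 0.0), (0.5, 0.0), (0.5, 0.0)]  # hypothetical (a, b) pairs
cand_a = [1, 1, 0, 0]  # right on the two high-discrimination items
cand_b = [0, 0, 1, 1]  # right on the two low-discrimination items
print(round(eap_score(form, cand_a), 2), round(eap_score(form, cand_b), 2))
```

Both candidates score 2 out of 4, but candidate A's estimate comes out higher because the 2PL likelihood weights each correct answer by the item's discrimination.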

Adaptive tests versus adaptive learning

This is the distinction that trips people up, and it matters. An adaptive test and an adaptive learning tool share the same IRT machinery but pursue opposite goals. An adaptive test selects items to measure your ability as precisely as possible with as few questions as possible; its job is to score you and then stop. An adaptive learning tool selects items to improve your ability, steering you toward your weak areas and scheduling retrieval so you retain what you learn; its job is to teach you and keep going. One is trying to find out what you know; the other is trying to change it. Confusing the two leads people to expect a question bank to "adapt like the real exam", when in fact a good learning engine is optimising for something more useful to a candidate: your improvement, not your measurement.
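
The difference in goals shows up directly in the selection rule. An adaptive test picks the item that is most informative at your current estimate (under the Rasch model, the one whose difficulty is closest to it) and stops once it is precise enough. A learning engine might instead pick the weakest concept that is due for review and reschedule it with expanding intervals. The sketch below is a hypothetical illustration of that second idea, using a simple mastery score and a Leitner-style doubling interval. It is not iatroX's actual algorithm, and the Concept fields, concept names and update rate are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Concept:
    name: str
    mastery: float          # hypothetical 0-1 estimate of how well the learner knows it
    interval_days: int = 1  # current spacing interval
    due_day: int = 0        # day the concept is next due for review

def next_concept(concepts, today):
    """Learning-oriented selection: among concepts due today, pick the weakest."""
    due = [c for c in concepts if c.due_day <= today]
    return min(due, key=lambda c: c.mastery) if due else None

def record_answer(concept, correct, today, rate=0.2):
    """Nudge mastery towards 1 or 0, then reschedule: double the interval after a
    correct retrieval, reset it to one day after a miss (a Leitner-style rule)."""
    target = 1.0 if correct else 0.0
    concept.mastery += rate * (target - concept.mastery)
    concept.interval_days = concept.interval_days * 2 if correct else 1
    concept.due_day = today + concept.interval_days

concepts = [Concept("asthma", 0.8), Concept("sepsis", 0.3),
            Concept("atrial fibrillation", 0.5, due_day=3)]
c = next_concept(concepts, today=0)           # sepsis: weakest of those due today
record_answer(c, correct=True, today=0)
print(c.name, round(c.mastery, 2), c.due_day)  # sepsis 0.44 2
print(next_concept(concepts, today=0).name)    # asthma
```

Unlike the test loop, nothing here tries to stop: the engine keeps cycling through due concepts, weighting practice towards whatever you know least.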

What this means for how you practise

Two practical conclusions follow. First, if you are sitting a linear exam, which is most of them, the adaptivity you want belongs in your practice, not in the exam, so use a tool that targets your weaknesses and spaces your revision rather than one that merely mimics a test interface. Second, if you are sitting the AMC CAT, practising in an adaptive engine is the one form of preparation that reproduces the experience of difficulty-responsive item selection, which we cover separately. Either way, iatroX is built as an adaptive learning engine that targets the concepts you keep missing and schedules retention, with free sample questions to try at iatroX.

Frequently asked questions

What is a computer-adaptive test? An exam that updates an estimate of your ability after each answer and selects the next question to be maximally informative, so it can measure you precisely with fewer items. The AMC CAT and NCLEX are examples.

Which medical exams are adaptive? Very few. The AMC CAT for Australian registration is the main one doctors sit. Most UK exams, the USMLE Step 1 and Step 2 CK, and the Canadian MCCQE1 are linear, fixed-form exams.

Why are UK medical exams not adaptive? Because of Angoff standard-setting, the item-security demands of adaptive pools, a trusted equating tradition, modest cohort sizes, and the need to defend high-stakes exams on appeal. Fixed forms suit these constraints.

Is IRT the same as adaptive testing? No. IRT is a way of calibrating items and scoring ability, and it can be used in a linear exam without adaptivity. The MRCGP AKT and the MCCQE1 are IRT-informed but linear.

What is the difference between an adaptive test and adaptive learning? An adaptive test selects items to measure your ability efficiently; adaptive learning selects items to improve it by targeting weaknesses and spacing retrieval. Same psychometric family, opposite goals.
