Article · Position paper · Open access

AI in Healthcare: Impressive Progress, Missing Proof

No clinical AI should be scaled without independent health-economic evidence

Jérôme Vetillard · Doctrinal position · LinkedIn · 3 pages · 1 min read

Doctrinal position on the gap between the spectacular technical performance of AI diagnostic systems and the persistent absence of independent health-economic evidence of their systemic benefit.

The MAI-DxO case

Microsoft AI unveiled MAI-DxO, a multi-agent “orchestrator” that solved 85% of 304 complex NEJM clinical cases, four times the score of a physician control group, at a lower notional cost. The system operates as a virtual panel of specialists that asks questions, orders tests, and self-corrects before naming a diagnosis.

What is remarkable

Three advances deserve attention: sequential, cost-aware reasoning that goes beyond the multiple-choice benchmarks LLMs already dominate; a built-in cost signal in which every test carries a CPT price tag, targeting an over-testing problem estimated at $100 billion per year in the United States alone; and simultaneous breadth and depth of coverage that no single clinician can match.
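
To make the idea of a built-in cost signal concrete, here is a minimal Python sketch of a per-episode cost ledger. It is not MAI-DxO's implementation, which this summary does not describe: the class name `CostAwareWorkup`, the budget ceiling, the test names and the prices are all hypothetical, and the prices are not real CPT fees.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CostAwareWorkup:
    """Tracks the tests ordered in one diagnostic episode and their cumulative cost.

    Hypothetical illustration; names and rules are assumptions, not MAI-DxO's design.
    """
    price_list: Dict[str, float]   # assumed price per test, in USD
    budget: float                  # assumed spending ceiling for the episode
    ordered: List[str] = field(default_factory=list)

    @property
    def total_cost(self) -> float:
        # Sum the price of every test ordered so far.
        return sum(self.price_list[test] for test in self.ordered)

    def order(self, test: str) -> bool:
        """Order a test only if it keeps the episode within budget."""
        if test not in self.price_list:
            raise KeyError(f"unknown test: {test}")
        if self.total_cost + self.price_list[test] > self.budget:
            return False           # rejected: would exceed the ceiling
        self.ordered.append(test)
        return True


# Illustrative prices only; these are not real CPT codes or fees.
workup = CostAwareWorkup(
    price_list={"blood_panel": 40.0, "chest_xray": 120.0, "mri": 900.0},
    budget=500.0,
)
accepted = [workup.order(t) for t in ("blood_panel", "chest_xray", "mri")]
print(workup.ordered, workup.total_cost)
```

In this sketch the two cheaper tests are accepted and the third is refused because it would push the episode past the assumed ceiling. A real system would weigh expected diagnostic value against price rather than apply a hard cap.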

Methodological biases to identify

The analysis reveals several structural biases: sampling bias (rare, “signal-rich” NEJM cases are not representative of primary-care prevalence), hindsight bias (solved cases retro-fitted into dialogues may leak textual cues), baseline bias (21 off-specialty GPs constitute a fragile yardstick for “superhuman” claims), and model-on-model bias (Microsoft’s own LLM grades its sibling system, with shared incentives and blind spots).

Structural clinical limitations

Three dimensions remain outside the benchmark scope but lie at the heart of clinical practice: text-only semiology with no imaging, auscultation or tactile signs; a post-anamnesis starting point on pre-digested vignettes that bypasses the complexity of real history-taking; and no confrontation with the inconsistency, emotion and non-verbal cues of real patients.

The health-economic question

The fundamental question remains twofold: will AI genuinely outperform — or at best augment — human clinicians and integrate efficiently into the healthcare value chain? And will it reduce system-wide costs and improve outcomes, or become a Trojan horse letting hyperscalers siphon public-health budgets?

Health-economic evaluation must be integrated into every AI-for-health initiative from its inception.
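
One common way to express such an evaluation is the incremental cost-effectiveness ratio (ICER): the extra cost of a new strategy divided by the extra health effect it delivers compared with a baseline. The position paper does not prescribe a metric, so the choice of ICER here is an assumption, and the `Strategy` class and every figure below are hypothetical placeholders rather than results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    cost: float     # mean cost per patient (hypothetical figure)
    effect: float   # mean health effect per patient, e.g. QALYs (hypothetical figure)


def icer(new: Strategy, baseline: Strategy) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    delta_cost = new.cost - baseline.cost
    delta_effect = new.effect - baseline.effect
    if delta_effect == 0:
        # With identical effects the ratio is undefined; compare costs directly.
        raise ValueError("ICER is undefined when both strategies have the same effect")
    return delta_cost / delta_effect


# Placeholder numbers chosen for easy arithmetic, not taken from any study.
usual_care = Strategy("usual care", cost=1000.0, effect=0.50)
ai_assisted = Strategy("AI-assisted care", cost=1500.0, effect=0.75)

ratio = icer(ai_assisted, usual_care)
print(f"ICER: {ratio:.0f} per unit of effect gained")
```

With these placeholder inputs the ratio is 500 / 0.25 = 2000 per unit of effect gained. Whether such a figure is acceptable depends on a willingness-to-pay threshold set by the health system, and the inputs themselves should come from an independent evaluation rather than from the vendor.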
