Article — Position paper · ○ Open access

82 percent. AI regulation in health insurance and the variable it refuses to measure

From a Stanford study published in Health Affairs (January 2026), a system self-stabilized by unobservability

Jérôme Vetillard · Twingital Institute · 8 pages

The figure, in context

The study by Mello, Trotsyuk, Djiberou Mahamadou and Char, The AI Arms Race in Health Insurance Utilization Review, published in Health Affairs in January 2026, delivers a figure that regulatory doctrine will have to digest: 82 percent of prior authorization denials issued by Medicare Advantage plans and appealed by beneficiaries are overturned upon review. The 2024 data published by KFF from CMS reporting give a comparable result: out of 4.1 million denials, 80.7 percent of those appealed were partially or fully overturned. The order of magnitude is stable: roughly four appeals out of five result in a partial or total reversal. Of those 4.1 million denials, only 11.5 percent are appealed; the Estate of Lokken v. UnitedHealth Group complaint alleges 0.2 percent in the post-acute care segment managed by nH Predict. The two appeal rates do not measure the same thing; they describe the same structure at two resolutions. Out of 4.1 million denials, 3.63 million are never contested. How many would have been overturned if they had been? The system does not measure it, and that is precisely the central argument: this unobservability is not a methodological accident. It is a structural property of the apparatus.
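The arithmetic behind these figures can be checked in a few lines. The sketch below is illustrative only: the function name `appeal_funnel` and its output keys are assumptions introduced here, and the inputs are the KFF figures quoted above (4.1 million denials, 11.5 percent appealed, 80.7 percent of appeals partially or fully overturned).

```python
# Figures quoted in the text (KFF, from 2024 CMS reporting).
DENIALS = 4_100_000
APPEAL_RATE = 0.115      # share of denials that are appealed
REVERSAL_RATE = 0.807    # share of appeals partially or fully overturned


def appeal_funnel(denials: float, appeal_rate: float, reversal_rate: float) -> dict:
    """Split a denial count into appealed, overturned, and uncontested volumes.

    Hypothetical helper for illustration; it only restates the arithmetic
    of the paragraph above and adds no data of its own.
    """
    appealed = denials * appeal_rate
    overturned = appealed * reversal_rate
    uncontested = denials - appealed
    return {
        "appealed": appealed,
        "overturned": overturned,
        "uncontested": uncontested,
        "overturned_share_of_all_denials": overturned / denials,
    }


if __name__ == "__main__":
    for name, value in appeal_funnel(DENIALS, APPEAL_RATE, REVERSAL_RATE).items():
        print(f"{name}: {value:,.3f}")
```

Under these inputs, about 471,500 denials are appealed, about 380,500 are overturned, and about 3.63 million are never contested. The reversals that are actually observed therefore correspond to roughly 9.3 percent of all denials; the remaining uncontested volume is the part the figures say nothing about.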

The thesis, and the doctrinal refinement that strengthens it

AI regulation in American health insurance does not operate through guarantee of algorithmic quality. It operates through friction of use. The regulatory compliance of the system is compatible with a high rate of initial decisions unsupported after challenge, as long as the cost of appeal for the beneficiary remains prohibitive. The term calibration suggests a centralized intent — rhetorically stronger and empirically weaker. The defensible position is finer: the system does not need to be designed for non-recourse in order to operate by it. A distributed industry optimization is harder to regulate than a centralized intent, precisely because it has no identifiable author. And a system without an author has no designated regulatory respondent either. The system is structurally equivalent to a calibration: it functions as if calibrated, without it being necessary to prove central intent. This nuance does not weaken the thesis — it displaces what is regulable, from intent to structural property.

Friction: three non-substitutable components

For a beneficiary to move from denial to reversal, a documented chain of obstacles must be cleared. This friction breaks down into three components worth distinguishing, without prejudging their relative weight: administrative friction (procedural delays, forms, levels of appeal, submission formats); informational friction (fragmentation of the clinical record, cost of retrieval, translation into the grammar of utilization review, certification of the care trajectory); and cognitive friction (understanding of the right of appeal, capacity to mobilize the prescriber, beneficiary exhaustion, asymmetry of literacy in the face of the denial letter). Reducing one does not mechanically reduce the others. For post-acute care, the appeal timeframe often exceeds the relevant clinical window: the appeal arrives after the need has ceased to exist. On the provider side, the picture is complementary: physicians spend on average 12 hours per week on prior authorization (PA) according to the AMA; providers spend around 19.7 billion dollars per year contesting denials.

Unobservability as a political variable

An observable error rate is regulable. A non-observable error rate is not. The current system does not measure how many uncontested denials would have been reversible after challenge, and it does not measure it by construction: non-contestation is precisely the event that prevents measurement. A quality standard can apply only to what allows itself to be observed; it therefore applies only to the 11.5 percent of contested denials, and within that, to a subset biased by selection toward cases where contestation appeared viable. The regulatorily measurable share concentrates on marginal cases; reality escapes the instrument applied to it. The regulatory defect is not that the model denies too much. The defect is that the system does not measure the cost required to transform a denial into a contestable object. The question is not only who decides — it is who holds the architecture of evidence at the moment of denial.
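One way to make this unobservability concrete is to ask what the published figures allow one to say about the share of all denials that would be reversed if every denial were appealed. The sketch below computes worst-case bounds, in the spirit of partial-identification (Manski-style) reasoning; this framing is an assumption brought in for illustration, not a method used in the sources, and the function name `reversible_share_bounds` is hypothetical. The lower bound assumes no uncontested denial would have been reversed; the upper bound assumes all of them would have been.

```python
from typing import Tuple


def reversible_share_bounds(appeal_rate: float, reversal_rate: float) -> Tuple[float, float]:
    """Worst-case bounds on the share of ALL denials reversible on challenge.

    Only the appealed fraction is observed. The uncontested fraction
    (1 - appeal_rate) could contain anywhere from zero to 100 percent
    reversible denials, and the data cannot tell which.
    """
    observed_reversals = appeal_rate * reversal_rate   # measured part
    unobserved_mass = 1.0 - appeal_rate                # never measured
    lower = observed_reversals                         # none of the rest reversible
    upper = observed_reversals + unobserved_mass       # all of the rest reversible
    return lower, upper


if __name__ == "__main__":
    # Figures quoted in the text: 11.5 percent appealed, 80.7 percent overturned.
    low, high = reversible_share_bounds(0.115, 0.807)
    print(f"reversible share of all denials lies in [{low:.3f}, {high:.3f}]")
    print(f"width of the interval: {high - low:.3f}")
```

With the figures quoted in the text, the interval runs from about 9.3 percent to about 97.8 percent. Its width, 88.5 points, is exactly the non-appeal rate: the instrument constrains the answer only as far as contestation occurs, and selection toward viable cases means the observed reversal rate cannot simply be extrapolated to the rest.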

Ex post friction is not the whole apparatus

The other half of the system is ex ante deterrence: anticipated non-demand of care. What former executives of the prior authorization industry call the sentinel effect designates the physician who, having internalized the statistical cost of a request likely to be denied, stops formulating it; the beneficiary who, having internalized the history of denials in their care category, no longer asks the prescriber for the intervention. None of these non-demands appears in PA statistics, because there exists no accounting category for care never requested. The system is not only a system of denials — it modifies the distribution of demand. The ratio of initial requests to potential requests is never published, because it is never computed. At each layer of the apparatus, the politically decisive segment is the one that leaves no trace.

The political economy of non-recourse, under constraints

The simplified yield equation, Profit ≈ f(denials × (1 − appeal_rate × reversal_rate)), implicitly assumes that maximizing denials is always optimal. It is not. Several constraints weigh on the equilibrium: the Medical Loss Ratio (the ACA rule requires that a minimum share of premiums be spent on care), administrative cost, legal risk (class actions like Lokken), reputational risk and CMS Star Ratings. The relevant equation is marginal and under constraints. At this equilibrium, non-recourse remains a central stabilizing factor: an uncontested denial is, all other things equal, the economically most favorable denial, since it generates neither jurisprudence nor precedent and does not expose the payer to reputational risk. It is this equilibrium that reproduces itself, without any actor having to design it as such.
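The inner term of the simplified equation, 1 − appeal_rate × reversal_rate, is the share of denials that survive the appeal process. A minimal sketch of that term alone follows; it deliberately ignores the constraints listed above (Medical Loss Ratio, administrative cost, legal and reputational risk, Star Ratings), so it illustrates the unconstrained logic and not an actual payer's profit function. The function name `retained_share` is hypothetical, and using the 90 percent error rate alleged in Lokken as a stand-in for a reversal rate is an assumption made only for the comparison.

```python
def retained_share(appeal_rate: float, reversal_rate: float) -> float:
    """Share of denials that stand after appeals: 1 - appeal_rate * reversal_rate."""
    return 1.0 - appeal_rate * reversal_rate


if __name__ == "__main__":
    scenarios = {
        # KFF figures quoted in the text.
        "Medicare Advantage (KFF)": (0.115, 0.807),
        # Rates ALLEGED in the Lokken complaint; the alleged error rate
        # is used here as a proxy for the reversal rate (an assumption).
        "nH Predict (alleged)": (0.002, 0.90),
        # Counterfactual: same reversal rate, appeal rate doubled.
        "Appeal rate doubled": (0.230, 0.807),
    }
    for label, (appeal, reversal) in scenarios.items():
        print(f"{label}: {retained_share(appeal, reversal):.4f} of denials stand")
```

The term is linear in the appeal rate: with a reversal rate near 80 percent, every additional point of appeal removes about 0.8 point of standing denials. This is why, in the unconstrained version of the argument, the appeal rate rather than the quality of the initial decision is the variable on which the yield depends most directly.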

The pattern: Lokken, PxDx, EviCore — and a regulation that does not touch the equilibrium

Three cases illustrate the motif, with differing evidentiary status. Estate of Lokken v. UnitedHealth Group: the complaint alleges a 90 percent error rate and a 0.2 percent appeal rate on nH Predict (naviHealth, 2.5 billion dollars in 2020); the Senate PSI investigation (October 2024), a publicly established finding distinct from the plaintiffs’ allegation, documented that UnitedHealth’s denial rate in post-acute care rose from 8.7 percent to 22.7 percent between 2019 and 2022, after the deployment of nH Predict. Kisting-Leung v. Cigna: ProPublica (March 2023) established that PxDx had rejected more than 300,000 requests in two months at 1.2 seconds per decision. EviCore by Evernorth: public commercial material promises 3 dollars saved for 1 dollar spent (primary source); the internal parametric threshold, "the dial", rests on elements reported by journalistic investigation from testimonial sources. The motif is shared: an algorithmic apparatus optimized for a denial volume exceeding human review capacity, backed by an appeal mechanism whose friction guarantees that it will be marginally used. The regulatory lever that matters is not the quality of the initial decision: a downstream human review does not change the equilibrium as long as the review is mobilized by only 11.5 percent of those denied. Worse: the requirement of specific reasoning has two opposing effects, downward pressure on denial volume but a symmetric increase in the cost of refutation for those without documentary resources. If the net effect tips toward the second side, the regulation aggravates the friction it intends to correct.

The concept: administrative abandonment, and what should be regulated

Medical abandonment designates the situation in which the patient is left without follow-up for want of a chain of care. The 82 percent, the 11.5 percent, the 0.2 percent describe an administrative abandonment: the situation in which the beneficiary is left without recourse for want of a chain of contestation. Not the same phenomenon, but the same structure: the system functions because the user cannot negotiate its exits. Two adjacent concepts: perimeter bias (regulation evaluates what it names, not what produces the effect: the quality of the AI is regulated, the quality of the friction is not); governability debt (a system can be locally compliant and globally ungovernable). What should be regulated? The cost of evidentiary reconstruction, that is, the informational component of friction. Concretely: effective portability of the beneficiary’s clinical record, real-time access to the data necessary for contestation, automatic preparation of an appeal file by the same system that produced the denial, transparency on the parametric thresholds of the denial model, non-aggregated publication of denial and reversal rates by clinical cohort. The FHIR Prior Authorization API planned by CMS for 2027 goes in the right direction, on the condition that its scope cover contestation and not only submission. None of these measures acts on the quality of the model; all act on the informational mechanics of evidentiary reconstruction. The 82 percent is not a statistic of failure. It is the numerical expression of a zone outside the metric, and of a system self-stabilized by the unobservability it maintains. What remains to be known is who pays for it to reproduce itself. The arithmetic answer is known.

Primary sources

Mello M.M., Trotsyuk A.A., Djiberou Mahamadou A.J., Char D., The AI Arms Race in Health Insurance Utilization Review, Health Affairs 2026;45(1):6-13. DOI: 10.1377/hlthaff.2025.00897.
Fuglesten Biniek J. et al., Medicare Advantage Insurers Made Nearly 53 Million Prior Authorization Determinations in 2024, KFF, January 2026.
Estate of Gene B. Lokken et al. v. UnitedHealth Group, Inc., 0:23-cv-03514 (D. Minn.); discovery order of March 9, 2026, 2026 WL 658883.
Kisting-Leung v. Cigna Corporation, 2:23-cv-01477 (E.D. Cal.).
ProPublica & Capitol Forum, Inside the Company Helping America’s Biggest Health Insurers Deny Coverage, October 2024 (EviCore).
U.S. Senate Permanent Subcommittee on Investigations, Refusal of Recovery, October 2024.
CMS Interoperability and Prior Authorization Final Rule, CMS-0057-F.
CMMI, WISeR Model Operational Guide, January 1, 2026.
California SB 1120 (effective January 1, 2025).
Sunstein C.R., Sludge: What Stops Us From Getting Things Done and What to Do About It, MIT Press, 2021.
