A relational and emergent theory of computational evidence validation in health, and beyond
The habit is to treat a piece of evidence the way one treats a bridge or a theorem: sound or not, true or not, certified once and for good. That habit rests on an ill-posed question. A perfectly rigorous trial becomes noise outside its target population. Two impeccable studies on the same intervention reach incompatible recommendations without either being in error. A perfectly calibrated model still produces a dangerous decision when its threshold ignores the asymmetric cost of a false negative. None of these is a paradox awaiting repair, because every available repair (internal validity, external validity, stratification, calibration) only works by reintroducing a decision, a loss, or a domain. The conclusion is structural and not anecdotal: validity is not a property an object carries; it is a relation between an evidence source and the use made of it.
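The calibrated-model case can be made concrete with a minimal sketch. It assumes a binary action (flag or do not flag), a well-calibrated probability p, and two hypothetical costs, cost_fp and cost_fn, which are illustrative and not drawn from any study. Under those assumptions, expected loss is minimized by flagging whenever p exceeds cost_fp / (cost_fp + cost_fn).

```python
def optimal_threshold(cost_fp: float, cost_fn: float) -> float:
    """Probability above which flagging minimizes expected loss.

    Flagging costs (1 - p) * cost_fp on average; not flagging costs
    p * cost_fn. Flagging is preferred when p > cost_fp / (cost_fp + cost_fn).
    """
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def flag(p: float, threshold: float) -> bool:
    """Act on a calibrated probability p given a decision threshold."""
    return p > threshold


# Hypothetical numbers: a missed case is assumed to cost nine times a false alarm.
p_calibrated = 0.30
symmetric = optimal_threshold(cost_fp=1.0, cost_fn=1.0)   # threshold 0.5
asymmetric = optimal_threshold(cost_fp=1.0, cost_fn=9.0)  # threshold 0.1

print(flag(p_calibrated, symmetric))   # False: the case is missed
print(flag(p_calibrated, asymmetric))  # True: the same probability now triggers action
```

The probability is identical in both calls; only the loss changes, and with it the decision.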
Once validity is relational, “valid” becomes an incomplete predicate, like “greater than”. It demands its arguments, and exactly four. A decision D, in Wald’s sense of a rule mapping observation to action, without which the predicate is empty. A loss L to order the consequences of that rule, which is why a model can be calibrated and wrong. A domain Δ in which the source is deemed reliable, borrowed from the applicability-domain discipline of structure-activity models, because outside Δ evidence is not false, it is off-topic, which is worse since off-topic evidence still looks like evidence. A time T, because evidence validated today is not validated forever. The structure (D, L, Δ, T) is minimal in a precise sense: remove any one and one of the contradictions above returns. The burden is reversed accordingly. It is not for the theory to prove no fifth index exists, it is for whoever proposes one to prove it irreducible to the four.
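One way to picture the four indices is as the required fields of a record, so that no validity statement can be written without them. The sketch below is only an illustration: the class name ValidityClaim, its field names, the status labels, and the example values are all hypothetical and not part of the theory.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Mapping

Observation = Mapping[str, float]


@dataclass(frozen=True)
class ValidityClaim:
    """A validity statement that cannot be built without its four indices."""
    decision: Callable[[Observation], str]    # D: rule mapping observation to action
    loss: Callable[[str, str], float]         # L: cost of (action, true state)
    in_domain: Callable[[Observation], bool]  # Δ: declared applicability domain
    valid_from: date                          # T: start of the validity window
    valid_until: date                         # T: end of the validity window

    def status(self, x: Observation, on: date) -> str:
        """Say whether the claim applies to observation x on a given date."""
        if not (self.valid_from <= on <= self.valid_until):
            return "out_of_window"
        if not self.in_domain(x):
            return "out_of_domain"
        return "applicable"


# Hypothetical example: a rule declared for adults aged 18 to 75, for one year.
claim = ValidityClaim(
    decision=lambda x: "treat" if x["risk"] > 0.2 else "wait",
    loss=lambda action, state: 0.0 if action == state else 1.0,
    in_domain=lambda x: 18 <= x["age"] <= 75,
    valid_from=date(2024, 1, 1),
    valid_until=date(2024, 12, 31),
)

print(claim.status({"age": 50, "risk": 0.3}, date(2024, 6, 1)))  # applicable
print(claim.status({"age": 82, "risk": 0.3}, date(2024, 6, 1)))  # out_of_domain
print(claim.status({"age": 50, "risk": 0.3}, date(2025, 6, 1)))  # out_of_window
```

Neither non-applicable status says the underlying evidence is false; it only says the claim does not cover this case or this date.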
The obvious objection is relativism: if validity is only ever a relation to a use, has every notion of quality dissolved? The answer is not to smuggle an absolute back in, but to demand that a validation theory be coherent, composable, transferable, and refutable, and then to pass the theory through its own sieve. It does not self-refute; partial validations compose along a chain of decisions; the transfer conditions are declared rather than hoped for; and a single instance of use-independent validity would bring the whole edifice down. A theory that survives its own criteria is not thereby true; it is admissible, which is already more than “validity in itself” has ever offered. This relational reading does not compete with GRADE, CONSORT, or TRIPOD. It is the common grammar that explains why each, in its own corner, ends up declaring a use, a criterion, a limit, and a window.
The generative turn makes one question urgent: can a synthetic population, a digital twin, or a simulated cohort produce content absent from its data? The naive answer is no, and it is right for the wrong reasons, which makes it dangerous. The precise statement is a conservation law: no generator renders identifiable a content absent from its constraints. What a latent representation exhibits is not new content, it is a content already identifiable in principle from the constraints, which the generator renders inferable, that is, operational. The right verb is not to create but to render inferable. This forbids the sales promise of generating patients where there are no data, and at the same time grants the generator a defensible value: rendering a latent structure computable, at a cost on a different scale from that of recruitment.
A piece of evidence does two jobs, not one: it helps choose the act, and it helps judge whether it is worth knowing more before choosing. The second job is what value-of-information analysis quantifies, through EVPI and EVSI. The consequence for synthetic cohorts is sharp and often ignored. A cohort can preserve the optimal decision under a given loss while deforming the uncertainty structure that grounds the value of future research. It then leads to the right choice today while wrongly suggesting that an additional study is useless, or indispensable. Hence two levels: decisional substitutability, same optimal rule, and informational substitutability, same value of the information liable to revise that rule. Evidence that satisfies only the first remains intrinsically myopic.
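The gap between the two levels can be shown on a toy problem. The sketch below assumes two states, two actions, and a hypothetical loss table; the "real" and "synthetic" beliefs are made-up numbers chosen only so that both sources select the same action while disagreeing on EVPI. It reduces the uncertainty structure to a single probability, which is a simplification.

```python
from typing import Dict

# Hypothetical losses for each (action, state) pair.
LOSS = {
    ("treat", "responder"): 0.0,
    ("treat", "non_responder"): 4.0,
    ("wait", "responder"): 10.0,
    ("wait", "non_responder"): 0.0,
}
ACTIONS = ("treat", "wait")


def expected_loss(action: str, belief: Dict[str, float]) -> float:
    """Average loss of an action under a belief over states."""
    return sum(p * LOSS[(action, state)] for state, p in belief.items())


def best_action(belief: Dict[str, float]) -> str:
    """Action minimizing expected loss under the current belief."""
    return min(ACTIONS, key=lambda a: expected_loss(a, belief))


def evpi(belief: Dict[str, float]) -> float:
    """Expected value of perfect information, in loss units.

    Loss of deciding now, minus the expected loss if the state were
    revealed before acting.
    """
    loss_now = expected_loss(best_action(belief), belief)
    loss_informed = sum(
        p * min(LOSS[(a, state)] for a in ACTIONS) for state, p in belief.items()
    )
    return loss_now - loss_informed


real = {"responder": 0.5, "non_responder": 0.5}
synthetic = {"responder": 0.9, "non_responder": 0.1}  # overconfident source

print(best_action(real), best_action(synthetic))  # same action from both sources
print(evpi(real), evpi(synthetic))                # but a different value of further study
```

Here the synthetic source is decisionally substitutable (both beliefs select "treat") yet not informationally substitutable: it makes a further study look worth about a fifth of what the real source implies.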
Substitutability is not a Boolean state but a graded measure σ between zero and one, defined relative to a decision δ, a loss L, and a declared domain, and naturally formalized as a normalized decision regret: one minus the expected excess loss of acting on the synthetic source rather than the real one, scaled by a reference. A value of one means the substitution induces no average decisional loss; as the cost of divergences rises, σ falls toward zero. The figure measures the cost of replacing one source by another for a stated decision, not the statistical proximity of two distributions. To say a cohort is “substitutable at 0.82” means nothing without its decision, its loss vector, its domain, its uncertainty interval, and its date. That is precisely what makes substitutability governable: a binary state is not negotiated, a dated degree can be thresholded and audited.
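A minimal sketch of σ as a normalized decision regret follows. The text does not fix the reference used for scaling, so the sketch takes it as a caller-supplied parameter; that choice, the function name, the clipping to the interval from zero to one, and the sample numbers are all assumptions made for illustration. Both rules are assumed to be evaluated on the same real cases within the declared domain.

```python
from statistics import mean
from typing import Sequence


def substitutability(
    losses_synthetic_rule: Sequence[float],
    losses_real_rule: Sequence[float],
    reference_loss: float,
) -> float:
    """One minus the expected excess loss, scaled by a reference and clipped to [0, 1].

    losses_synthetic_rule: per-case losses of the rule derived from the synthetic source.
    losses_real_rule: per-case losses of the rule derived from the real source.
    reference_loss: positive scale, for example the excess loss of the worst
        admissible rule (an assumption; the text leaves the reference open).
    """
    if reference_loss <= 0:
        raise ValueError("reference_loss must be positive")
    excess = mean(losses_synthetic_rule) - mean(losses_real_rule)
    return min(1.0, max(0.0, 1.0 - excess / reference_loss))


# Made-up per-case losses: the synthetic-derived rule errs on one case in four.
real_rule = [2.0, 2.0, 2.0, 2.0]
synthetic_rule = [2.0, 2.0, 2.0, 5.6]

sigma = substitutability(synthetic_rule, real_rule, reference_loss=5.0)
print(round(sigma, 2))  # close to 0.82 with these illustrative numbers
```

The returned number is meaningful only alongside the rule, the loss, the domain, and the date under which the per-case losses were computed; the function deliberately takes no distributional distance as input.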
Nothing in (D, L, Δ, T) is specific to medicine. A financial risk model calibrated on one market regime expires when the regime turns. An aeronautical certification simulation holds only inside its declared flight envelope. A cybersecurity detector has no guarantee outside the attack distribution it was trained on, and the adversary lives precisely in that extrapolation. These transfers are conjectures of scope that the theory demands be tested domain by domain, not decorative claims. Health offers only the densest version, because the stakes are vital, the subpopulations numerous, and the loss trade-offs the most morally charged. A patient profile is a vector over several hundred tabular variables, which is why one does not reason about a carrier of a BRAF V600E mutation by applying the intuition of a word model, and why a toxicity terrain such as ToxTwin is an implementation case, not a general proof.
Outside its domain, a piece of evidence is not false, it is expired, and the distinction matters: false evidence was poorly established, expired evidence was well established and has ceased to apply. Evidence that can expire is a process, with a start of validity, a window, and an end condition, which one does not certify once and for all but commits to, monitors, and renews. From there the order of concepts is the thought itself. Validation belongs to none of the terms present: it is an emergent property of a relation between source, decision, and context, indexed by time. It follows that evidence states a guarded implication, a conditional promise, and that the use contract is merely the social name of that promise when it must be kept between parties. Regulation already formalizes such promises, through the FDA’s predetermined change control plans and the intended-purpose logic of the EU AI Act, without yet having the theory that says so. This note proposes that theory. The full argument, with its structure, its oncological case, and its references, is available below.