Evidence Is a Conditional Promise · Twingital Institute

Why “Is This Evidence Valid?” Is the Wrong Question

The habit is to treat a piece of evidence the way one treats a bridge or a theorem: sound or not, true or not, certified once and for good. The question behind that habit is ill-posed. A perfectly rigorous trial becomes noise outside its target population. Two impeccable studies on the same intervention reach incompatible recommendations without either being in error. A perfectly calibrated model still produces a dangerous decision when its threshold ignores the asymmetric cost of a false negative. None of these is a paradox awaiting repair, because every available repair (internal validity, external validity, stratification, calibration) works only by reintroducing a decision, a loss, or a domain. The conclusion is structural, not anecdotal: validity is not a property an object carries; it is a relation between an evidence source and the use made of it.
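The third contradiction can be made concrete with a minimal sketch. The costs, the risk value, and the function names below are illustrative assumptions, not figures from this note: a calibrated risk is thresholded once at the conventional 0.5 and once at the threshold implied by the loss.

```python
def expected_loss(p: float, act: bool, cost_fn: float, cost_fp: float) -> float:
    """Expected loss of acting or not acting, given a calibrated risk p."""
    # Acting on a healthy case costs cost_fp; not acting on a sick case costs cost_fn.
    return (1.0 - p) * cost_fp if act else p * cost_fn


def loss_aware_threshold(cost_fn: float, cost_fp: float) -> float:
    """Risk above which acting has the lower expected loss."""
    # Solve (1 - p) * cost_fp = p * cost_fn for p.
    return cost_fp / (cost_fp + cost_fn)


# Hypothetical numbers: a false negative is assumed nine times worse than a false positive.
cost_fn, cost_fp = 9.0, 1.0
p = 0.20  # calibrated risk for one hypothetical patient

default_acts = p >= 0.5                                     # threshold blind to the loss
optimal_acts = p >= loss_aware_threshold(cost_fn, cost_fp)  # threshold derived from the loss

print(default_acts, expected_loss(p, default_acts, cost_fn, cost_fp))
print(optimal_acts, expected_loss(p, optimal_acts, cost_fn, cost_fp))
```

The probability is the same in both cases and equally well calibrated; only the loss changes which action is defensible.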

The Four Indices a Piece of Evidence Cannot Do Without

Once validity is relational, “valid” becomes an incomplete predicate, like “greater than”. It demands its arguments, and there are exactly four. A decision D, in Wald’s sense of a rule mapping observation to action, without which the predicate is empty. A loss L to order the consequences of that rule, which is why a model can be calibrated and still wrong for the decision at hand. A domain Δ in which the source is deemed reliable, borrowed from the applicability-domain discipline of structure-activity models, because outside Δ a proof is not false, it is off-topic, which is worse since off-topic evidence still looks like evidence. A time T, because evidence validated today is not validated forever. The structure (D, L, Δ, T) is minimal in a precise sense: remove any one index and one of the contradictions returns. The burden of proof is reversed accordingly: it is not for the theory to prove that no fifth index exists; it is for whoever proposes one to prove it irreducible to the four.
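One way to picture the four indices is as the fields of a record that a validity claim must fill in before it can be checked. The sketch below is an assumption about how such a record might look; the class name, the field names, and the example values are hypothetical and do not come from the note.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Mapping

Observation = Mapping[str, float]


@dataclass(frozen=True)
class ValidityClaim:
    decision: Callable[[Observation], str]    # D: rule from observation to action
    loss: Callable[[str, str], float]         # L: loss(action, true_state)
    in_domain: Callable[[Observation], bool]  # Δ: where the source is deemed reliable
    valid_from: date                          # T: start of the validity window
    valid_until: date                         # T: end of the validity window

    def applies(self, obs: Observation, on: date) -> bool:
        """True only if obs lies inside Δ and the date lies inside T."""
        return self.in_domain(obs) and self.valid_from <= on <= self.valid_until


# Hypothetical claim: a risk rule declared reliable for ages 18 to 75, for two years.
claim = ValidityClaim(
    decision=lambda o: "treat" if o["risk"] >= 0.1 else "wait",
    loss=lambda action, state: {("wait", "sick"): 9.0,
                                ("treat", "healthy"): 1.0}.get((action, state), 0.0),
    in_domain=lambda o: 18 <= o["age"] <= 75,
    valid_from=date(2024, 1, 1),
    valid_until=date(2025, 12, 31),
)

print(claim.applies({"age": 50, "risk": 0.2}, date(2024, 6, 1)))  # inside Δ and T
print(claim.applies({"age": 80, "risk": 0.2}, date(2024, 6, 1)))  # outside Δ
print(claim.applies({"age": 50, "risk": 0.2}, date(2026, 6, 1)))  # outside T
```

Leaving any field unset makes `applies` impossible to evaluate, which is the sense in which the predicate is incomplete without its arguments.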

A Validation Theory That Submits to Its Own Criteria

The obvious objection is relativism: if validity is only ever a relation to a use, has every notion of quality dissolved? The answer is not to smuggle an absolute back in, but to demand that a validation theory be coherent, composable, transferable, and refutable, and then to pass the theory through its own sieve. It passes on each count: it does not self-refute; partial validations compose along a chain of decisions; the transfer conditions are declared rather than hoped for; and a single use-independent validity would bring the whole edifice down, which is what makes it refutable. A theory that survives its own criteria is not thereby true; it is admissible, which is already more than “validity in itself” has ever offered. This relational reading does not compete with GRADE, CONSORT, or TRIPOD. It is the common grammar that explains why each, in its own corner, ends up declaring a use, a criterion, a limit, and a window.

What a Generator Renders Inferable, and What It Cannot Invent

The generative turn makes one question burning: can a synthetic population, a digital twin, or a simulated cohort produce content absent from its data? The naive answer is no, and it is right for the wrong reasons, which makes it dangerous. The precise statement is a conservation law: no generator renders identifiable any content absent from its constraints. What a latent representation exhibits is not new content; it is content already identifiable in principle from the constraints, which the generator renders inferable, that is, operational. The right verb is not to create but to render inferable. This forbids the sales promise of generating patients where there are no data, and at the same time grants the generator a defensible value: rendering a latent structure computable, at a cost incommensurable with that of recruitment.

Why a Synthetic Cohort Can Be Substitutable for Deciding but Not for Learning

A piece of evidence does two jobs, not one: it helps choose the act, and it helps judge whether it is worth knowing more before choosing. The second job is what value-of-information analysis quantifies, through EVPI and EVSI. The consequence for synthetic cohorts is sharp and often ignored. A cohort can preserve the optimal decision under a given loss while deforming the uncertainty structure that grounds the value of future research. It then leads to the right choice today while wrongly suggesting that an additional study is useless, or indispensable. Hence two levels: decisional substitutability, same optimal rule, and informational substitutability, same value of the information liable to revise that rule. Evidence that satisfies only the first remains intrinsically myopic.
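The gap between the two levels can be shown on a toy problem. The sketch below uses an assumed two-action, two-state loss table and two hypothetical beliefs, one standing for the real source and one for a synthetic cohort; none of the numbers come from the note. Both beliefs select the same action, yet they assign different values to perfect information (EVPI).

```python
from typing import Dict

Loss = Dict[str, Dict[str, float]]  # loss[action][state]
Belief = Dict[str, float]           # probability of each state


def expected_loss(loss: Loss, belief: Belief, action: str) -> float:
    return sum(belief[s] * loss[action][s] for s in belief)


def best_action(loss: Loss, belief: Belief) -> str:
    return min(loss, key=lambda a: expected_loss(loss, belief, a))


def evpi(loss: Loss, belief: Belief) -> float:
    """Expected loss of deciding now, minus expected loss of deciding once the state is known."""
    decide_now = min(expected_loss(loss, belief, a) for a in loss)
    decide_informed = sum(belief[s] * min(loss[a][s] for a in loss) for s in belief)
    return decide_now - decide_informed


# Hypothetical loss table and beliefs, chosen only for illustration.
LOSS: Loss = {
    "treat": {"effective": 0.0, "ineffective": 15.0},
    "wait": {"effective": 10.0, "ineffective": 0.0},
}
real: Belief = {"effective": 0.70, "ineffective": 0.30}
synthetic: Belief = {"effective": 0.95, "ineffective": 0.05}  # overconfident cohort

print(best_action(LOSS, real), best_action(LOSS, synthetic))  # same action chosen
print(evpi(LOSS, real), evpi(LOSS, synthetic))                # different value of information
```

Here the synthetic belief is decisionally substitutable, since it picks the same action, but it understates what further research would be worth, which is the informational failure described above.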

Substitutability Is a Number With a Date, Not a Certificate

Substitutability is not a Boolean state but a graded measure σ between zero and one, defined relative to a decision D, a loss L, and a declared domain Δ, and naturally formalized as a normalized decision regret: one minus the expected excess loss of acting on the synthetic source rather than the real one, scaled by a reference. A value of one means the substitution induces no average decisional loss; as the cost of divergences rises, σ falls toward zero. The figure measures the cost of replacing one source by another for a stated decision, not the statistical proximity of two distributions. To say a cohort is “substitutable at 0.82” means nothing without its decision, its loss vector, its domain, its uncertainty interval, and its date. That is precisely what makes substitutability governable: a binary state is not negotiated, a dated degree can be thresholded and audited.
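A minimal sketch of such a normalized regret follows. The note does not specify the reference used for scaling, so the code assumes one: the excess loss of the worst available action under the real source. The loss table, the beliefs, and the function name `sigma` are likewise hypothetical.

```python
from typing import Dict

Loss = Dict[str, Dict[str, float]]  # loss[action][state]
Belief = Dict[str, float]           # probability of each state


def sigma(loss: Loss, real: Belief, synthetic: Belief) -> float:
    """One minus the regret of acting on the synthetic source, scaled and clipped to [0, 1]."""

    def exp_loss(belief: Belief, action: str) -> float:
        return sum(belief[s] * loss[action][s] for s in belief)

    a_real = min(loss, key=lambda a: exp_loss(real, a))        # action the real source selects
    a_synth = min(loss, key=lambda a: exp_loss(synthetic, a))  # action the synthetic source selects

    # Regret is always evaluated under the real source.
    regret = exp_loss(real, a_synth) - exp_loss(real, a_real)
    # Assumed reference: regret of the worst action under the real source.
    reference = max(exp_loss(real, a) for a in loss) - exp_loss(real, a_real)
    if reference == 0.0:
        return 1.0  # every action is equally good, so no substitution can cost anything
    return max(0.0, min(1.0, 1.0 - regret / reference))


# Hypothetical three-action problem, for illustration only.
LOSS: Loss = {
    "treat": {"effective": 0.0, "ineffective": 15.0},
    "monitor": {"effective": 4.0, "ineffective": 6.0},
    "wait": {"effective": 10.0, "ineffective": 0.0},
}
REAL: Belief = {"effective": 0.70, "ineffective": 0.30}

for p in (0.90, 0.55, 0.20):
    synth = {"effective": p, "ineffective": 1.0 - p}
    print(p, round(sigma(LOSS, REAL, synth), 2))
```

The synthetic belief at 0.90 is further from the real one than the belief at 0.55, yet it scores higher because it leads to the same action: σ tracks the cost of the decision, not the distance between distributions.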

Health Is the Densest Case, Not the Perimeter

Nothing in (D, L, Δ, T) is proper to medicine. A financial risk model calibrated on one market regime expires when the regime turns. An aeronautical certification simulation holds only inside its declared flight envelope. A cybersecurity detector has no guarantee outside the attack distribution it was trained on, and the adversary lives precisely in that extrapolation. These transfers are not decorative claims; they are conjectures of scope that the theory demands be tested domain by domain. Health offers only the densest version, because the stakes are vital, the subpopulations numerous, and the loss trade-offs the most morally charged. A patient profile is a vector over several hundred tabular variables, which is why one does not reason about a carrier of a BRAF V600E mutation by applying the intuition of a word model, and why a toxicity terrain such as ToxTwin is an implementation case, not a general proof.

What This Note Does Not Claim, and What It Names Instead

Outside its domain, a piece of evidence is not false, it is expired, and the distinction matters: false evidence was poorly established, expired evidence was well established and has ceased to apply. A proof that can expire is a process, with a start of validity, a window, and a condition of end, which one does not certify once and for all but commits to, monitors, and renews. From there the order of concepts is the thought itself. Validation belongs to none of the terms present: it is an emergent property of a relation between source, decision, and context, indexed by time. It follows that evidence states a guarded implication, a conditional promise, and that the use contract is merely the social name of that promise when it must be kept between parties. Regulation already formalizes such promises, through the FDA’s predetermined change control plans and the intended-purpose logic of the European AI regulation, without yet having the theory that says so. This note proposes that theory. The full argument, with its structure, its oncological case, and its references, is set out in the full document.

Key takeaways

Thesis: validity is not in the evidence, it is in the relation between a piece of evidence and the use made of it. Three ordinary contradictions show that no non-relational notion of validity can hold, because every repair smuggles back a decision, a loss, or a domain.
A minimal structure (D, L, Δ, T): a decision to serve, a loss to order consequences, a domain of reliability, a window of validity. Minimality is an enforceable constraint, not a confession: any fifth component must prove its irreducibility to the four.
Loss is contextual before it is vectorial. Clinical trade-offs weigh non-commensurable dimensions whose weights depend on the patient and the care plan; silent scalarization is governance happening without its own knowledge.
A generator renders inferable, it does not create. No generator makes identifiable any content absent from its constraints; a generator turns implicit, already-identifiable structure into something operational for a decision.
Two levels of substitutability: decisional (same optimal action) and informational (same value of information, EVPI/EVSI). A synthetic cohort can be substitutable for deciding yet not for learning. Substitutability is a dated degree σ, not a certificate.
Evidence outside its domain is not false, it is expired. A proof that can expire is a process, not an object: a conditional promise, of which the use contract is only the social formalization, and the direction regulation has already taken.