ToxTwin API — Validation Tests and Usage Guide

ToxTwin Series — Article 3/3. See also: ToxGNN-V1 Pipeline · Report RPT-2026-001

This document is the technical addendum to the ToxTwin v1.0 pipeline documentation. It describes the REST API (FastAPI, OpenAPI 3.1), validation results on two reference pharmaceutical molecules, the structured JSON response format, and identified technical debt.

Swagger interface and available endpoints

The ToxTwin API exposes interactive Swagger UI documentation, generated automatically by FastAPI, at http://localhost:8000/docs. Four endpoints are available:

GET /health: API status, loaded model, device, training-set fingerprints, calibration.
POST /v1/score-toxicity: compound scoring via a JSON body {query: string}.
GET /v1/score-toxicity: scoring via a URL parameter.
POST /v1/score-toxicity/batch: batch scoring of up to 50 molecules in parallel.
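The endpoints listed above can be exercised with a small client. The sketch below only builds the requests with the standard library and does not send them. The base URL is the local address quoted above; the GET parameter name `query` and the batch body key `queries` are assumptions, since the text only specifies the single POST body {query: string}.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # local address quoted in the text
MAX_BATCH = 50                      # batch limit stated in the text


def score_post(query: str) -> Request:
    """POST /v1/score-toxicity with the documented body {"query": ...}."""
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(
        f"{BASE_URL}/v1/score-toxicity",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def score_get(query: str) -> Request:
    """GET /v1/score-toxicity; the parameter name `query` is an assumption."""
    params = urlencode({"query": query})  # percent-encodes SMILES characters
    return Request(f"{BASE_URL}/v1/score-toxicity?{params}", method="GET")


def batch_posts(queries: list[str]) -> list[Request]:
    """Split queries into chunks of at most 50, one batch request per chunk.

    The body key `queries` is hypothetical; check the BatchRequest schema.
    """
    requests = []
    for start in range(0, len(queries), MAX_BATCH):
        chunk = queries[start:start + MAX_BATCH]
        body = json.dumps({"queries": chunk}).encode("utf-8")
        requests.append(Request(
            f"{BASE_URL}/v1/score-toxicity/batch",
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        ))
    return requests
```

To send one of these requests, pass it to `urllib.request.urlopen` and decode the body with `json.load`. URL-encoding matters for the GET variant because SMILES strings contain characters such as `(`, `)` and `=`.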

Eight Pydantic schemas structure the exchanges: ScoreRequest, ScoreResponse, ResolvedMolecule, ToxPredictions, ExperimentalData, ApplicabilityDomain, BatchRequest, HTTPValidationError.

Startup and prerequisites

Prerequisites: a Conda environment named ml (Python 3.11, PyTorch 2.12, PyG 2.7), the trained model toxgnn_v1_finetune.pt, the calibration temperatures JSON file, and access to the Silver Delta Lake. At startup, the API loads, in sequence, the model (~2 s), the temperatures (instant) and 2,000 training-set Morgan fingerprints via Spark (~8 s).

Test 1: aspirin (acetylsalicylic acid)

The query is submitted by INN (international nonproprietary name) and resolved through PubChem PUG-REST to CID 2244, canonical SMILES CC(=O)Oc1ccccc1C(=O)O, MW 180.16 g/mol. Inference time: 2,017 ms. The highest signal is NR-ER 0.350 (oestrogen receptor, expected for an NSAID), followed by NR-AhR 0.191 and SR-ARE 0.168. Silver-layer data: oral rat LD50 200 mg/kg, tox_score 4/5, target organ liver. Applicability domain: Tanimoto 0.594 (in domain). The predictive profile is consistent with known pharmacology.
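Name resolution of the kind used in this test goes through PubChem PUG-REST. The sketch below builds the name-to-CID lookup URL and extracts a CID from a response body of the `IdentifierList` shape that PUG-REST returns for this query. It performs no network call, and the helper names are illustrative rather than ToxTwin's own.

```python
import json
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cid_lookup_url(name: str) -> str:
    """URL returning the CIDs that match a compound name, as JSON."""
    # safe="" so that spaces and slashes inside a name are percent-encoded
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/cids/JSON"


def first_cid(response_body: str) -> int:
    """Extract the first CID from a PUG-REST identifier-list response."""
    payload = json.loads(response_body)
    return payload["IdentifierList"]["CID"][0]
```

A name can match several CIDs; taking the first entry is a simplification made for this sketch.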

Test 2: ibuprofen (direct SMILES)

The SMILES string CC(C)Cc1ccc(cc1)C(C)C(=O)O is submitted directly and resolved locally by RDKit without a network call, which saves about 430 ms. The highest signal is NR-PPAR-gamma 0.383, matching the documented PPAR-γ activation by NSAIDs (Lehmann et al., 1997, Journal of Biological Chemistry). ToxGNN-V1 captured this structure-activity relationship without explicit supervision. NR-ER 0.318 and NR-ER-LBD 0.264 indicate moderate endocrine disruption, consistent with the NSAID class. Oral rat LD50: 636 mg/kg (less acutely toxic than aspirin), tox_score 2/5. Tanimoto: 0.600 (in domain).

Aspirin vs ibuprofen comparison

The NR-PPAR-gamma differential between aspirin (0.038) and ibuprofen (0.383) is a non-trivial result: it matches a documented pharmacological difference and emerges from representations learned by the pre-trained GNN, without explicit supervision. On these two compounds, the signal supports the model's ability to capture structure-activity relationships beyond simple correlation with training labels. Ibuprofen shows higher mitochondrial stress (SR-MMP 0.119) and endocrine disruption (NR-ER-LBD 0.264), while aspirin scores higher on NR-ER (0.350) and on acute tox_score (4/5 vs 2/5). Both compounds are within the applicability domain (Tanimoto > 0.59).
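The applicability-domain figures reported for both compounds rest on Tanimoto similarity to the training-set fingerprints. A minimal pure-Python sketch follows, representing each fingerprint as the set of its on-bit indices. The 0.5 threshold is a placeholder: the text reports the similarities (0.594 and 0.600) but not the cut-off, and the production code presumably works on RDKit Morgan fingerprints rather than hand-built sets.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def applicability_domain(
    query_fp: set[int],
    training_fps: list[set[int]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the text
) -> dict:
    """Compare the max similarity to the training set against the threshold."""
    max_sim = max((tanimoto(query_fp, fp) for fp in training_fps), default=0.0)
    in_domain = max_sim >= threshold
    return {
        "max_tanimoto_similarity": round(max_sim, 3),
        "is_in_domain": in_domain,
        "warning": None if in_domain else "query is far from the training set",
    }
```

Using the maximum similarity means a single close neighbour is enough to place a query in domain.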

ScoreResponse format

The JSON response contains 8 top-level fields: query, resolved (ResolvedMolecule), predictions (ToxPredictions, with 12 scores, their uncertainties and mean_uncertainty), experimental_data (LD50, tox_score, severity, Ames, target_organs), applicability_domain (max_tanimoto_similarity, is_in_domain, warning), calibration_warning (boolean, true when ECE > 0.10), inference_time_ms, model_version.
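A client can sanity-check the top-level shape of a response before reading the nested objects. The sketch below checks only the eight field names listed above plus two basic types. The nested schemas are not inspected because their exact field names are not fully specified here, and the function name is illustrative.

```python
TOP_LEVEL_FIELDS = {
    "query", "resolved", "predictions", "experimental_data",
    "applicability_domain", "calibration_warning",
    "inference_time_ms", "model_version",
}


def check_score_response(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    missing = TOP_LEVEL_FIELDS - set(payload)
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    unexpected = set(payload) - TOP_LEVEL_FIELDS
    if unexpected:
        problems.append(f"unexpected fields: {sorted(unexpected)}")
    flag = payload.get("calibration_warning")
    if "calibration_warning" in payload and not isinstance(flag, bool):
        problems.append("calibration_warning should be a boolean")
    elapsed = payload.get("inference_time_ms")
    if elapsed is not None:
        # bool is a subclass of int in Python, so exclude it explicitly
        is_number = isinstance(elapsed, (int, float)) and not isinstance(elapsed, bool)
        if not is_number or elapsed < 0:
            problems.append("inference_time_ms should be a non-negative number")
    return problems
```

Returning a list of problems instead of raising on the first one makes it easier to log every mismatch in a single pass.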

Score interpretation grid: < 0.10 (very low), 0.10–0.25 (low, monitoring), 0.25–0.50 (moderate, investigation recommended), > 0.50 (high, priority in vitro studies). MC Dropout uncertainty grid: < 0.05 (reliable), 0.05–0.15 (moderate), > 0.15 (high uncertainty).
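The two grids above translate directly into lookup functions. This is a sketch: the grid does not say which band owns a boundary value such as 0.25, so the code assigns each boundary to the band where it appears as a lower bound, except where the grid uses a strict `>` (0.50 for scores, 0.15 for uncertainties). The function names are illustrative.

```python
def interpret_score(score: float) -> str:
    """Map a toxicity score in [0, 1] to the interpretation band."""
    if score < 0.10:
        return "very low"
    if score < 0.25:
        return "low"        # monitoring
    if score <= 0.50:
        return "moderate"   # investigation recommended
    return "high"           # priority in vitro studies


def interpret_uncertainty(uncertainty: float) -> str:
    """Map an MC Dropout uncertainty to the reliability band."""
    if uncertainty < 0.05:
        return "reliable"
    if uncertainty <= 0.15:
        return "moderate"
    return "high uncertainty"
```

With the values reported in the two tests, aspirin's NR-ER score of 0.350 and ibuprofen's NR-PPAR-gamma score of 0.383 both fall in the moderate band, while aspirin's NR-PPAR-gamma score of 0.038 is very low.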

Known technical debt

Calibration: ECE of 0.149 fails the 0.10 threshold. The root cause is the pos_weight term in the BCE loss; a fix via isotonic regression is planned for v1.1.
Inference time: 1,500–2,000 ms on CPU. A GPU (RTX 5080) is expected to reduce this to < 200 ms (v1.1).
NR-AR: below the RF baseline (0.659 vs 0.750). ChEMBL data enrichment is recommended.
Applicability domain: 38.2% of compounds are out-of-domain on the scaffold split (by design). A k-NN mean similarity is to be implemented in v1.2.
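For reference, ECE (expected calibration error) measures the gap between predicted probabilities and observed positive rates. The sketch below is one common equal-width-bin formulation for a single binary task. The bin count of 10 and the per-task treatment are assumptions, since the text does not say how the 0.149 figure was computed; only the 0.10 warning threshold comes from the document.

```python
def expected_calibration_error(
    probs: list[float],
    labels: list[int],
    n_bins: int = 10,  # assumed bin count
) -> float:
    """Weighted mean of |mean predicted prob - observed positive rate| per bin."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not probs:
        raise ValueError("at least one prediction is required")
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        positive_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_prob - positive_rate)
    return ece


def calibration_warning(ece: float, threshold: float = 0.10) -> bool:
    """True when the ECE exceeds the 0.10 threshold quoted in the text."""
    return ece > threshold
```

For example, four predictions of 0.9 with three actual positives give a gap of |0.9 − 0.75| = 0.15 in a single bin, which would trip the warning.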

Key takeaways

FastAPI REST API (OAS 3.1) with 4 endpoints: health, score-toxicity (POST/GET), score-toxicity/batch, plus interactive Swagger UI documentation.
Aspirin test: NR-ER 0.350, LD50 200 mg/kg, tox_score 4/5, Tanimoto 0.594 (in domain). Inference 2,017 ms, including PubChem resolution of the INN.
Ibuprofen test: NR-PPAR-gamma 0.383 — documented PPAR-γ activation (Lehmann et al. 1997) captured without explicit supervision by the GNN.
NR-PPAR-gamma differential between aspirin (0.038) and ibuprofen (0.383): a non-trivial result supporting the model's ability to capture structure-activity relationships.
8 documented Pydantic schemas: ScoreRequest, ScoreResponse, ResolvedMolecule, ToxPredictions, ExperimentalData, ApplicabilityDomain, BatchRequest, HTTPValidationError.
Identified technical debt: ECE calibration 0.149 (FAIL), CPU inference 1,500–2,000 ms (GPU → <200 ms), NR-AR underperforming (0.659 vs RF 0.750).