ToxTwin V2.3 — User Guide · Twingital Institute

What is ToxTwin?

ToxTwin is a toxicological prediction model that analyses the chemical structure of a molecule to estimate its probability of activity across 14 regulatory biological endpoints. It uses a graph neural network (GNN) trained on high-throughput screening data from the ChEMBL and Tox21 databases. From structure alone, ToxTwin estimates preliminary toxicological risk, signals whether the molecule falls within the model’s applicability domain, and generates a summary pharmacological interpretation. It does not replace mandatory biological tests (ICH S2, S7B), does not predict in vivo toxicity, and does not provide a legally binding regulatory opinion.

Submitting a molecule

ToxTwin accepts molecules in SMILES notation, the standard textual representation of chemical structure. Canonical SMILES is recommended (ToxTwin applies RDKit canonicalisation upstream). SMILES strings can be obtained from PubChem, ChEMBL, ChemDraw/MarvinSketch and RDKit. The demo interface is available at twingital-ventures.com with a freemium quota of 3 free analyses. Invalid SMILES, molecules exceeding 500 heavy atoms and mixtures (SMILES containing the "." character) receive specific handling, which is documented in the guide.
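The checks mentioned above can be approximated locally before submitting a molecule. The following is a minimal sketch, not ToxTwin's own pipeline: the function names precheck_smiles and canonicalise are hypothetical, the 500 heavy-atom limit and the "." mixture rule are taken from the text above, and the second function assumes RDKit is installed.

```python
from typing import Optional

MAX_HEAVY_ATOMS = 500  # limit quoted in this guide


def precheck_smiles(smiles: str) -> Optional[str]:
    """Return a short issue label, or None if the cheap string checks pass."""
    s = smiles.strip()
    if not s:
        return "empty"
    if "." in s:
        # In SMILES, "." separates disconnected components (mixtures, salts).
        return "mixture"
    return None


def canonicalise(smiles: str) -> str:
    """Parse and canonicalise with RDKit (hypothetical helper, requires RDKit)."""
    from rdkit import Chem  # lazy import so precheck_smiles works without RDKit

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"invalid SMILES: {smiles!r}")
    if mol.GetNumHeavyAtoms() > MAX_HEAVY_ATOMS:
        raise ValueError("molecule exceeds 500 heavy atoms")
    return Chem.MolToSmiles(mol)
```

A string that passes precheck_smiles can still be invalid; only the RDKit parse in canonicalise detects syntax and valence errors.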

The 14 endpoints

The 7 nuclear receptor endpoints (NR-AR, NR-AR-LBD, NR-AhR, NR-Aromatase, NR-ER, NR-ER-LBD, NR-PPAR-γ) assess endocrine disruption. The 5 cellular stress response endpoints (SR-ARE, SR-ATAD5, SR-HSE, SR-MMP, SR-p53) signal genomic, mitochondrial or protein toxicity. The 2 pharmacotoxicology endpoints, hERG (potassium channel, ICH S7B) and Ames (Salmonella mutagenicity, ICH S2(R1)), perform below the regulatory target in V2.3 and are indicative only.
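For scripting against a results table, the grouping above can be held in a small lookup structure. This is an illustrative sketch only: the group keys and the helper group_of are hypothetical names, while the endpoint labels are those listed above.

```python
# Endpoint labels as listed in this guide; the group keys are hypothetical.
ENDPOINT_GROUPS = {
    "nuclear_receptor": [
        "NR-AR", "NR-AR-LBD", "NR-AhR", "NR-Aromatase",
        "NR-ER", "NR-ER-LBD", "NR-PPAR-γ",
    ],
    "stress_response": ["SR-ARE", "SR-ATAD5", "SR-HSE", "SR-MMP", "SR-p53"],
    "pharmacotoxicology": ["hERG", "Ames"],
}

# Endpoints the guide describes as indicative only in V2.3.
INDICATIVE_ONLY = {"hERG", "Ames"}


def group_of(endpoint: str) -> str:
    """Return the group key for an endpoint label, or raise KeyError."""
    for group, members in ENDPOINT_GROUPS.items():
        if endpoint in members:
            return group
    raise KeyError(endpoint)
```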

Reading and interpreting results

Each endpoint returns a score between 0 and 1 representing the calibrated probability of activity. Interpretation thresholds are: low (<0.25), moderate (0.25–0.50), high (0.50–0.75), critical (>0.75). A complete profile is read at three levels: dominant signals (>0.25), structural coherence with functional groups, and applicability domain context. The aspirin example (low profile on all 14 endpoints, Tanimoto similarity 0.857, in domain) illustrates the standard reading.
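The thresholds above translate directly into a small helper. This is a sketch under one stated assumption: the published ranges share the boundary 0.50, so a score of exactly 0.50 is treated here as "high". The scores in the usage note are invented for illustration and are not ToxTwin output.

```python
def risk_level(score: float) -> str:
    """Map a calibrated probability in [0, 1] to one of the four risk levels."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score < 0.25:
        return "low"
    if score < 0.50:
        return "moderate"
    if score <= 0.75:  # assumption: exactly 0.50 counts as "high"
        return "high"
    return "critical"  # strictly above 0.75, as stated in the guide


def dominant_signals(profile: dict) -> dict:
    """Keep endpoints scoring above 0.25, ordered from highest to lowest."""
    ranked = sorted(profile.items(), key=lambda item: item[1], reverse=True)
    return {name: score for name, score in ranked if score > 0.25}
```

For example, dominant_signals({"hERG": 0.62, "NR-ER": 0.10, "SR-MMP": 0.31}) keeps hERG and SR-MMP, in that order, and risk_level(0.62) returns "high".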

Applicability domain

The applicability domain indicates whether the molecule is structurally similar to the training corpus. ToxTwin uses a composite score based on three complementary signals: Tanimoto similarity, latent k-NN distance and KDE density. Common causes of out-of-domain results include novel scaffolds, unusual functional groups, peptides, polymers and organometallic compounds. An out-of-domain result does not mean toxicity; it means there is insufficient data for a reliable prediction.
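Tanimoto similarity, one of the signals behind the composite score, can be illustrated on fingerprints represented as sets of on-bit indices. This is a generic sketch of the coefficient, not ToxTwin's implementation: the fingerprint type, the way the three signals are combined and the in-domain threshold are not specified here, and the bit sets below are invented.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| between two sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def nearest_similarity(query: set, training: list) -> float:
    """Highest Tanimoto similarity between the query and any training fingerprint."""
    return max((tanimoto(query, fp) for fp in training), default=0.0)


# Invented bit sets, for illustration only.
query_fp = {1, 4, 7, 9}
training_fps = [{1, 4, 9, 12}, {2, 3, 5}]
best = nearest_similarity(query_fp, training_fps)  # 3 shared bits / 5 total = 0.6
```

A low nearest-neighbour similarity suggests the query is far from the training corpus, which is the situation the out-of-domain flag is meant to capture.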

Pharmacological interpretation

The textual interpretation generated by a local LLM provides a narrative synthesis of signals, correlation with functional groups, risk level assessment and regulatory context. It does not constitute a regulatory opinion, an in vivo prediction, or a substitute for the judgement of a qualified toxicologist. It is a starting point for reflection.

Limitations and use cases

ToxTwin is appropriate for early triage, mechanistic hypotheses, experimental prioritisation and comparative monitoring. It is not appropriate as a substitute for regulatory tests, as a component of a marketing authorisation dossier without experimental confirmation, or for out-of-scope classes. Endpoints and compound classes not covered in V2.3 (DILI, ClinTox, extended cardiotoxicity, phototoxicity, DART, metal complexes) are planned for V3.0.

Frequently asked questions

The guide addresses the most common questions: diagnosing invalid SMILES, interpreting a high hERG score (a warning signal that justifies a patch-clamp test, not a stop decision), using out-of-domain scores (indicative only, although a strong signal still warrants investigation), reproducibility of predictions (deterministic), stereochemistry handling, professional API access, and the policy of not storing submitted SMILES.

Key takeaways

14 toxicological endpoints detailed: 7 nuclear receptors, 5 cellular stress responses, hERG (ICH S7B) and Ames (ICH S2(R1)).
Calibrated probability scores [0–1] with 4 risk levels: low (<0.25), moderate (0.25–0.50), high (0.50–0.75), critical (>0.75).
Composite tri-signal applicability domain: Tanimoto similarity, latent k-NN distance and KDE density, with a calibrated is_in_domain threshold.
Pharmacological interpretation via local LLM: narrative synthesis, functional group correlation, verdict, regulatory context.
Appropriate use cases: early triage, mechanistic hypotheses, experimental prioritisation, comparative monitoring between analogues.
Explicit limitations: does not replace ICH S2/S7B, out of scope for biologics, peptides >50 AA, organometallic compounds.