Exploration · Open access

Predictive Toxicological Analysis Report RPT-2026-001

SMILES → Scoring → Interpretation workflow on an out-of-domain candidate

Jérôme Vetillard · Twingital Institute · 8 pages · 2 min read

ToxTwin Series — Article 2/3. See also: ToxGNN-V1 Pipeline · API Tests & Guide

This document illustrates the complete ToxTwin workflow for predictive toxicological analysis of a drug candidate from its SMILES representation. It serves as a template for routine analyses and demonstrates the system’s behaviour when faced with a structurally novel compound.

Analysis workflow: from structure to score

SMILES generation and validation

SMILES (Simplified Molecular Input Line Entry System) is a standardised text notation for chemical structures. Before a structure is submitted to ToxTwin, its validity is checked with RDKit: MolFromSmiles() parses the string, followed by canonicalisation and 2D/3D structure generation. Complementary structure-drawing tools include ChemDraw (PerkinElmer), MarvinSketch (ChemAxon), PubChem Sketcher, and JSME or Ketcher for web applications.
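The validation step can be sketched with RDKit as follows. This is a minimal illustration, not ToxTwin's actual pre-processing: the function name validate_smiles is hypothetical, and ethanol stands in for the candidate, whose SMILES is not reproduced on this page.

```python
from typing import Optional

from rdkit import Chem
from rdkit.Chem import AllChem


def validate_smiles(smiles: str) -> Optional[str]:
    """Return the canonical SMILES, or None if RDKit cannot parse the input.

    Hypothetical helper for illustration only.
    """
    mol = Chem.MolFromSmiles(smiles)  # None on a syntax or valence error
    if mol is None:
        return None
    AllChem.Compute2DCoords(mol)      # 2D layout, e.g. for depiction
    return Chem.MolToSmiles(mol)      # canonical form by default


if __name__ == "__main__":
    print(validate_smiles("OCC"))     # ethanol written non-canonically
    print(validate_smiles("C1CC"))    # unclosed ring, rejected by the parser
```

Canonicalising before submission means that two different spellings of the same molecule resolve to the same string, which helps with caching and de-duplication.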

API submission

The API offers three access modes: a POST request with curl, the Swagger UI (http://localhost:8000/docs), or programmatic integration from Python. Candidate RPT-2026-001 is submitted as a SMILES string containing an [F+] motif (cationic fluorine), flagged as a potential alkylating agent and rare in approved drugs.
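A programmatic submission might look like the sketch below, which uses only the Python standard library. The host and port come from the Swagger UI address above; the /predict route and the "smiles" field name are assumptions made for illustration, and the real contract should be read from the Swagger UI.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # from the Swagger UI address
PREDICT_PATH = "/predict"           # hypothetical route, check /docs


def build_request(smiles: str) -> urllib.request.Request:
    """Prepare a JSON POST request carrying the SMILES string."""
    payload = json.dumps({"smiles": smiles}).encode("utf-8")  # hypothetical field name
    return urllib.request.Request(
        BASE_URL + PREDICT_PATH,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(smiles: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(build_request(smiles), timeout=timeout) as resp:
        return json.load(resp)
```

Separating build_request from submit makes it possible to inspect or log the outgoing payload without a running server.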

Molecular profile of candidate RPT-2026-001

The compound is not referenced in PubChem, ChEMBL or Tox21, and the Silver database holds no experimental data for it: no LD50, no organ profile, no Ames test. This is precisely ToxTwin’s primary use case: scoring before any experimental testing.

Notable structural features: [F+] (potential alkylating agent, expected SR-p53 signal), piperidine ring (oxidative metabolism, N-oxide formation), pyridine ring (possible CYP450 inhibitor), hydroxyl group (glucuronide conjugation), amine (hERG interaction risk if lipophilic). Inference time: 582.8 ms in direct SMILES resolution.

Applicability domain and uncertainty

The maximum Tanimoto similarity to the training set is 0.153, below the 0.30 applicability threshold, so the compound is out of domain. Its nearest training-set neighbour shares only about 15% of the circular substructures present in either molecule (Morgan fingerprints, radius 2). Likely causes: [F+] is under-represented in Tox21/ChEMBL, the pharmacophoric combination (piperidine + pyridine + [F+]) is unusual, and the compound is absent from public databases.
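The applicability check reduces to a nearest-neighbour Tanimoto computation, sketched here in plain Python on sets of fingerprint on-bits. The function names, the toy bit sets and the "at or above the threshold means in domain" convention are assumptions; in practice the bits would come from Morgan fingerprints of the query and of each training molecule.

```python
from __future__ import annotations

from typing import Iterable


def tanimoto(bits_a: set[int], bits_b: set[int]) -> float:
    """Shared on-bits divided by on-bits present in either fingerprint."""
    union = bits_a | bits_b
    if not union:
        return 0.0
    return len(bits_a & bits_b) / len(union)


def max_similarity(query: set[int], training: Iterable[set[int]]) -> float:
    """Similarity of the query to its nearest training-set neighbour."""
    return max((tanimoto(query, fp) for fp in training), default=0.0)


def in_domain(query: set[int], training: Iterable[set[int]],
              threshold: float = 0.30) -> bool:
    """Assumed convention: in domain when the maximum similarity reaches the threshold."""
    return max_similarity(query, training) >= threshold


if __name__ == "__main__":
    # Toy bit sets: 2 shared bits out of 13 distinct bits, about 0.154.
    query = {1, 2, 3, 4, 5, 6, 7, 8}
    neighbour = {7, 8, 9, 10, 11, 12, 13}
    print(round(tanimoto(query, neighbour), 3), in_domain(query, [neighbour]))
```

A value near 0.15 therefore means that roughly one bit in seven is common to both fingerprints, which is why predictions for such a compound are treated as extrapolations.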

MC Dropout (20 stochastic passes) yields elevated uncertainties (> 5%) on several of the most active endpoints: SR-p53 (± 6.2%), SR-ARE (± 6.0%), NR-AR (± 7.1%). This is the expected and coherent behaviour for an out-of-domain compound.
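The aggregation behind such ± figures can be sketched as a mean and a spread over the stochastic passes. The sketch assumes that the reported uncertainty is a sample standard deviation expressed in percentage points, which the report does not state; the function names are hypothetical and the demo values are synthetic, not ToxGNN-V1 outputs.

```python
import statistics
from typing import Dict, List, Tuple


def summarise(passes: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation over the stochastic passes."""
    return statistics.fmean(passes), statistics.stdev(passes)


def flag_uncertain(passes_by_endpoint: Dict[str, List[float]],
                   threshold: float = 5.0) -> Dict[str, Tuple[float, float]]:
    """Keep the endpoints whose spread exceeds the threshold (percentage points)."""
    flagged = {}
    for endpoint, passes in passes_by_endpoint.items():
        mean, spread = summarise(passes)
        if spread > threshold:
            flagged[endpoint] = (mean, spread)
    return flagged


if __name__ == "__main__":
    # Synthetic scores in %, three passes per endpoint for brevity.
    demo = {
        "endpoint-A": [20.0, 30.0, 40.0],  # wide spread, flagged
        "endpoint-B": [9.0, 10.0, 11.0],   # narrow spread, not flagged
    }
    print(flag_uncertain(demo))
```

With dropout left active at inference time, each pass samples a slightly different network, so a wide spread signals that the model has little support for the compound.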

ToxGNN-V1 results: multi-endpoint scoring

The two priority signals are SR-p53 at 32.6% and SR-ARE at 25.5%. SR-p53 reports on genotoxicity: p53 is the "guardian of the genome", and its activation indicates potential genotoxic stress, consistent with the electrophilic [F+]. SR-ARE reports on oxidative stress through activation of the Nrf2/KEAP1 pathway, also consistent with [F+]. Secondary signals (NR-AhR 15.4%, SR-HSE 13.1%, NR-AR 13.0%) remain in the low zone. The remaining seven endpoints score below 10% with low uncertainties.
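The grouping above can be reproduced with a small ranking sketch. The five scores are those reported for the candidate; the 10% cut-off comes from the text, while the 20% cut-off is an assumption (any value between 15.4% and 25.5% gives the same grouping) and the tier names are illustrative rather than ToxTwin's own.

```python
from typing import Dict, List, Tuple

# Scores in % reported for candidate RPT-2026-001. The seven other
# endpoints are below 10%; their exact values are not given.
REPORTED_SCORES: Dict[str, float] = {
    "SR-p53": 32.6,
    "SR-ARE": 25.5,
    "NR-AhR": 15.4,
    "SR-HSE": 13.1,
    "NR-AR": 13.0,
}


def tier(score: float, priority_cutoff: float = 20.0,
         low_cutoff: float = 10.0) -> str:
    """Assign an illustrative tier; priority_cutoff is a hypothetical value."""
    if score >= priority_cutoff:
        return "priority"
    if score >= low_cutoff:
        return "secondary"
    return "background"


def rank(scores: Dict[str, float]) -> List[Tuple[str, float, str]]:
    """Endpoints sorted by decreasing score, each with its tier."""
    rows = [(name, value, tier(value)) for name, value in scores.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, value, label in rank(REPORTED_SCORES):
        print(f"{name:8s} {value:5.1f}%  {label}")
```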

Recommended experimental follow-up, in order of priority. Priority 1: Ames test (OECD 471) for bacterial mutagenicity, comet assay (OECD 489) for DNA strand breaks, ROS/GSH assay for oxidative stress. Priority 2: in vitro micronucleus test (chromosomal aberrations), rat acute oral toxicity (LD50 estimate, OECD 420). Priority 3: hERG inhibition by patch-clamp (potential cardiotoxicity linked to the amine and the nitrogen-containing rings), CYP450 panel (metabolic interactions).

Verdict

MODERATE risk profile on two stress-response pathways, genotoxicity (SR-p53 32.6%) and oxidative stress (SR-ARE 25.5%), both consistent with the [F+] group. The compound is out of domain (Tanimoto 0.153), so the predictions are extrapolations. Progression towards regulatory studies requires experimental validation of the genotoxic signals (Ames + comet) before any development decision. This report is a decision-support tool and does not substitute for the judgement of a qualified toxicologist or for mandatory regulatory studies (ICH S2, S7A, S7B).
