GPTs and LLMs in the context of Life Sciences and Clinical Trials

Scope and co-authorship

Conference paper co-authored with Mark Lambrecht (SAS), presented at PHUSE 2024 (Paper CM05). In-depth review of generative AI architectures and their applications across the life sciences value chain, from drug discovery to clinical data automation. 16 pages.

Fundamentals of generative AI

Presentation of three core generative architectures: Generative Adversarial Networks (generator-discriminator tandem), Variational Autoencoders (probabilistic latent space encoding), and Transformers (self-attention mechanism, Query/Key/Value matrices). Each architecture’s strengths, limitations, and application sweet spots are compared across life sciences use cases.
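The Query/Key/Value mechanism can be illustrated with single-head scaled dot-product attention, softmax(QKᵀ/√d_k)·V. The sketch below is a minimal pure-Python version written for readability. It is not code from the paper. The helper names (`matmul`, `transpose`, `softmax`, `scaled_dot_product_attention`) are illustrative, and the sketch assumes Q = K = V = X, omitting the learned projection weights, masking and multiple heads of a real transformer.

```python
import math

Matrix = list[list[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x m) matrix by an (m x p) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]

def softmax(row: list[float]) -> list[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> tuple[Matrix, Matrix]:
    """Return (output, attention_weights) for single-head attention."""
    d_k = len(k[0])
    # Similarity of every query with every key, scaled by sqrt(d_k).
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    # Each row of weights is a probability distribution over the input tokens.
    weights = [softmax(row) for row in scores]
    # Each output row is a weighted average of the value vectors.
    return matmul(weights, v), weights

if __name__ == "__main__":
    # Three toy "tokens" with 2-dimensional embeddings; Q = K = V = X is an
    # illustrative simplification (no learned projections).
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    out, w = scaled_dot_product_attention(x, x, x)
    for row in w:
        print([round(p, 3) for p in row])
```

Each row of the printed weights shows how strongly one token "attends" to every other token. Attention visualisation, mentioned later under explainability, plots exactly these weights.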

Drug discovery and molecular design

SMILES-based (Simplified Molecular Input Line Entry System) digitisation of molecular structures as a bridge between chemistry and computation. AI-driven molecular design with GANs and VAEs versus conventional combinatorial chemistry and high-throughput screening. Virtual screening with transformer attention mechanisms. Computational (in silico) assessment as a prerequisite to in vitro and in vivo testing. Biologics screening for monoclonal antibody development.
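Before a transformer or VAE can learn from SMILES strings, each string is typically split into tokens: atoms, bonds, branches and ring-closure labels. The sketch below uses a simplified regular expression. The pattern and the function name `tokenize_smiles` are illustrative assumptions, not the tokenizer used in the paper. The pattern covers bracket atoms, the two-letter halogens Cl and Br, aromatic atoms, bond symbols and ring closures, but not every corner of the SMILES grammar.

```python
import re

# Simplified SMILES token pattern (an illustrative assumption, not the paper's tokenizer).
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"        # bracket atoms such as [NH4+] or [C@@H]
    r"|Br|Cl"            # two-letter organic-subset atoms, tried before B and C
    r"|[BCNOPSFI]"       # one-letter organic-subset atoms
    r"|[bcnops]"         # aromatic atoms
    r"|%\d{2}|\d"        # ring-closure labels (%10..%99, then 0..9)
    r"|[=#$:/\\\-().*]"  # bond symbols, branches, disconnection, wildcard
)

def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into model-ready tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # findall silently skips characters it cannot match, so verify nothing was lost.
    if "".join(tokens) != smiles:
        raise ValueError(f"Unrecognised characters in SMILES: {smiles!r}")
    return tokens

if __name__ == "__main__":
    print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
```

In a generative pipeline these tokens would then be mapped to integer ids from a fixed vocabulary. Checking that the tokens rejoin to the input catches characters the simplified pattern does not cover, instead of silently dropping them.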

Genomics, proteomics, and in silico screening

Synthetic genomic sequence generation with GANs and VAEs. Transformer-based modelling of gene function, epigenetic factors (DNA methylation, histone modification), and non-coding RNA relationships. Protein structure prediction with AlphaFold and RoseTTAFold. HPC-accelerated binding force evaluation: molecular docking, molecular dynamics simulations, free energy calculations, hybrid quantum mechanics/molecular mechanics (QM/MM) simulations, and Microsoft's Azure Quantum Elements.
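Generative models and transformers working on genomic sequences usually consume nucleotides as numeric matrices rather than letters. The sketch below shows one-hot encoding and the reverse step, in which a generator's per-position probabilities are turned back into a synthetic sequence. It is a minimal illustration under stated assumptions: the function names are hypothetical, the alphabet is limited to A/C/G/T, ambiguous bases such as N become all-zero rows (a common but not universal convention), and decoding is greedy rather than sampled.

```python
ALPHABET = "ACGT"
INDEX = {base: i for i, base in enumerate(ALPHABET)}

def one_hot(seq: str) -> list[list[int]]:
    """Encode a DNA sequence as an L x 4 one-hot matrix (columns A, C, G, T).
    Unknown bases such as 'N' become all-zero rows (an illustrative convention)."""
    rows = []
    for base in seq.upper():
        row = [0] * len(ALPHABET)
        if base in INDEX:
            row[INDEX[base]] = 1
        rows.append(row)
    return rows

def decode(probabilities: list[list[float]]) -> str:
    """Turn per-position base probabilities (e.g. a hypothetical generator's output)
    into a sequence by taking the most likely base at each position (greedy decoding)."""
    return "".join(ALPHABET[max(range(len(ALPHABET)), key=row.__getitem__)]
                   for row in probabilities)

if __name__ == "__main__":
    print(one_hot("GATTACA"))
    print(decode([[0.1, 0.2, 0.6, 0.1], [0.7, 0.1, 0.1, 0.1]]))
```

Sampling from each row instead of taking the maximum would yield more diverse synthetic sequences; greedy decoding is used here only to keep the example deterministic.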

Precision medicine and predictive analytics

Pharmacogenomics for targeted therapies based on individual genomic and clinical profiles. Predictive analytics using time-series and NLP techniques on longitudinal health data. Multi-omics integration across genome-to-metabolome layers. Human-AI collaboration model where clinical expertise contextualises population-scale computational findings.
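One simple way to integrate several omics layers is "early integration": standardise each layer's features separately, then concatenate them into one feature vector per patient. The sketch below illustrates that idea. The paper does not prescribe this method (intermediate and late integration are common alternatives), and the layer names and values in the usage example are hypothetical.

```python
from statistics import mean, pstdev

def zscore_columns(matrix: list[list[float]]) -> list[list[float]]:
    """Standardise each column (feature) to mean 0 and population SD 1.
    Constant columns carry no information and are set to 0.0."""
    stats = [(mean(col), pstdev(col)) for col in zip(*matrix)]
    return [[(x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, stats)]
            for row in matrix]

def early_integration(layers: dict[str, list[list[float]]]) -> list[list[float]]:
    """Concatenate per-layer standardised features into one row per patient.
    Rows must refer to the same patients, in the same order, in every layer."""
    n_patients = {len(m) for m in layers.values()}
    if len(n_patients) != 1:
        raise ValueError("All omics layers must have the same number of patients")
    scaled = [zscore_columns(m) for m in layers.values()]
    return [[value for layer in scaled for value in layer[i]]
            for i in range(n_patients.pop())]

if __name__ == "__main__":
    # Hypothetical toy data: 3 patients, 2 transcript features, 1 metabolite feature.
    layers = {
        "transcriptome": [[5.0, 1.0], [7.0, 1.0], [9.0, 1.0]],
        "metabolome": [[0.2], [0.4], [0.9]],
    }
    for row in early_integration(layers):
        print([round(v, 3) for v in row])
```

Standardising per layer matters because omics layers are measured on very different scales (read counts versus concentrations). Without it, one layer could dominate any distance-based or regularised model trained on the combined features.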

Clinical data automation

SAS copilot-like functionality for automated statistical code generation, data cleaning, standardised analysis (CDISC), and automated reporting. Synthetic data generation and augmentation to address data scarcity. Predictive modelling for diagnosis, disease progression forecasting, and treatment plan tailoring. Challenges: data quality/standardisation, security/privacy (HIPAA), interoperability (HL7/HIE), and model interpretability.
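The synthetic data generation discussed in the paper relies on generative models. A much simpler, non-generative baseline for augmenting small tabular datasets is to jitter numeric fields with Gaussian noise and clip them to clinically plausible ranges. The sketch below shows that baseline only. The function name, the field names (`sbp`, `hba1c`), the noise levels and the bounds are hypothetical. The sketch offers no privacy guarantee and is no substitute for a validated generative model.

```python
import random
from typing import Optional

def jitter_augment(records: list[dict], noise_sd: dict[str, float],
                   bounds: dict[str, tuple[float, float]], copies: int = 1,
                   seed: Optional[int] = None) -> list[dict]:
    """Return the original records plus `copies` noisy variants of each.

    noise_sd: per-field Gaussian noise SD (hypothetical values chosen by the analyst).
    bounds:   per-field (low, high) plausibility limits used for clipping.
    Fields not listed in noise_sd (IDs, categorical data) are copied unchanged.
    """
    rng = random.Random(seed)  # seeded generator for reproducible augmentation
    augmented = [dict(r, synthetic=False) for r in records]
    for _ in range(copies):
        for record in records:
            new = dict(record)
            for field, sd in noise_sd.items():
                low, high = bounds[field]
                new[field] = min(high, max(low, record[field] + rng.gauss(0.0, sd)))
            new["synthetic"] = True  # flag so synthetic rows are never mistaken for real ones
            augmented.append(new)
    return augmented

if __name__ == "__main__":
    # Hypothetical records: systolic blood pressure (mmHg) and HbA1c (%).
    records = [{"subject": "001", "sbp": 128.0, "hba1c": 6.1},
               {"subject": "002", "sbp": 141.0, "hba1c": 7.4}]
    for row in jitter_augment(records, {"sbp": 5.0, "hba1c": 0.2},
                              {"sbp": (70.0, 250.0), "hba1c": (3.0, 20.0)},
                              copies=2, seed=42):
        print(row)
```

The explicit `synthetic` flag keeps augmented rows traceable, so they can be excluded from any analysis that must use observed data only.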

Decentralised clinical trials

eCOA integration for standardised patient-reported outcomes. Real-time data capture and monitoring. Enhanced patient engagement through automated reminders. Greater inclusion and diversity by removing geographical constraints. Adaptive trial protocols dynamically adjusted via generative AI. Role of platforms such as SAS Viya for real-time analytics and data management.
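In practice, adaptive designs rest on pre-specified statistical rules that any AI-assisted adjustment would have to respect. One well-known example, swapped in here for illustration and not taken from the paper, is Bayesian response-adaptive randomisation. Each arm's allocation weight is its posterior probability of having the highest response rate. The sketch below assumes binary outcomes and independent Beta(1, 1) priors, and estimates that probability by Monte Carlo. The function name and the interim counts are hypothetical.

```python
import random
from typing import Optional

def prob_best(successes: list[int], failures: list[int], draws: int = 10_000,
              seed: Optional[int] = None) -> list[float]:
    """Monte Carlo estimate of P(arm has the highest response rate) for each arm,
    assuming binary outcomes and independent Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = [0] * len(successes)
    for _ in range(draws):
        # Draw a plausible response rate for every arm from its Beta posterior.
        samples = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
        wins[samples.index(max(samples))] += 1
    return [w / draws for w in wins]

if __name__ == "__main__":
    # Hypothetical interim data: arm A 20/30 responders, arm B 8/30 responders.
    weights = prob_best(successes=[20, 8], failures=[10, 22], seed=1)
    print(weights)  # allocation weights for the next patients
```

Real trials add safeguards on top of a rule like this, such as burn-in periods with fixed allocation, caps on allocation probabilities, and regulatory review of the adaptation rules before the trial starts.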

Trustworthy AI and ethical considerations

Evolution from “Ethical AI” to “Trustworthy AI” through documentation, transparency, and accountability. Regulatory compliance, bias mitigation, explainability techniques for LLMs (attention visualisation, saliency maps, domain-specific fine-tuning, rule-based post-processing). Human oversight as non-negotiable principle — AI augments but never replaces clinical expertise.
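Saliency maps attribute a model's output to parts of its input. Gradient-based saliency needs access to a framework's automatic differentiation, but a framework-free relative, occlusion (leave-one-out) attribution, conveys the same idea: remove each token and measure how much the score changes. In the sketch below, a hypothetical keyword-weighted "adverse event" scorer with made-up weights stands in for an LLM classifier. `occlusion_saliency` and `toy_score` are illustrative names.

```python
from typing import Callable

def occlusion_saliency(tokens: list[str],
                       score: Callable[[list[str]], float]) -> list[tuple[str, float]]:
    """Attribute score(tokens) to each token by removing it and measuring the drop.
    Positive values mean the token pushed the score up."""
    baseline = score(tokens)
    return [(tok, baseline - score(tokens[:i] + tokens[i + 1:]))
            for i, tok in enumerate(tokens)]

# Hypothetical toy scorer standing in for a real model; the weights are made up.
TOY_WEIGHTS = {"severe": 2.0, "rash": 1.5, "no": -1.0, "headache": 1.0}

def toy_score(tokens: list[str]) -> float:
    return sum(TOY_WEIGHTS.get(t.lower(), 0.0) for t in tokens)

if __name__ == "__main__":
    print(occlusion_saliency(["Patient", "reported", "severe", "rash"], toy_score))
```

For a linear scorer like this one, occlusion simply recovers the weights. With a real LLM, tokens interact, so occlusion, attention weights and gradient saliency can disagree. This is one reason explainability for LLMs remains a challenge that human oversight has to cover.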

Future prospects

Multi-modal generative models, Retrieval Augmented Generation (RAG) for domain knowledge integration, quantum computing for molecular simulation, Edge AI for real-time bedside analytics. International cooperation required for evolving ethical and regulatory frameworks.
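Retrieval Augmented Generation grounds an LLM's answer in domain knowledge by first retrieving relevant passages, then placing them in the prompt. The sketch below covers only the retrieval and prompt-assembly step, using bag-of-words cosine similarity. Production systems typically use dense embeddings and a vector index instead. The corpus, the prompt template and the function names are hypothetical, and the call to the LLM itself is omitted.

```python
import math
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts; a crude stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    q = bag_of_words(query)
    return sorted(corpus, key=lambda doc: cosine(q, bag_of_words(doc)), reverse=True)[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    """Assemble a grounded prompt (hypothetical template) for the LLM call."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"

if __name__ == "__main__":
    # Hypothetical domain corpus.
    corpus = [
        "SDTM defines standard domains for clinical trial tabulation data.",
        "AlphaFold predicts protein structures from amino acid sequences.",
        "eCOA captures patient-reported outcomes electronically.",
    ]
    question = "How are protein structures predicted?"
    print(build_prompt(question, retrieve(question, corpus, k=1)))
```

Constraining the model to the retrieved context makes its answers traceable to source documents, which supports the documentation and transparency principles of Trustworthy AI.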

Key takeaways

Comprehensive review of three generative AI architectures — GANs, VAEs, Transformers — and their distinct roles across the life sciences value chain.
Drug discovery: SMILES digitisation, AI-driven molecular design, virtual screening vs. high-throughput screening, biologics and in silico assessment.
Genomics & proteomics: synthetic sequence generation, epigenetic factor modelling, AlphaFold/RoseTTAFold for protein structure prediction, HPC and quantum computing for binding analysis.
Precision medicine: pharmacogenomics, predictive analytics for proactive care, multi-omics integration, human-AI collaboration for clinical validation.
Clinical data automation: SAS copilot-like functionality for statistical code generation, synthetic data generation, data augmentation, predictive modelling.
Decentralised clinical trials: eCOA integration, adaptive protocols, real-time analytics, inclusion and diversity benefits.
Trustworthy AI: regulatory compliance, bias mitigation, explainability challenges for LLMs, human oversight as non-negotiable principle.