Conference paper · PHUSE 2024 · Paper CM05 · Life Sciences & AI

GPTs and LLMs in the context of Life Sciences and Clinical Trials

Jérôme Vétillard (Microsoft) & Mark Lambrecht (SAS) · 16 pages · English

Generative AI — GANs, VAEs and transformers — is reshaping the life sciences value chain, from molecular design to clinical data automation. But responsible deployment demands explainability, regulatory compliance, and human oversight at every stage.

Summary

This paper, presented at the PHUSE 2024 conference, provides a comprehensive review of generative AI technologies and their applications across the life sciences value chain. The authors examine three foundational architectures — Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs) and Transformers — and map their capabilities to concrete use cases in drug discovery, genomics & proteomics, personalised medicine, clinical data automation, and decentralised clinical trials.

The paper also addresses the critical challenges of data quality, ethical considerations, model explainability, and regulatory compliance. It explores emerging paradigms including Retrieval-Augmented Generation (RAG), quantum computing for molecular simulation, and edge AI for point-of-care decision support.

Contents

GANs, VAEs, Transformers — architecture, training principles, and comparative strengths. From adversarial training to self-attention mechanisms.

Drug discovery (SMILES, molecular design, virtual screening), genomics & proteomics (sequence generation, AlphaFold, RoseTTAFold), in silico screening (molecular docking, MD simulations, QM/MM).

Precision profiling through multi-omics integration, predictive care trajectories, pharmacogenomics, and human-AI collaboration for clinical validation.

Synthetic data generation, data augmentation, predictive modelling, and SAS's copilot-like functionality for automated statistical code generation on SAS Viya.

Remote participation through eCOA (electronic clinical outcome assessment), real-time monitoring, inclusion & diversity, and the role of generative AI in adaptive trial design.

Trustworthy AI principles, LLM explainability techniques, RAG architecture, quantum computing perspectives, and edge AI for bedside decision-making.

Context and significance

Written while Jérôme Vétillard was at Microsoft Health & Life Sciences, this paper reflects his dual expertise in biotechnology (PhD ENS Ulm / AgroParisTech) and enterprise technology deployment across regulated industries. The collaboration with Mark Lambrecht (SAS) bridges the gap between AI infrastructure and biostatistical practice, a tension central to the responsible industrialisation of AI in clinical research.

The paper anticipates several themes later developed in the author's work on TweenMe (digital twin generation), PREDICARE (territorial predictive medicine), and the CINN architecture (Clinically-Informed Neural Networks) — notably the integration of domain knowledge into generative pipelines and the primacy of regulatory compliance in AI deployment.
