Conference paper · PHUSE 2024 · Paper CM05 · Life Sciences & AI
Jérôme Vétillard (Microsoft) & Mark Lambrecht (SAS) · 16 pages · English
This paper, presented at the PHUSE 2024 conference, provides a comprehensive review of generative AI technologies and their applications across the life sciences value chain. The authors examine three foundational architectures — Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs) and Transformers — and map their capabilities to concrete use cases in drug discovery, genomics & proteomics, personalised medicine, clinical data automation, and decentralised clinical trials.
The paper also addresses the critical challenges of data quality, ethical considerations, model explainability, and regulatory compliance. It explores emerging paradigms including Retrieval Augmented Generation (RAG), quantum computing for molecular simulation, and edge AI for point-of-care decision support.
1 · Fundamentals of generative AI
GANs, VAEs, Transformers — architecture, training principles, and comparative strengths. From adversarial training to self-attention mechanisms.
2 · Application domains
Drug discovery (SMILES, molecular design, virtual screening), genomics & proteomics (sequence generation, AlphaFold, RoseTTAFold), in silico screening (molecular docking, MD simulations, QM/MM).
3 · Personalised medicine & predictive analytics
Precision profiling through multi-omics integration, predictive care trajectories, pharmacogenomics, and human-AI collaboration for clinical validation.
4 · Clinical data automation
Synthetic data generation, data augmentation, predictive modelling, and copilot-like functionality for automated statistical code generation on SAS Viya.
5 · Decentralised clinical trials
Remote participation through eCOA (electronic clinical outcome assessment), real-time monitoring, inclusion & diversity, and the role of generative AI in adaptive trial design.
6 · Ethics, explainability & future prospects
Trustworthy AI principles, LLM explainability techniques, RAG architecture, quantum computing perspectives, and edge AI for bedside decision-making.
Written while Jérôme Vétillard was at Microsoft Health & Life Sciences, this paper reflects his dual expertise in biotechnology (PhD ENS Ulm / AgroParisTech) and enterprise technology deployment across regulated industries. The collaboration with Mark Lambrecht (SAS) bridges the gap between AI infrastructure and biostatistical practice, a gap that is central to the responsible industrialisation of AI in clinical research.
The paper anticipates several themes later developed in the author's work on TweenMe (digital twin generation), PREDICARE (territorial predictive medicine), and the CINN architecture (Clinically-Informed Neural Networks) — notably the integration of domain knowledge into generative pipelines and the primacy of regulatory compliance in AI deployment.