Round table at the EBI Research Day — 24 January 2025
The Qualees R&D Team (Salma Barkaoui – Head of Data Sciences, Ivan Ignatiev – CTO, Jérôme Vetillard – VP R&D) was invited by the École de Biologie Industrielle (EBI) to participate in a round table on the impacts of AI on bio-industries at the Research Day of 24 January 2025.
Two themes were on the agenda: AI as a research accelerator, and AI for the optimization of industrial and logistics processes. Speakers: Sophie Hamelin (L’Oréal, digital transformation and the “augmented researcher”), Stéphane Menio (Safran Landing Systems, R&D Director), Lionel Pelletier (Aktehom, data integrity and regulatory intelligence), and Fabrice Ruiz (Clinsearch, EBI board member, moderator). The exchanges on the first theme were so rich that the second could not be addressed.
The adage “garbage in, garbage out” remains as valid as ever in the AI era. In highly regulated sectors such as healthcare, algorithm certification and compliance (GDPR, AI Act) are major challenges. The training process of large models is often opaque: trade secrets, uncertainty about data provenance, and a strict European regulatory framework contrasting with American deregulation. Quality assessment depends heavily on the application domain. Just as an industrialist controls the quality of incoming raw materials, data quality must be evaluated and corrected before any training. This is essential for industrializing AI production, particularly in digital twin design (TweenMe by Qualees).
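A minimal sketch of the kind of pre-training data audit the panel called for, on a hypothetical record layout (the column names and plausibility ranges below are illustrative, not Qualees's actual pipeline):

```python
# Toy clinical records with the three classic quality defects:
# missing values, duplicate rows, and implausible measurements.
records = [
    {"patient_id": 1, "age": 54,   "hb_g_dl": 13.2},
    {"patient_id": 2, "age": None, "hb_g_dl": 11.8},
    {"patient_id": 2, "age": None, "hb_g_dl": 11.8},   # duplicate row
    {"patient_id": 3, "age": 47,   "hb_g_dl": 250.0},  # implausible value
]

def audit(rows, plausible={"age": (0, 120), "hb_g_dl": (3, 25)}):
    """Return basic quality metrics to check before any model training."""
    n = len(rows)
    total = sum(len(r) for r in rows)
    missing = sum(v is None for r in rows for v in r.values())
    # Rows are duplicates if all their (key, value) pairs match.
    dupes = n - len({tuple(sorted(r.items())) for r in rows})
    out_of_range = sum(
        1 for r in rows for k, (lo, hi) in plausible.items()
        if r.get(k) is not None and not lo <= r[k] <= hi
    )
    return {"rows": n, "missing_rate": missing / total,
            "duplicates": dupes, "out_of_range": out_of_range}

report = audit(records)
print(report)
```

A real pipeline would add schema validation, unit checks, and provenance logging, but the principle is the same: measure the defects before they reach a training run.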
Medical data is typically high-dimension, low-sample-size (HDLSS): many variables (genetics, imaging, biomarkers, clinical records) but few rows (a few thousand patients at most). Unlike LLMs, which rely on massive and largely unimodal text corpora, medical data is multimodal and requires clinical expertise for preprocessing and standardization. Specialized tokenization, abbreviation normalization, PHI anonymization, multivariate imputation (kNN, MICE), dimensionality reduction (PCA, t-SNE): all are necessary transformations before any training.
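The imputation and dimensionality-reduction steps above can be sketched in a few lines of NumPy; in practice one would reach for library implementations such as scikit-learn's `KNNImputer` and `PCA`. The toy 6x4 matrix stands in for a real HDLSS cohort:

```python
import numpy as np

# Toy matrix: 6 "patients" x 4 "biomarkers", with missing values (NaN).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
X[1, 2] = np.nan
X[4, 0] = np.nan

def knn_impute(X, k=2):
    """Fill each NaN with the mean of that column over the k nearest rows,
    where distance is computed on the columns both rows have observed."""
    X = X.copy()
    for i, j in zip(*np.where(np.isnan(X))):
        mask = ~np.isnan(X[i])
        dists = []
        for r in range(len(X)):
            if r == i or np.isnan(X[r, j]):
                continue
            shared = mask & ~np.isnan(X[r])
            if shared.any():
                dists.append((np.linalg.norm(X[i, shared] - X[r, shared]), r))
        neighbours = [r for _, r in sorted(dists)[:k]]
        X[i, j] = X[neighbours, j].mean()
    return X

Xf = knn_impute(X)

# PCA via SVD: project the centered data onto the first two components.
Xc = Xf - Xf.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T   # 6 x 2 low-dimensional embedding
```

MICE would replace the kNN step with iterated regression models per column, and t-SNE would replace the linear projection with a nonlinear embedding; the overall impute-then-reduce structure is unchanged.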
Training LLMs carries considerable energy costs: roughly 1,300 MWh for GPT-3, an estimated 3,000 MWh for GPT-4, and 433 MWh for BLOOM. The compute required for frontier training runs grows by a factor of 4-5 every year (Epoch AI). The concentration of GPUs among big tech firms (1.8 million at Microsoft vs. 300 at Stanford) raises concerns about democratizing access to AI. Qualees's approach: compact, specialized AIs running on a Kubernetes cluster that consumes about 500 kWh per year in continuous operation.
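A quick back-of-the-envelope calculation, using only the estimates cited in this article, makes the contrast concrete:

```python
# Energy figures quoted above (reported estimates, not measurements).
gpt4_training_mwh = 3_000       # one-off training cost, GPT-4 estimate
cluster_kwh_per_year = 500      # compact Kubernetes cluster, continuous operation

gpt4_training_kwh = gpt4_training_mwh * 1_000
years_equivalent = gpt4_training_kwh / cluster_kwh_per_year
print(f"One GPT-4 training run ~ {years_equivalent:,.0f} years of the compact cluster")
```

That is, a single frontier-scale training run is on the order of several thousand years of the compact cluster's operation, which is the argument for small, specialized models.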
Beyond classic measures (strong authentication, zero-trust architecture, XDR, SIEM, SOC), AI introduces new risks: malicious prompt injection that coerces LLMs into disclosing sensitive information, privacy attacks that extract training data from models lacking differential-privacy guarantees, data poisoning that distorts models with severe consequences in diagnostics, and AI-powered attack strategies (real-time deepfakes, automated phishing).
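To illustrate the privacy point: the classic counter-measure to training-data extraction is to add calibrated noise to anything released from the data. The sketch below shows the Laplace mechanism applied to a bounded mean, a simplified textbook construction and not a description of any speaker's system:

```python
import numpy as np

def private_mean(values, epsilon, lo, hi):
    """Release the mean of `values` with epsilon-differential privacy
    via the Laplace mechanism; values are clipped to [lo, hi] so that
    one record can shift the mean by at most (hi - lo) / n."""
    v = np.clip(values, lo, hi)
    sensitivity = (hi - lo) / len(v)
    # Fixed seed for a reproducible sketch; real releases use fresh noise.
    noise = np.random.default_rng(42).laplace(0.0, sensitivity / epsilon)
    return v.mean() + noise

ages = [34, 51, 47, 62, 45, 58]
released = private_mean(ages, epsilon=1.0, lo=0, hi=120)
print(released)
```

Smaller epsilon means stronger privacy and noisier answers; the true mean here is 49.5, and the released value deviates from it by noise scaled to how much any single patient could influence the result.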
AlphaFold 3 predicts tertiary and quaternary protein conformations and estimates ligand/receptor binding affinity, but it ignores real physico-chemical conditions (pH, aqueous phase, temperature) that are crucial for purification or formulation. On sequence generation: a simple Excel macro can generate random sequences; the added value lies in functional prediction and experimental feasibility. EBI student feedback from a wet-lab vs. in-silico comparison: the students chose a different tool over AlphaFold, which they deemed too distant from experimental results.
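The "Excel macro" remark is easy to make concrete: producing a syntactically valid sequence takes a few lines, which is precisely why the value lies in predicting function, not in generation. The alphabet and length below are arbitrary:

```python
import random

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

random.seed(0)  # fixed seed so the sketch is reproducible
peptide = "".join(random.choice(AMINO_ACIDS) for _ in range(20))
print(peptide)  # syntactically valid, biologically meaningless
```

Nothing here says whether the peptide folds, binds anything, or can even be expressed; those are the questions tools like AlphaFold attempt, and where wet-lab validation remains decisive.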
AI must be seen as an accelerator and support tool, not a substitute for human expertise. Its deployment demands methodological rigor, attention to quality and respect for industrial, medical and regulatory constraints. Recommendations: strengthen data traceability and auditing, establish common interoperability standards, train teams in data science fundamentals and cybersecurity, systematically confront theory with practice through in vivo/in vitro validation.