Round table at the EBI Research Day — 24 January 2025
The Qualees R&D Team (Salma Barkaoui – Head of Data Sciences, Ivan Ignatiev – CTO, Jérôme Vetillard – VP R&D) was invited by the École de Biologie Industrielle (EBI) to participate in a round table on the impacts of AI on bio-industries at the Research Day of 24 January 2025.
Two themes were on the agenda: AI as a research accelerator, and AI for the optimization of industrial and logistics processes. Speakers: Sophie Hamelin (L’Oréal, digital transformation and the “augmented researcher”), Stéphane Menio (Safran Landing Systems, R&D Director), Lionel Pelletier (Aktehom, data integrity and regulatory intelligence), and Fabrice Ruiz (Clinsearch, EBI board member, moderator). The exchanges on the first theme were so rich that the second could not be addressed.
The adage “garbage in, garbage out” remains as valid as ever in the AI era. In highly regulated sectors such as healthcare, algorithm certification and compliance (GDPR, AI Act) are major challenges. The training process of large models is often opaque: trade secrets, uncertainty about data provenance, and a strict European regulatory framework contrasting with American deregulation. Quality assessment depends heavily on the application domain. Just as an industrialist controls the quality of incoming raw materials, data quality must be evaluated and corrected before any training. This is essential for industrializing AI production, particularly in digital twin design (TweenMe by Qualees).
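A minimal sketch of the kind of pre-training data audit the panel called for, on a hypothetical record layout (the column names and plausibility ranges below are illustrative, not Qualees's actual pipeline):

```python
# Toy clinical records with the three classic quality defects:
# missing values, duplicate rows, and implausible measurements.
records = [
    {"patient_id": 1, "age": 54,   "hb_g_dl": 13.2},
    {"patient_id": 2, "age": None, "hb_g_dl": 11.8},
    {"patient_id": 2, "age": None, "hb_g_dl": 11.8},   # duplicate row
    {"patient_id": 3, "age": 47,   "hb_g_dl": 250.0},  # implausible value
]

def audit(rows, plausible={"age": (0, 120), "hb_g_dl": (3, 25)}):
    """Return basic quality metrics to check before any model training."""
    n = len(rows)
    total = sum(len(r) for r in rows)
    missing = sum(v is None for r in rows for v in r.values())
    # Rows are duplicates if all their (key, value) pairs match.
    dupes = n - len({tuple(sorted(r.items())) for r in rows})
    out_of_range = sum(
        1 for r in rows for k, (lo, hi) in plausible.items()
        if r.get(k) is not None and not lo <= r[k] <= hi
    )
    return {"rows": n, "missing_rate": missing / total,
            "duplicates": dupes, "out_of_range": out_of_range}

report = audit(records)
print(report)
```

A real pipeline would add schema validation, unit checks, and provenance logging, but the principle is the same: measure the defects before they reach a training run.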
Medical data is typically high-dimension, low-sample-size (HDLSS): many variables (genetics, imaging, biomarkers, clinical records) but few rows (a few thousand patients at most). Unlike LLMs, which rely on massive and largely unimodal text corpora, medical data is multimodal and requires clinical expertise for preprocessing and standardization. Specialized tokenization, abbreviation normalization, PHI anonymization, multivariate imputation (kNN, MICE), dimensionality reduction (PCA, t-SNE): all are necessary transformations before any training.
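The imputation and dimensionality-reduction steps above can be sketched in a few lines of NumPy; in practice one would reach for library implementations such as scikit-learn's `KNNImputer` and `PCA`. The toy 6x4 matrix stands in for a real HDLSS cohort:

```python
import numpy as np

# Toy matrix: 6 "patients" x 4 "biomarkers", with missing values (NaN).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
X[1, 2] = np.nan
X[4, 0] = np.nan

def knn_impute(X, k=2):
    """Fill each NaN with the mean of that column over the k nearest rows,
    where distance is computed on the columns both rows have observed."""
    X = X.copy()
    for i, j in zip(*np.where(np.isnan(X))):
        mask = ~np.isnan(X[i])
        dists = []
        for r in range(len(X)):
            if r == i or np.isnan(X[r, j]):
                continue
            shared = mask & ~np.isnan(X[r])
            if shared.any():
                dists.append((np.linalg.norm(X[i, shared] - X[r, shared]), r))
        neighbours = [r for _, r in sorted(dists)[:k]]
        X[i, j] = X[neighbours, j].mean()
    return X

Xf = knn_impute(X)

# PCA via SVD: project the centered data onto the first two components.
Xc = Xf - Xf.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T   # 6 x 2 low-dimensional embedding
```

MICE would replace the kNN step with iterated regression models per column, and t-SNE would replace the linear projection with a nonlinear embedding; the overall impute-then-reduce structure is unchanged.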
Training LLMs carries considerable energy costs: roughly 1,300 MWh for GPT-3, an estimated 3,000 MWh for GPT-4, and 433 MWh for BLOOM. The compute required for frontier training runs grows by a factor of 4-5 every year (Epoch AI). The concentration of GPUs among big tech firms (1.8 million at Microsoft vs. 300 at Stanford) raises concerns about democratizing access to AI. Qualees's approach: compact, specialized AIs running on a Kubernetes cluster that consumes about 500 kWh per year in continuous operation.
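A quick back-of-the-envelope calculation, using only the estimates cited in this article, makes the contrast concrete:

```python
# Energy figures quoted above (reported estimates, not measurements).
gpt4_training_mwh = 3_000       # one-off training cost, GPT-4 estimate
cluster_kwh_per_year = 500      # compact Kubernetes cluster, continuous operation

gpt4_training_kwh = gpt4_training_mwh * 1_000
years_equivalent = gpt4_training_kwh / cluster_kwh_per_year
print(f"One GPT-4 training run ~ {years_equivalent:,.0f} years of the compact cluster")
```

That is, a single frontier-scale training run is on the order of several thousand years of the compact cluster's operation, which is the argument for small, specialized models.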
Beyond classic measures (strong authentication, zero-trust architecture, XDR, SIEM, SOC), AI introduces new risks: malicious prompt injection that coerces LLMs into disclosing sensitive information, privacy attacks that extract training data from models lacking differential-privacy guarantees, data poisoning that distorts models with severe consequences in diagnostics, and AI-powered attack strategies (real-time deepfakes, automated phishing).
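To illustrate the privacy point: the classic counter-measure to training-data extraction is to add calibrated noise to anything released from the data. The sketch below shows the Laplace mechanism applied to a bounded mean, a simplified textbook construction and not a description of any speaker's system:

```python
import numpy as np

def private_mean(values, epsilon, lo, hi):
    """Release the mean of `values` with epsilon-differential privacy
    via the Laplace mechanism; values are clipped to [lo, hi] so that
    one record can shift the mean by at most (hi - lo) / n."""
    v = np.clip(values, lo, hi)
    sensitivity = (hi - lo) / len(v)
    # Fixed seed for a reproducible sketch; real releases use fresh noise.
    noise = np.random.default_rng(42).laplace(0.0, sensitivity / epsilon)
    return v.mean() + noise

ages = [34, 51, 47, 62, 45, 58]
released = private_mean(ages, epsilon=1.0, lo=0, hi=120)
print(released)
```

Smaller epsilon means stronger privacy and noisier answers; the true mean here is 49.5, and the released value deviates from it by noise scaled to how much any single patient could influence the result.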
AlphaFold 3 predicts tertiary and quaternary protein conformations and estimates ligand/receptor binding affinity, but it ignores real physico-chemical conditions (pH, aqueous phase, temperature) that are crucial for purification or formulation. On sequence generation: a simple Excel macro can generate random sequences; the added value lies in functional prediction and experimental feasibility. EBI student feedback from a wet-lab vs. in-silico comparison: the students chose a different tool over AlphaFold, which they deemed too distant from experimental results.
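The "Excel macro" remark is easy to make concrete: producing a syntactically valid sequence takes a few lines, which is precisely why the value lies in predicting function, not in generation. The alphabet and length below are arbitrary:

```python
import random

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

random.seed(0)  # fixed seed so the sketch is reproducible
peptide = "".join(random.choice(AMINO_ACIDS) for _ in range(20))
print(peptide)  # syntactically valid, biologically meaningless
```

Nothing here says whether the peptide folds, binds anything, or can even be expressed; those are the questions tools like AlphaFold attempt, and where wet-lab validation remains decisive.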
AI must be seen as an accelerator and support tool, not a substitute for human expertise. Its deployment demands methodological rigor, attention to quality and respect for industrial, medical and regulatory constraints. Recommendations: strengthen data traceability and auditing, establish common interoperability standards, train teams in data science fundamentals and cybersecurity, systematically confront theory with practice through in vivo/in vitro validation.