The classical objection reformulated, and representation as a condition of all cognition
Part 1/3 of the article “Encoding, transduction, and world models”. Part 2/3 (chapters 3-5) has been published; part 3/3 (chapters 6-7) is in preparation. Translated from the original French.
The recurring critique that AI systems are “trapped in language” rests on a confusion between language and representation. Dreyfus (Heideggerian phenomenology, practical know-how vs symbolic manipulation) and Searle (Chinese Room, syntax vs semantics) identify a genuine tension: the distance between syntactic processing and lived semantics. The limitation of LLMs does not stem from confinement within language but from the joint absence of direct sensorimotor grounding, multimodal grounding founded on lived co-constitution, and a biographically organised episodic memory.
All intelligence — biological or artificial — implies a prior transformation of the world into manipulable internal states. The article distinguishes three forms of encoding: (1) the explicit encoder, a dedicated architectural module (Transformer, VAE, CLIP) that defines the informational geometry of the latent space; (2) implicit encoding, distributed across network layers (CNN, LSTM, self-attention), an emergent property of optimisation; (3) pipeline encoding, the set of decisions made by the data scientist upstream of the model — variable selection, transformation, incompleteness handling — which defines the model’s perception space. This third layer is epistemologically the most loaded: the model has no access to dimensions of the world that have not been encoded.
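The distinction between the explicit encoder and pipeline encoding can be made concrete with a minimal sketch. The function and class names below (`pipeline_encode`, `LinearEncoder`) are illustrative, not from the article: the pipeline stage shows the data scientist's upstream decisions (variable selection, transformation, handling of missing values), while the encoder stage shows a dedicated module that maps the resulting inputs into a latent space. Implicit encoding is deliberately absent, since it emerges during training rather than existing as a module.

```python
import math
import random

# --- (3) Pipeline encoding: decisions made upstream of the model ---
def pipeline_encode(records):
    """Select variables, transform them, and handle incompleteness.
    Any dimension dropped here is invisible to every downstream model."""
    rows = []
    for r in records:
        income = r.get("income")
        income = 0.0 if income is None else income   # incompleteness handling
        rows.append([float(r["age"]),                # variable selection
                     math.log1p(income)])            # transformation
        # r["postcode"] is never encoded: the model cannot "perceive" it
    return rows

# --- (1) Explicit encoder: a dedicated module defining the latent geometry ---
class LinearEncoder:
    def __init__(self, in_dim, latent_dim, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.gauss(0.0, 1.0 / math.sqrt(in_dim))
                   for _ in range(latent_dim)]
                  for _ in range(in_dim)]

    def __call__(self, X):
        # Project each row of X into the latent space via W
        return [[sum(x[i] * self.W[i][j] for i in range(len(self.W)))
                 for j in range(len(self.W[0]))]
                for x in X]

records = [
    {"age": 30, "income": 40000, "postcode": "75011"},
    {"age": 45, "income": None,  "postcode": "13001"},
]
X = pipeline_encode(records)      # perception space: 2 samples x 2 features
z = LinearEncoder(2, 4)(X)        # latent space: 2 samples x 4 dimensions
```

The sketch makes the epistemological point of the third layer tangible: whatever `pipeline_encode` discards (here, the postcode) is unrecoverable by any encoder or training procedure downstream.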