Symbol grounding, representational blindness, and stratification of grounding levels
Part 2/3 of the article “Encoding, transduction, and world models”. Follows part 1/3 (chapters 1-2). Part 3/3 (chapters 6-7) is in preparation. Original article in French.
In biological systems, encoding is carried out by sensory transducers: specialised anatomical structures that convert forms of physical energy into electrochemical signals. These mechanisms are constrained by phylogeny, biophysical properties, and ecological niches. Perception and action form continuous dynamic loops coupled to environmental affordances (Gibson). In corpus-based AI systems, by contrast, the signal has already passed through a chain of human mediations before reaching the model: this constitutes second-degree encoding. The model learns the discursive structures through which humans describe the world, not the properties of the world itself. The epistemic distance to the world, understood as the number and nature of informational mediations, therefore differs qualitatively between the two cases.
The symbol grounding problem (Harnad, 1990) formalises the idea that, in a purely symbolic system, symbols refer only to other symbols. Molyneux’s problem (1688) and the knowledge argument built on the case of Mary (Jackson, 1982) push to its limit the distinction between propositional knowledge and perceptual experience. An LLM can describe the physics of red without possessing any corresponding perceptual experience. The history of science further shows that theorising never fully detaches from empirical grounding: thought experiments reorganise regularities that have already been observed; they do not replace them.
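Harnad’s point that symbols in a purely symbolic system refer only to other symbols can be illustrated with a toy sketch. The lexicon, the function name `reaches_grounding`, and the `grounded` set below are all hypothetical and chosen for illustration only; they are not taken from the article. The `grounded` set is merely a label standing in for a non-symbolic anchor, and it does not model perception in any way.

```python
# Hypothetical toy lexicon: every symbol is defined only by other symbols.
LEXICON = {
    "red": ["colour", "wavelength"],
    "colour": ["property", "light"],
    "wavelength": ["light", "property"],
    "light": ["wavelength", "colour"],
    "property": ["colour"],
}


def reaches_grounding(symbol, lexicon, grounded):
    """Return True if following definitions from `symbol` ever reaches
    a symbol listed in `grounded` (an assumed stand-in for a
    non-symbolic anchor such as a learned perceptual category)."""
    seen = set()
    stack = [symbol]
    while stack:
        current = stack.pop()
        if current in grounded:
            return True
        if current in seen:
            continue  # already expanded: we are going round in circles
        seen.add(current)
        stack.extend(lexicon.get(current, []))
    return False


# With no anchor, definitions only ever lead to further definitions.
closed = reaches_grounding("red", LEXICON, grounded=set())
# With one anchored symbol, the chain of definitions can terminate.
anchored = reaches_grounding("red", LEXICON, grounded={"light"})
print(closed, anchored)  # prints: False True
```

In the closed lexicon the search exhausts every symbol without leaving the system, which is the circularity the grounding problem describes. Declaring a symbol "grounded" in the code is itself only another symbol, so the sketch shows the structure of the problem rather than a solution to it.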
The article proposes a stratification into three conceptually distinct grounding levels: (1) referential/perceptual grounding in Harnad’s sense, which anchors symbols in perceptual categories learned from experience; (2) multimodal grounding in Barsalou’s sense (1999), which integrates multiple modalities through embodied co-experience; (3) episodic, affective, and autonoetic grounding in the sense of Tulving (1983) and Damasio (1994), which anchors symbols in a biographical memory structured by somatic markers and a subjective perspective. Adding sensors can, in principle, address the first level, but levels 2 and 3 require a history of situated multimodal interactions and a biographical continuity that current architectures do not demonstrably reproduce.