GLoW Reimplementation — Victor Monnot

Context

The Global Latent Workspace (GLoW) is an architecture developed at the CerCo lab (CNRS, Toulouse) under Rufin VanRullen’s ERC Advanced Grant. It is inspired by Global Workspace Theory, a neuroscience framework proposing that consciousness arises from a shared “workspace” through which specialized brain modules exchange information: content that enters the workspace is broadcast back to all modules.

The original implementation is built on the Shimmer framework. This project is my clean-room reimplementation from the paper. Its purpose is to help me understand every component in depth and to give me a base for exploring extensions.

Architecture

The system consists of:

Domain-specific encoders: each modality (vision, language, proprioception) has its own encoder that maps raw data into a latent representation.
A global workspace: a shared latent space into which domain representations are projected, where they are fused, and from which the result is broadcast back to every domain.
Contrastive alignment: contrastive losses pull matching inputs from different domains close together in the workspace and push mismatched ones apart, which encourages cross-modal coherence.
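
As a rough illustration of how these three pieces fit together, here is a minimal NumPy sketch of one forward pass. Everything specific in it is my own assumption for the sake of the example and is not taken from the paper or from Shimmer: the dimensions, the random linear maps standing in for trained projections, the tanh nonlinearity, fusion by plain averaging, and a symmetric InfoNCE-style loss with a temperature of 0.1. The random arrays `vision_latent` and `text_latent` stand in for the outputs of real encoders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for this illustration.
BATCH, VISION_DIM, TEXT_DIM, WORKSPACE_DIM = 8, 32, 16, 12


def linear(in_dim, out_dim):
    """Random linear map standing in for a trained projection."""
    return rng.normal(scale=in_dim ** -0.5, size=(in_dim, out_dim))


# Stand-ins for the outputs of the domain-specific encoders.
vision_latent = rng.normal(size=(BATCH, VISION_DIM))
text_latent = rng.normal(size=(BATCH, TEXT_DIM))

# One projection into the workspace and one back out, per domain.
to_ws = {"vision": linear(VISION_DIM, WORKSPACE_DIM),
         "text": linear(TEXT_DIM, WORKSPACE_DIM)}
from_ws = {"vision": linear(WORKSPACE_DIM, VISION_DIM),
           "text": linear(WORKSPACE_DIM, TEXT_DIM)}


def project(latents):
    """Map each domain latent into the shared workspace space."""
    return {name: np.tanh(x @ to_ws[name]) for name, x in latents.items()}


def fuse(projected):
    """Fuse by averaging over domains (an assumption, not the paper's rule)."""
    return np.mean(list(projected.values()), axis=0)


def broadcast(workspace):
    """Send the fused workspace state back to every domain's latent space."""
    return {name: workspace @ w for name, w in from_ws.items()}


def contrastive_loss(a, b, temperature=0.1):
    """Symmetric InfoNCE-style loss: row i of `a` should match row i of `b`."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # pairwise cosine similarities, scaled

    def cross_entropy_diag(scores):
        # Cross-entropy where the correct "class" for row i is column i.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))


projected = project({"vision": vision_latent, "text": text_latent})
workspace = fuse(projected)
reconstructions = broadcast(workspace)
loss = contrastive_loss(projected["vision"], projected["text"])

print("workspace:", workspace.shape)
print("broadcast:", {name: r.shape for name, r in reconstructions.items()})
print("contrastive loss:", float(loss))
```

With untrained random projections the loss value itself is meaningless. The sketch only shows the data flow: each domain is projected separately, the contrastive loss compares the per-domain projections before fusion, and the fused state is what gets broadcast back.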

Current status

I am currently building the base architecture with two modalities (vision and language). The next steps are to add proprioceptive encoders and to test cross-modal generation.

Why this matters

If we want machines that understand the world the way brains do, through integrated, multimodal internal models, we need architectures that go beyond single-task learning. I consider GLoW one of the most promising frameworks for this, and reimplementing it is the most direct way I know to understand its strengths and limitations.