Notes on Shimmer and GLoW
Technical notes from reading through the Shimmer codebase and the GLoW architecture — what works, what's unclear, and what I want to extend.
What is Shimmer?
Shimmer is the open-source framework developed by the GLoW team at CerCo (CNRS, Toulouse) for building Global Latent Workspace models. It provides the base abstractions for domain modules, workspace fusion, and contrastive alignment.
Repository: github.com/ruflab/shimmer
Architecture overview
The core idea is simple but powerful:
- Domain modules: each modality (vision, language, proprioception, etc.) has an encoder that maps raw data to a latent vector and a decoder that reconstructs from it
- Global Workspace: a shared latent space where domain representations are projected using learned linear mappings
- Broadcast cycle: information enters the workspace from one domain, gets fused with other domain representations, and is broadcast back to all domains
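As I understand it, the whole cycle can be sketched with toy linear maps. Everything below (names, dimensions, the plain-average fusion) is my own illustration, not code from the Shimmer repository, where these pieces are learned modules:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: two domains with different latent sizes,
# projected into a smaller shared workspace.
D_VISION, D_TEXT, D_WS = 8, 6, 4

# Toy "encoders" and projections as fixed random linear maps.
enc_vision = rng.standard_normal((D_VISION, 16))  # raw (16) -> vision latent
enc_text = rng.standard_normal((D_TEXT, 16))      # raw (16) -> text latent
to_ws = {"vision": rng.standard_normal((D_WS, D_VISION)),
         "text": rng.standard_normal((D_WS, D_TEXT))}
from_ws = {"vision": rng.standard_normal((D_VISION, D_WS)),
           "text": rng.standard_normal((D_TEXT, D_WS))}

def broadcast_cycle(raw_vision, raw_text):
    # 1. Encode each modality into its domain latent.
    z = {"vision": enc_vision @ raw_vision, "text": enc_text @ raw_text}
    # 2. Project every domain latent into the shared workspace.
    projected = {k: to_ws[k] @ v for k, v in z.items()}
    # 3. Fuse (plain average here; Shimmer uses a learned weighted average).
    ws = np.mean(list(projected.values()), axis=0)
    # 4. Broadcast the fused workspace state back to every domain.
    return {k: from_ws[k] @ ws for k in z}

out = broadcast_cycle(rng.standard_normal(16), rng.standard_normal(16))
```

The point of the sketch is just the data flow: every domain latent passes through one shared vector before anything is decoded.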
The training objective combines a reconstruction loss (each domain should be able to reconstruct its input after a pass through the workspace) with a contrastive alignment loss (paired cross-modal examples should project to nearby points in the workspace).
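For the contrastive part, here is a sketch of the standard symmetric InfoNCE formulation over paired workspace projections. I'm assuming the textbook version; Shimmer's exact loss may differ in normalisation or temperature handling:

```python
import numpy as np

def info_nce(a, b, temperature=0.1):
    """Symmetric InfoNCE over a batch of paired workspace projections.

    a, b: (batch, d) arrays; row i of `a` is paired with row i of `b`.
    Each row's true partner (the diagonal) should beat all other rows.
    """
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # (batch, batch) cosine similarities

    def xent(l):
        # Cross-entropy with the diagonal entry as the positive class.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_p = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_p))

    # Average both directions (a retrieves b, b retrieves a).
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(1)
ws_a = rng.standard_normal((8, 4))
loss_aligned = info_nce(ws_a, ws_a)                    # correct pairing
loss_shuffled = info_nce(ws_a, np.roll(ws_a, 1, axis=0))  # broken pairing
```

A correctly paired batch should score a lower loss than a shuffled one, which is exactly the pressure that keeps the per-domain projections from drifting apart.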
Things I noticed
Clean separation of concerns. The domain module abstraction is well-designed. Adding a new modality is straightforward — you implement an encoder, a decoder, and a projection to/from workspace dimensionality.
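The three pieces you implement for a new modality can be summarised in a skeleton like this. The class and method names are mine for illustration; the actual Shimmer base class has its own interface:

```python
import numpy as np

WORKSPACE_DIM = 4  # assumed shared workspace dimensionality

class AudioDomain:
    """Hypothetical new modality plugging into the workspace.

    The three required pieces: an encoder, a decoder, and projections
    to/from the workspace dimensionality (toy linear maps here; these
    would be learned neural modules in practice).
    """
    def __init__(self, raw_dim=32, latent_dim=10, rng=None):
        rng = rng or np.random.default_rng(0)
        self.enc = rng.standard_normal((latent_dim, raw_dim)) * 0.1
        self.dec = rng.standard_normal((raw_dim, latent_dim)) * 0.1
        self.w_in = rng.standard_normal((WORKSPACE_DIM, latent_dim)) * 0.1
        self.w_out = rng.standard_normal((latent_dim, WORKSPACE_DIM)) * 0.1

    def encode(self, x):          # raw features -> domain latent
        return self.enc @ x

    def decode(self, z):          # domain latent -> raw reconstruction
        return self.dec @ z

    def to_workspace(self, z):    # domain latent -> workspace vector
        return self.w_in @ z

    def from_workspace(self, w):  # workspace vector -> domain latent
        return self.w_out @ w
```

Nothing else in the system needs to know anything about audio; the workspace only ever sees `WORKSPACE_DIM`-sized vectors.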
The fusion mechanism is simple. Currently, workspace fusion is a learned weighted average of projected domain representations. This feels like a bottleneck — there might be more expressive fusion strategies worth exploring (attention-based, for instance).
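To make the comparison concrete, here is a sketch of both options: the weighted average as I read it, and one possible attention-based alternative. The weight/query/key parameters are hypothetical stand-ins for learned parameters, not Shimmer code:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def weighted_average_fusion(projected, logits):
    """Current scheme (as I read it): one learned scalar weight per
    domain, softmax-normalised, then a weighted average.

    projected: (n_domains, d) workspace projections; logits: (n_domains,).
    """
    w = softmax(logits)
    return np.tensordot(w, projected, axes=1)  # (d,) fused state

def attention_fusion(projected, W_q, W_k):
    """One alternative: let domains attend to each other with learned
    query/key maps, then pool the attended states."""
    q = projected @ W_q.T                      # (n_domains, d_att)
    k = projected @ W_k.T
    scores = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)
    return (scores @ projected).mean(axis=0)   # (d,) fused state

rng = np.random.default_rng(2)
proj = rng.standard_normal((3, 4))             # 3 domains, workspace dim 4
fused_avg = weighted_average_fusion(proj, np.zeros(3))
fused_att = attention_fusion(proj,
                             rng.standard_normal((5, 4)),
                             rng.standard_normal((5, 4)))
```

The key difference: the weighted average assigns each domain a fixed importance, while attention lets the mixing weights depend on the content of the projections themselves, which is what would make it worth the extra parameters.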
Contrastive alignment is critical. Without it, the workspace degenerates: each domain just learns to encode/decode through the workspace without actually sharing representations with the others.
Questions I want to explore
- What happens with more than 3 domains? Does alignment get harder?
- Can you use the workspace as a zero-shot transfer mechanism between domains that were never paired during training?
- What if the workspace dimensionality is significantly larger than individual domain latents?
Next steps
Finishing my clean reimplementation to have a solid base for experiments. Then running ablations on workspace dimensionality and fusion strategies.