Notes on Shimmer and GLoW
Technical notes from reading through the Shimmer codebase and the GLoW architecture — what works, what's unclear, and what I want to extend.
What is Shimmer?
Shimmer is the open-source framework developed by the GLoW team at CerCo (CNRS, Toulouse) for building Global Latent Workspace models. It provides the base abstractions for domain modules, workspace fusion, and contrastive alignment.
Repository: github.com/ruflab/shimmer
Architecture overview
The core idea is simple but powerful:
- Domain modules: each modality (vision, language, proprioception, etc.) has an encoder that maps raw data to a latent vector and a decoder that reconstructs from it
- Global Workspace: a shared latent space where domain representations are projected using learned linear mappings
- Broadcast cycle: information enters the workspace from one domain, gets fused with other domain representations, and is broadcast back to all domains
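As I understand it, the whole cycle can be sketched with toy linear maps. Everything below (names, dimensions, the plain-average fusion) is my own illustration, not code from the Shimmer repository, where these pieces are learned modules:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: two domains with different latent sizes,
# projected into a smaller shared workspace.
D_VISION, D_TEXT, D_WS = 8, 6, 4

# Toy "encoders" and projections as fixed random linear maps.
enc_vision = rng.standard_normal((D_VISION, 16))  # raw (16) -> vision latent
enc_text = rng.standard_normal((D_TEXT, 16))      # raw (16) -> text latent
to_ws = {"vision": rng.standard_normal((D_WS, D_VISION)),
         "text": rng.standard_normal((D_WS, D_TEXT))}
from_ws = {"vision": rng.standard_normal((D_VISION, D_WS)),
           "text": rng.standard_normal((D_TEXT, D_WS))}

def broadcast_cycle(raw_vision, raw_text):
    # 1. Encode each modality into its domain latent.
    z = {"vision": enc_vision @ raw_vision, "text": enc_text @ raw_text}
    # 2. Project every domain latent into the shared workspace.
    projected = {k: to_ws[k] @ v for k, v in z.items()}
    # 3. Fuse (plain average here; Shimmer uses a learned weighted average).
    ws = np.mean(list(projected.values()), axis=0)
    # 4. Broadcast the fused workspace state back to every domain.
    return {k: from_ws[k] @ ws for k in z}

out = broadcast_cycle(rng.standard_normal(16), rng.standard_normal(16))
```

The point of the sketch is just the data flow: every domain latent passes through one shared vector before anything is decoded.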
The training objective combines a reconstruction loss (each domain should be able to reconstruct its input after a pass through the workspace) with a contrastive alignment loss (paired cross-modal examples should project to nearby points in the workspace).
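For the contrastive part, here is a sketch of the standard symmetric InfoNCE formulation over paired workspace projections. I'm assuming the textbook version; Shimmer's exact loss may differ in normalisation or temperature handling:

```python
import numpy as np

def info_nce(a, b, temperature=0.1):
    """Symmetric InfoNCE over a batch of paired workspace projections.

    a, b: (batch, d) arrays; row i of `a` is paired with row i of `b`.
    Each row's true partner (the diagonal) should beat all other rows.
    """
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # (batch, batch) cosine similarities

    def xent(l):
        # Cross-entropy with the diagonal entry as the positive class.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_p = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_p))

    # Average both directions (a retrieves b, b retrieves a).
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(1)
ws_a = rng.standard_normal((8, 4))
loss_aligned = info_nce(ws_a, ws_a)                    # correct pairing
loss_shuffled = info_nce(ws_a, np.roll(ws_a, 1, axis=0))  # broken pairing
```

A correctly paired batch should score a lower loss than a shuffled one, which is exactly the pressure that keeps the per-domain projections from drifting apart.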
Things I noticed
Clean separation of concerns. The domain module abstraction is well-designed. Adding a new modality is straightforward — you implement an encoder, a decoder, and a projection to/from workspace dimensionality.
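The three pieces you implement for a new modality can be summarised in a skeleton like this. The class and method names are mine for illustration; the actual Shimmer base class has its own interface:

```python
import numpy as np

WORKSPACE_DIM = 4  # assumed shared workspace dimensionality

class AudioDomain:
    """Hypothetical new modality plugging into the workspace.

    The three required pieces: an encoder, a decoder, and projections
    to/from the workspace dimensionality (toy linear maps here; these
    would be learned neural modules in practice).
    """
    def __init__(self, raw_dim=32, latent_dim=10, rng=None):
        rng = rng or np.random.default_rng(0)
        self.enc = rng.standard_normal((latent_dim, raw_dim)) * 0.1
        self.dec = rng.standard_normal((raw_dim, latent_dim)) * 0.1
        self.w_in = rng.standard_normal((WORKSPACE_DIM, latent_dim)) * 0.1
        self.w_out = rng.standard_normal((latent_dim, WORKSPACE_DIM)) * 0.1

    def encode(self, x):          # raw features -> domain latent
        return self.enc @ x

    def decode(self, z):          # domain latent -> raw reconstruction
        return self.dec @ z

    def to_workspace(self, z):    # domain latent -> workspace vector
        return self.w_in @ z

    def from_workspace(self, w):  # workspace vector -> domain latent
        return self.w_out @ w
```

Nothing else in the system needs to know anything about audio; the workspace only ever sees `WORKSPACE_DIM`-sized vectors.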
The fusion mechanism is simple. Currently, workspace fusion is a learned weighted average of projected domain representations. This feels like a bottleneck — there might be more expressive fusion strategies worth exploring (attention-based, for instance).
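To make the comparison concrete, here is a sketch of both options: the weighted average as I read it, and one possible attention-based alternative. The weight/query/key parameters are hypothetical stand-ins for learned parameters, not Shimmer code:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def weighted_average_fusion(projected, logits):
    """Current scheme (as I read it): one learned scalar weight per
    domain, softmax-normalised, then a weighted average.

    projected: (n_domains, d) workspace projections; logits: (n_domains,).
    """
    w = softmax(logits)
    return np.tensordot(w, projected, axes=1)  # (d,) fused state

def attention_fusion(projected, W_q, W_k):
    """One alternative: let domains attend to each other with learned
    query/key maps, then pool the attended states."""
    q = projected @ W_q.T                      # (n_domains, d_att)
    k = projected @ W_k.T
    scores = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)
    return (scores @ projected).mean(axis=0)   # (d,) fused state

rng = np.random.default_rng(2)
proj = rng.standard_normal((3, 4))             # 3 domains, workspace dim 4
fused_avg = weighted_average_fusion(proj, np.zeros(3))
fused_att = attention_fusion(proj,
                             rng.standard_normal((5, 4)),
                             rng.standard_normal((5, 4)))
```

The key difference: the weighted average assigns each domain a fixed importance, while attention lets the mixing weights depend on the content of the projections themselves, which is what would make it worth the extra parameters.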
Contrastive alignment is critical. Without it, the workspace degenerates: each domain just learns to encode/decode through the workspace without actually sharing representations with the others.
Questions I want to explore
- What happens with more than 3 domains? Does alignment get harder?
- Can you use the workspace as a zero-shot transfer mechanism between domains that were never paired during training?
- What if the workspace dimensionality is significantly larger than individual domain latents?
Next steps
Finishing my clean reimplementation to have a solid base for experiments. Then running ablations on workspace dimensionality and fusion strategies.