The Condensate Theorem: Transformers are O(n), Not O(n²)
#Transformer #Attention Mechanism #Linear Complexity #Condensate Theorem #Machine Learning #arXiv #Topological Manifold
📌 Key Takeaways
- The Condensate Theorem argues that, in practice, Transformer attention operates at linear O(n) rather than quadratic O(n²) complexity.
- Attention sparsity is identified as a learned topological property rather than an architectural constraint.
- The 'Condensate Manifold' lets a model attend to a small set of anchor positions plus a local window, without scoring every position (a minimal sketch follows this list).
- This discovery could significantly reduce the computational cost and energy required for large language models.
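The source abstract cuts off mid-definition at "Anchor + Window + D", so the paper's full construction is not recoverable from this digest. As a rough illustration of the anchor-plus-window idea only, here is a minimal NumPy sketch; the function and parameter names (`condensate_attention`, `anchors`, `window`) are hypothetical, not the paper's API.

```python
import numpy as np

def condensate_attention(q, K, V, anchors, t, window):
    """Hypothetical sketch, not the paper's actual algorithm.

    Single-query attention restricted to a small candidate set:
    a few global 'anchor' positions plus a causal local window
    around the query position t. With fixed len(anchors) and
    window width, each query costs O(1), so n queries cost O(n).
    """
    lo = max(0, t - window)
    idx = sorted(set(anchors) | set(range(lo, t + 1)))  # candidate positions
    k, v = K[idx], V[idx]                               # gather candidate keys/values
    scores = k @ q / np.sqrt(q.shape[-1])               # scaled dot-product scores
    w = np.exp(scores - scores.max())                   # numerically stable softmax...
    w /= w.sum()                                        # ...over the candidates only
    return w @ v                                        # weighted mix of candidate values

# Toy usage: 1,024 positions, 64-dim head, two anchors, 32-token window.
rng = np.random.default_rng(0)
n, d = 1024, 64
K, V = rng.normal(size=(n, d)), rng.normal(size=(n, d))
out = condensate_attention(rng.normal(size=d), K, V,
                           anchors=[0, 1], t=500, window=32)
print(out.shape)  # (64,)
```

Because each query scores only `len(anchors) + window` keys instead of all `n`, total work across `n` queries grows linearly in sequence length.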
🏷️ Themes
Artificial Intelligence, Mathematics, Computing Efficiency
📚 Related People & Topics
Machine learning
Study of algorithms that improve automatically through experience
Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can learn from data and generalize to unseen data, and thus perform tasks without explicit instructions.
Transformer
Deep learning architecture based on self-attention
In machine learning, a transformer is a deep learning architecture, introduced in the 2017 paper "Attention Is All You Need", that processes input sequences using the multi-head self-attention mechanism rather than recurrence. Transformers are the foundation of modern large language models, and the cost of their attention step is the subject of this paper.
🔗 Entity Intersection Graph
Connections for Machine learning:
- 🌐 Large language model (7 shared articles)
- 🌐 Generative artificial intelligence (3 shared articles)
- 🌐 Electroencephalography (3 shared articles)
- 🌐 Computer vision (3 shared articles)
- 🌐 Natural language processing (2 shared articles)
- 🌐 Artificial intelligence (2 shared articles)
- 🌐 Graph neural network (2 shared articles)
- 🌐 Neural network (2 shared articles)
- 🌐 User interface (1 shared article)
- 👤 Stuart Russell (1 shared article)
- 🌐 Ethics of artificial intelligence (1 shared article)
- 👤 Susan Schneider (1 shared article)
📄 Original Source Content
arXiv:2602.06317v1 (cross-listed)

Abstract: We present the Condensate Theorem: attention sparsity is a learned topological property, not an architectural constraint. Through empirical analysis of trained language models, we find that attention mass concentrates on a distinct topological manifold -- and this manifold can be identified dynamically without checking every position. We prove a general result: for any query, projecting attention onto the Condensate Manifold (Anchor + Window + D...
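The abstract is truncated, so the exact membership of the Condensate Manifold is unknown here. Under the natural reading (a fixed set of a anchor positions plus a local window of width w per query, both independent of sequence length n; this is an assumption, not the paper's statement), the linear bound follows from a one-line count:

```latex
% Cost sketch under the stated assumption: each query scores
% only a anchor keys plus w window keys, with a and w fixed.
% Dense attention: n queries x n keys per query = O(n^2).
% Condensate-style attention:
\[
  \sum_{t=1}^{n} (a + w) \;=\; n\,(a + w) \;=\; O(n)
  \qquad \text{for fixed } a,\, w .
\]
```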