#Transformer Architecture

Latest news articles tagged with "Transformer Architecture". Follow the timeline of events, related topics, and entities.

Articles (6)

🇺🇸 Residual Stream Duality in Modern Transformer Architectures — 18/03/2026 [USA]
arXiv:2603.16039v1 Announce Type: cross Abstract: Recent work has made clear that the residual pathway is not mere optimization plumbing; it is part of the model's representational machinery. We agre...
Related: #AI Research
🇺🇸 PolyGLU: State-Conditional Activation Routing in Transformer Feed-Forward Networks — 17/03/2026 [USA]
arXiv:2603.13347v1 Announce Type: cross Abstract: Biological neural systems employ diverse neurotransmitters -- glutamate, GABA, dopamine, acetylcholine -- to implement distinct signal-processing mod...
Related: #Neural Network Efficiency
🇺🇸 Disentangling Recall and Reasoning in Transformer Models through Layer-wise Attention and Activation Analysis — 16/03/2026 [USA]
arXiv:2510.03366v2 Announce Type: replace-cross Abstract: Transformer-based language models excel at both recall (retrieving memorized facts) and reasoning (performing multi-step inference), but whet...
Related: #AI Interpretability
🇺🇸 Lost in the Middle at Birth: An Exact Theory of Transformer Position Bias — 12/03/2026 [USA]
arXiv:2603.10123v1 Announce Type: cross Abstract: The "Lost in the Middle" phenomenon -- a U-shaped performance curve where LLMs retrieve well from the beginning and end of a context but fail in th...
Related: #Position Bias
🇺🇸 Explainable AI: Context-Aware Layer-Wise Integrated Gradients for Explaining Transformer Models — 19/02/2026 [USA]
arXiv:2602.16608v1 Announce Type: cross Abstract: Transformer models achieve state-of-the-art performance across domains and tasks, yet their deeply layered representations make their predictions dif...
Related: #Explainable AI, #Natural Language Processing, #Deep Learning Interpretability, #Integrated Gradients
🇺🇸 Expressive Power of Graph Transformers via Logic — 19/02/2026 [USA]
arXiv:2508.01067v2 Announce Type: replace-cross Abstract: Transformers are the basis of modern large language models, but relatively little is known about their precise expressive power on graphs. We...
Related: #Graph Neural Networks, #Expressive Power Analysis, #Attention Mechanisms, #Theoretical vs Practical Evaluation

About the topic: Transformer Architecture

The topic "Transformer Architecture" aggregates 6+ news articles from various countries.