# Attention Mechanisms
Latest news articles tagged with "Attention Mechanisms". Follow the timeline of events, related topics, and entities.
Articles (14)
- k-Maximum Inner Product Attention for Graph Transformers and the Expressive Power of GraphGPS
  [USA]
  arXiv:2604.03815v1 Announce Type: cross Abstract: Graph transformers have shown promise in overcoming limitations of traditional graph neural networks, such as oversquashing and difficulties in model...
  Related: #Graph Neural Networks
- CARE: Covariance-Aware and Rank-Enhanced Decomposition for Enabling Multi-Head Latent Attention
  [USA]
  arXiv:2603.17946v1 Announce Type: cross Abstract: Converting pretrained attention modules such as grouped-query attention (GQA) into multi-head latent attention (MLA) can improve expressivity without...
  Related: #Machine Learning
  (A structural sketch of the MLA latent bottleneck appears after the article list.)
- Beyond Linearity in Attention Projections: The Case for Nonlinear Queries
  [USA]
  arXiv:2603.13381v1 Announce Type: cross Abstract: Recent algebraic analysis shows that in decoder-only and encoder-only transformers, the Query projection $W_Q$ may be set to identity without noticea...
  Related: #AI Research
  (A minimal numeric check of the underlying identity appears after the article list.)
- Summarize Before You Speak with ARACH: A Training-Free Inference-Time Plug-In for Enhancing LLMs via Global Attention Reallocation
  [USA]
  arXiv:2603.11067v1 Announce Type: cross Abstract: Large language models (LLMs) achieve remarkable performance, yet further gains often require costly training. This has motivated growing interest in ...
  Related: #AI Enhancement
- Stem: Rethinking Causal Information Flow in Sparse Attention
  [USA]
  arXiv:2603.06274v1 Announce Type: cross Abstract: The quadratic computational complexity of self-attention remains a fundamental bottleneck for scaling Large Language Models (LLMs) to long contexts, ...
  Related: #Machine Learning
  (A bare-bones illustration of the quadratic cost appears after the article list.)
- Attention's Gravitational Field: A Power-Law Interpretation of Positional Correlation
  [USA]
  arXiv:2603.04805v1 Announce Type: cross Abstract: This paper explores the underlying principles of positional relationships and encodings within Large Language Models (LLMs) and introduces the concep...
  Related: #Positional Correlation
- VSPrefill: Vertical-Slash Sparse Attention with Lightweight Indexing for Long-Context Prefilling
  [USA]
  arXiv:2603.04460v1 Announce Type: cross Abstract: The quadratic complexity of self-attention during the prefill phase impedes long-context inference in large language models. Existing sparse attentio...
  Related: #AI Efficiency
  (A generic illustration of a vertical-slash sparsity mask appears after the article list.)
- S2O: Early Stopping for Sparse Attention via Online Permutation
  [USA]
  arXiv:2602.22575v1 Announce Type: cross Abstract: Attention scales quadratically with sequence length, fundamentally limiting long-context inference. Existing block-granularity sparsification can red...
  Related: #Machine Learning Optimization, #Computational Efficiency
- Attending to Routers Aids Indoor Wireless Localization
  [USA]
  arXiv:2602.16762v1 Announce Type: cross Abstract: Modern machine learning-based wireless localization using Wi-Fi signals continues to face significant challenges in achieving groundbreaking performa...
  Related: #Machine Learning, #Wireless Communication, #Indoor Localization, #Triangulation
- Expressive Power of Graph Transformers via Logic
  [USA]
  arXiv:2508.01067v2 Announce Type: replace-cross Abstract: Transformers are the basis of modern large language models, but relatively little is known about their precise expressive power on graphs. We...
  Related: #Graph Neural Networks, #Transformer Architecture, #Expressive Power Analysis, #Theoretical vs Practical Evaluation
- Explainable AI: Context-Aware Layer-Wise Integrated Gradients for Explaining Transformer Models
  [USA]
  arXiv:2602.16608v1 Announce Type: cross Abstract: Transformer models achieve state-of-the-art performance across domains and tasks, yet their deeply layered representations make their predictions dif...
  Related: #Explainable AI, #Natural Language Processing, #Deep Learning Interpretability, #Transformer Architecture
  (A minimal sketch of plain integrated gradients appears after the article list.)
- Sparrow: Text-Anchored Window Attention with Visual-Semantic Glimpsing for Speculative Decoding in Video LLMs
  [USA]
  arXiv:2602.15318v1 Announce Type: cross Abstract: Although speculative decoding is widely used to accelerate Vision-Language Models (VLMs) inference, it faces severe performance collapse when applied...
  Related: #Vision-Language Models, #Video Large Language Models, #Speculative Decoding, #Cache Management
- SLA2: Sparse-Linear Attention with Learnable Routing and QAT
  [USA]
  arXiv:2602.12675v1 Announce Type: cross Abstract: Sparse-Linear Attention (SLA) combines sparse and linear attention to accelerate diffusion models and has shown strong performance in video generatio...
  Related: #Machine Learning, #Computational Efficiency
  (A minimal sketch of kernelized linear attention appears after the article list.)
- HyperMLP: An Integrated Perspective for Sequence Modeling
  [USA]
  arXiv:2602.12601v1 Announce Type: cross Abstract: Self-attention is often viewed as probabilistic query-key lookup, motivating designs that preserve normalized attention scores and fixed positional s...
  Related: #Machine Learning, #Sequence Modeling
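
The CARE entry concerns converting grouped-query attention into multi-head latent attention (MLA), in which keys and values are reconstructed from a shared low-rank latent that is what actually gets cached. Below is a minimal structural sketch of that bottleneck; the dimensions and weight names (`W_down`, `W_up_k`, `W_up_v`) are illustrative assumptions and do not reproduce CARE's covariance-aware decomposition.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_model, n_heads, d_head, d_latent = 8, 256, 8, 32, 64

X = rng.standard_normal((n, d_model))          # token representations

# MLA-style bottleneck: compress the token stream into a small latent once...
W_down = rng.standard_normal((d_model, d_latent)) * 0.02
latent = X @ W_down                             # (n, d_latent) -- this is what gets cached

# ...then expand the latent into per-head keys and values.
W_up_k = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02
W_up_v = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02
K = (latent @ W_up_k).reshape(n, n_heads, d_head)
V = (latent @ W_up_v).reshape(n, n_heads, d_head)

# The KV cache stores d_latent numbers per token instead of 2 * n_heads * d_head.
print(latent.shape, K.shape, V.shape)
```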
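
"Beyond Linearity in Attention Projections" starts from the observation that pre-softmax attention scores depend on the query and key projections only through the product $W_Q W_K^\top$, so $W_Q$ can be folded into $W_K$. A minimal numpy check of that identity, with illustrative sizes; nothing here is taken from the paper beyond the algebraic fact.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 16                       # sequence length, model width (illustrative)
X = rng.standard_normal((n, d))    # token embeddings
W_Q = rng.standard_normal((d, d))  # query projection
W_K = rng.standard_normal((d, d))  # key projection

# Standard pre-softmax scores: (X W_Q)(X W_K)^T = X (W_Q W_K^T) X^T
scores = (X @ W_Q) @ (X @ W_K).T

# Fold W_Q into the key side: W_Q' = I, W_K' = W_K W_Q^T
scores_folded = X @ (X @ (W_K @ W_Q.T)).T

# The two parameterizations produce identical scores.
assert np.allclose(scores, scores_folded)
```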
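
Several entries (Stem, VSPrefill, S2O) start from the same bottleneck: dense self-attention materializes an $n \times n$ score matrix, so time and memory grow quadratically with sequence length. A bare-bones single-head implementation that makes the quadratic intermediate explicit (purely illustrative, not code from any of the papers):

```python
import numpy as np

def dense_attention(Q, K, V):
    """Single-head scaled dot-product attention.

    Q, K, V: (n, d) arrays. The scores matrix below is (n, n), which is the
    quadratic cost that sparse-attention methods try to avoid.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                          # (n, n) -- O(n^2)
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V                                     # (n, d)

rng = np.random.default_rng(0)
n, d = 1024, 64
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = dense_attention(Q, K, V)
print(out.shape)  # (1024, 64); the (1024, 1024) scores matrix dominates memory
```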
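
A "vertical-slash" sparsity pattern, as the VSPrefill title suggests, typically keeps two kinds of score entries: a few key positions that every query attends to (vertical stripes of the attention matrix) and a band of recent positions along the diagonal (slashes). The sketch below builds such a boolean mask; the function name and parameters are illustrative assumptions, and the paper's lightweight indexing for choosing the vertical positions is not reproduced.

```python
import numpy as np

def vertical_slash_mask(n, vertical_idx, slash_width):
    """Boolean (n, n) causal mask with a vertical-slash sparsity pattern.

    vertical_idx: key positions every query may attend to (vertical stripes).
    slash_width:  how many recent positions each query keeps (diagonal band).
    """
    q = np.arange(n)[:, None]
    k = np.arange(n)[None, :]
    causal = k <= q
    vertical = np.isin(k, vertical_idx) & causal
    slash = (q - k < slash_width) & causal
    return vertical | slash

mask = vertical_slash_mask(n=16, vertical_idx=[0, 1], slash_width=4)
print(mask.astype(int))  # 1s mark the query-key pairs that are actually computed
```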
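
The explainability entry builds on integrated gradients, which attribute a model output to input features by integrating the gradient along a straight path from a baseline $x'$ to the input $x$: $\mathrm{IG}_i(x) = (x_i - x'_i)\int_0^1 \partial F(x' + \alpha(x - x'))/\partial x_i \, d\alpha$. Below is a minimal Riemann-sum approximation on a toy differentiable function; the paper's context-aware, layer-wise variant for transformers is not reproduced here.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=64):
    """Riemann-sum (midpoint rule) approximation of integrated gradients."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.stack([grad_fn(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

# Toy differentiable model: f(x) = w.x + 0.5 * sum(v * x^2), with analytic gradient.
rng = np.random.default_rng(0)
w, v = rng.standard_normal(5), rng.standard_normal(5)
f = lambda x: w @ x + 0.5 * (v * x * x).sum()
grad_f = lambda x: w + v * x

x, baseline = rng.standard_normal(5), np.zeros(5)
attr = integrated_gradients(grad_f, x, baseline)

# Completeness check: attributions sum to f(x) - f(baseline)
# (exact for this toy model, approximate in general).
print(attr.sum(), f(x) - f(baseline))
```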
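
SLA2 mixes sparse and linear attention. "Linear" attention generally means replacing $\mathrm{softmax}(QK^\top)V$ with a feature map $\phi$ so that $\phi(Q)\,(\phi(K)^\top V)$ can be computed by associativity without any $n \times n$ intermediate. A minimal sketch with the common $\mathrm{elu}(x)+1$ feature map (illustrative only; SLA's learnable routing and quantization-aware training are not shown):

```python
import numpy as np

def elu_plus_one(x):
    """Positive feature map phi(x) = elu(x) + 1, common in linear attention."""
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """O(n) attention: the softmax kernel is replaced by phi(Q) phi(K)^T.

    Associativity lets us form the (d, d) matrix phi(K)^T V once instead of
    an (n, n) score matrix, so the cost is linear in sequence length n.
    """
    Qf, Kf = elu_plus_one(Q), elu_plus_one(K)   # (n, d) each
    kv = Kf.T @ V                               # (d, d)
    normalizer = Qf @ Kf.sum(axis=0)            # (n,)
    return (Qf @ kv) / normalizer[:, None]      # (n, d)

rng = np.random.default_rng(0)
n, d = 4096, 64
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(linear_attention(Q, K, V).shape)  # (4096, 64) with no (n, n) intermediate
```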
Key Entities (9)
- Transformer (deep learning) (2 articles)
- Stem (1 article)
- Early stopping (1 article)
- The Case (1 article)
- Artificial intelligence (1 article)
- Large language model (1 article)
- MLP (1 article)
- Gravitational field (1 article)
- CARE International (1 article)
About the topic: Attention Mechanisms
The topic "Attention Mechanisms" aggregates 14+ news articles from various countries.