# Attention Mechanisms
Latest news articles tagged with "Attention Mechanisms". Follow the timeline of events, related topics, and entities.
Articles (14)
- k-Maximum Inner Product Attention for Graph Transformers and the Expressive Power of GraphGPS
  [USA]
  arXiv:2604.03815v1 Announce Type: cross Abstract: Graph transformers have shown promise in overcoming limitations of traditional graph neural networks, such as oversquashing and difficulties in model...
  Related: #Graph Neural Networks
- CARE: Covariance-Aware and Rank-Enhanced Decomposition for Enabling Multi-Head Latent Attention
  [USA]
  arXiv:2603.17946v1 Announce Type: cross Abstract: Converting pretrained attention modules such as grouped-query attention (GQA) into multi-head latent attention (MLA) can improve expressivity without...
  Related: #Machine Learning
  (A structural sketch of the MLA latent bottleneck appears after the article list.)
- Beyond Linearity in Attention Projections: The Case for Nonlinear Queries
  [USA]
  arXiv:2603.13381v1 Announce Type: cross Abstract: Recent algebraic analysis shows that in decoder-only and encoder-only transformers, the Query projection $W_Q$ may be set to identity without noticea...
  Related: #AI Research
  (A minimal numeric check of the underlying identity appears after the article list.)
- Summarize Before You Speak with ARACH: A Training-Free Inference-Time Plug-In for Enhancing LLMs via Global Attention Reallocation
  [USA]
  arXiv:2603.11067v1 Announce Type: cross Abstract: Large language models (LLMs) achieve remarkable performance, yet further gains often require costly training. This has motivated growing interest in ...
  Related: #AI Enhancement
- Stem: Rethinking Causal Information Flow in Sparse Attention
  [USA]
  arXiv:2603.06274v1 Announce Type: cross Abstract: The quadratic computational complexity of self-attention remains a fundamental bottleneck for scaling Large Language Models (LLMs) to long contexts, ...
  Related: #Machine Learning
  (A bare-bones illustration of the quadratic cost appears after the article list.)
- Attention's Gravitational Field: A Power-Law Interpretation of Positional Correlation
  [USA]
  arXiv:2603.04805v1 Announce Type: cross Abstract: This paper explores the underlying principles of positional relationships and encodings within Large Language Models (LLMs) and introduces the concep...
  Related: #Positional Correlation
- VSPrefill: Vertical-Slash Sparse Attention with Lightweight Indexing for Long-Context Prefilling
  [USA]
  arXiv:2603.04460v1 Announce Type: cross Abstract: The quadratic complexity of self-attention during the prefill phase impedes long-context inference in large language models. Existing sparse attentio...
  Related: #AI Efficiency
  (A generic illustration of a vertical-slash sparsity mask appears after the article list.)
- S2O: Early Stopping for Sparse Attention via Online Permutation
  [USA]
  arXiv:2602.22575v1 Announce Type: cross Abstract: Attention scales quadratically with sequence length, fundamentally limiting long-context inference. Existing block-granularity sparsification can red...
  Related: #Machine Learning Optimization, #Computational Efficiency
- Attending to Routers Aids Indoor Wireless Localization
  [USA]
  arXiv:2602.16762v1 Announce Type: cross Abstract: Modern machine learning-based wireless localization using Wi-Fi signals continues to face significant challenges in achieving groundbreaking performa...
  Related: #Machine Learning, #Wireless Communication, #Indoor Localization, #Triangulation
- Expressive Power of Graph Transformers via Logic
  [USA]
  arXiv:2508.01067v2 Announce Type: replace-cross Abstract: Transformers are the basis of modern large language models, but relatively little is known about their precise expressive power on graphs. We...
  Related: #Graph Neural Networks, #Transformer Architecture, #Expressive Power Analysis, #Theoretical vs Practical Evaluation
- Explainable AI: Context-Aware Layer-Wise Integrated Gradients for Explaining Transformer Models
  [USA]
  arXiv:2602.16608v1 Announce Type: cross Abstract: Transformer models achieve state-of-the-art performance across domains and tasks, yet their deeply layered representations make their predictions dif...
  Related: #Explainable AI, #Natural Language Processing, #Deep Learning Interpretability, #Transformer Architecture
  (A minimal sketch of plain integrated gradients appears after the article list.)
- Sparrow: Text-Anchored Window Attention with Visual-Semantic Glimpsing for Speculative Decoding in Video LLMs
  [USA]
  arXiv:2602.15318v1 Announce Type: cross Abstract: Although speculative decoding is widely used to accelerate Vision-Language Models (VLMs) inference, it faces severe performance collapse when applied...
  Related: #Vision-Language Models, #Video Large Language Models, #Speculative Decoding, #Cache Management
- SLA2: Sparse-Linear Attention with Learnable Routing and QAT
  [USA]
  arXiv:2602.12675v1 Announce Type: cross Abstract: Sparse-Linear Attention (SLA) combines sparse and linear attention to accelerate diffusion models and has shown strong performance in video generatio...
  Related: #Machine Learning, #Computational Efficiency
  (A minimal sketch of kernelized linear attention appears after the article list.)
- HyperMLP: An Integrated Perspective for Sequence Modeling
  [USA]
  arXiv:2602.12601v1 Announce Type: cross Abstract: Self-attention is often viewed as probabilistic query-key lookup, motivating designs that preserve normalized attention scores and fixed positional s...
  Related: #Machine Learning, #Sequence Modeling
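
The CARE entry concerns converting grouped-query attention into multi-head latent attention (MLA), in which keys and values are reconstructed from a shared low-rank latent that is what actually gets cached. Below is a minimal structural sketch of that bottleneck; the dimensions and weight names (`W_down`, `W_up_k`, `W_up_v`) are illustrative assumptions and do not reproduce CARE's covariance-aware decomposition.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_model, n_heads, d_head, d_latent = 8, 256, 8, 32, 64

X = rng.standard_normal((n, d_model))          # token representations

# MLA-style bottleneck: compress the token stream into a small latent once...
W_down = rng.standard_normal((d_model, d_latent)) * 0.02
latent = X @ W_down                             # (n, d_latent) -- this is what gets cached

# ...then expand the latent into per-head keys and values.
W_up_k = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02
W_up_v = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02
K = (latent @ W_up_k).reshape(n, n_heads, d_head)
V = (latent @ W_up_v).reshape(n, n_heads, d_head)

# The KV cache stores d_latent numbers per token instead of 2 * n_heads * d_head.
print(latent.shape, K.shape, V.shape)
```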
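
"Beyond Linearity in Attention Projections" starts from the observation that pre-softmax attention scores depend on the query and key projections only through the product $W_Q W_K^\top$, so $W_Q$ can be folded into $W_K$. A minimal numpy check of that identity, with illustrative sizes; nothing here is taken from the paper beyond the algebraic fact.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 16                       # sequence length, model width (illustrative)
X = rng.standard_normal((n, d))    # token embeddings
W_Q = rng.standard_normal((d, d))  # query projection
W_K = rng.standard_normal((d, d))  # key projection

# Standard pre-softmax scores: (X W_Q)(X W_K)^T = X (W_Q W_K^T) X^T
scores = (X @ W_Q) @ (X @ W_K).T

# Fold W_Q into the key side: W_Q' = I, W_K' = W_K W_Q^T
scores_folded = X @ (X @ (W_K @ W_Q.T)).T

# The two parameterizations produce identical scores.
assert np.allclose(scores, scores_folded)
```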
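
Several entries (Stem, VSPrefill, S2O) start from the same bottleneck: dense self-attention materializes an $n \times n$ score matrix, so time and memory grow quadratically with sequence length. A bare-bones single-head implementation that makes the quadratic intermediate explicit (purely illustrative, not code from any of the papers):

```python
import numpy as np

def dense_attention(Q, K, V):
    """Single-head scaled dot-product attention.

    Q, K, V: (n, d) arrays. The scores matrix below is (n, n), which is the
    quadratic cost that sparse-attention methods try to avoid.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                          # (n, n) -- O(n^2)
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V                                     # (n, d)

rng = np.random.default_rng(0)
n, d = 1024, 64
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = dense_attention(Q, K, V)
print(out.shape)  # (1024, 64); the (1024, 1024) scores matrix dominates memory
```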
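
A "vertical-slash" sparsity pattern, as the VSPrefill title suggests, typically keeps two kinds of score entries: a few key positions that every query attends to (vertical stripes of the attention matrix) and a band of recent positions along the diagonal (slashes). The sketch below builds such a boolean mask; the function name and parameters are illustrative assumptions, and the paper's lightweight indexing for choosing the vertical positions is not reproduced.

```python
import numpy as np

def vertical_slash_mask(n, vertical_idx, slash_width):
    """Boolean (n, n) causal mask with a vertical-slash sparsity pattern.

    vertical_idx: key positions every query may attend to (vertical stripes).
    slash_width:  how many recent positions each query keeps (diagonal band).
    """
    q = np.arange(n)[:, None]
    k = np.arange(n)[None, :]
    causal = k <= q
    vertical = np.isin(k, vertical_idx) & causal
    slash = (q - k < slash_width) & causal
    return vertical | slash

mask = vertical_slash_mask(n=16, vertical_idx=[0, 1], slash_width=4)
print(mask.astype(int))  # 1s mark the query-key pairs that are actually computed
```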
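
The explainability entry builds on integrated gradients, which attribute a model output to input features by integrating the gradient along a straight path from a baseline $x'$ to the input $x$: $\mathrm{IG}_i(x) = (x_i - x'_i)\int_0^1 \partial F(x' + \alpha(x - x'))/\partial x_i \, d\alpha$. Below is a minimal Riemann-sum approximation on a toy differentiable function; the paper's context-aware, layer-wise variant for transformers is not reproduced here.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=64):
    """Riemann-sum (midpoint rule) approximation of integrated gradients."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.stack([grad_fn(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

# Toy differentiable model: f(x) = w.x + 0.5 * sum(v * x^2), with analytic gradient.
rng = np.random.default_rng(0)
w, v = rng.standard_normal(5), rng.standard_normal(5)
f = lambda x: w @ x + 0.5 * (v * x * x).sum()
grad_f = lambda x: w + v * x

x, baseline = rng.standard_normal(5), np.zeros(5)
attr = integrated_gradients(grad_f, x, baseline)

# Completeness check: attributions sum to f(x) - f(baseline)
# (exact for this toy model, approximate in general).
print(attr.sum(), f(x) - f(baseline))
```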
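
SLA2 mixes sparse and linear attention. "Linear" attention generally means replacing $\mathrm{softmax}(QK^\top)V$ with a feature map $\phi$ so that $\phi(Q)\,(\phi(K)^\top V)$ can be computed by associativity without any $n \times n$ intermediate. A minimal sketch with the common $\mathrm{elu}(x)+1$ feature map (illustrative only; SLA's learnable routing and quantization-aware training are not shown):

```python
import numpy as np

def elu_plus_one(x):
    """Positive feature map phi(x) = elu(x) + 1, common in linear attention."""
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """O(n) attention: the softmax kernel is replaced by phi(Q) phi(K)^T.

    Associativity lets us form the (d, d) matrix phi(K)^T V once instead of
    an (n, n) score matrix, so the cost is linear in sequence length n.
    """
    Qf, Kf = elu_plus_one(Q), elu_plus_one(K)   # (n, d) each
    kv = Kf.T @ V                               # (d, d)
    normalizer = Qf @ Kf.sum(axis=0)            # (n,)
    return (Qf @ kv) / normalizer[:, None]      # (n, d)

rng = np.random.default_rng(0)
n, d = 4096, 64
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(linear_attention(Q, K, V).shape)  # (4096, 64) with no (n, n) intermediate
```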
Key Entities (9)
- Transformer (deep learning) (2 articles)
- Stem (1 article)
- Early stopping (1 article)
- The Case (1 article)
- Artificial intelligence (1 article)
- Large language model (1 article)
- MLP (1 article)
- Gravitational field (1 article)
- CARE International (1 article)
About the topic: Attention Mechanisms
The topic "Attention Mechanisms" aggregates 14+ news articles from various countries.