# Attention Mechanisms
Latest news articles tagged with "Attention Mechanisms". Follow the timeline of events, related topics, and entities.
Articles (7)
- 🇺🇸 S2O: Early Stopping for Sparse Attention via Online Permutation [USA]
  arXiv:2602.22575v1 Announce Type: cross. Abstract: Attention scales quadratically with sequence length, fundamentally limiting long-context inference. Existing block-granularity sparsification can red...
  Related: #Machine Learning Optimization, #Computational Efficiency (a dense-attention sketch follows this list)
- 🇺🇸 Attending to Routers Aids Indoor Wireless Localization [USA]
  arXiv:2602.16762v1 Announce Type: cross. Abstract: Modern machine learning-based wireless localization using Wi-Fi signals continues to face significant challenges in achieving groundbreaking performa...
  Related: #Machine Learning, #Wireless Communication, #Indoor Localization, #Triangulation
- 🇺🇸 Explainable AI: Context-Aware Layer-Wise Integrated Gradients for Explaining Transformer Models [USA]
  arXiv:2602.16608v1 Announce Type: cross. Abstract: Transformer models achieve state-of-the-art performance across domains and tasks, yet their deeply layered representations make their predictions dif...
  Related: #Explainable AI, #Natural Language Processing, #Deep Learning Interpretability, #Transformer Architecture
- 🇺🇸 Expressive Power of Graph Transformers via Logic [USA]
  arXiv:2508.01067v2 Announce Type: replace-cross. Abstract: Transformers are the basis of modern large language models, but relatively little is known about their precise expressive power on graphs. We...
  Related: #Graph Neural Networks, #Transformer Architecture, #Expressive Power Analysis, #Theoretical vs Practical Evaluation
- 🇺🇸 Sparrow: Text-Anchored Window Attention with Visual-Semantic Glimpsing for Speculative Decoding in Video LLMs [USA]
  arXiv:2602.15318v1 Announce Type: cross. Abstract: Although speculative decoding is widely used to accelerate Vision-Language Model (VLM) inference, it faces severe performance collapse when applied...
  Related: #Vision-Language Models, #Video Large Language Models, #Speculative Decoding, #Cache Management
- 🇺🇸 HyperMLP: An Integrated Perspective for Sequence Modeling [USA]
  arXiv:2602.12601v1 Announce Type: cross. Abstract: Self-attention is often viewed as probabilistic query-key lookup, motivating designs that preserve normalized attention scores and fixed positional s...
  Related: #Machine Learning, #Sequence Modeling (see the dense-attention sketch below)
- 🇺🇸 SLA2: Sparse-Linear Attention with Learnable Routing and QAT [USA]
  arXiv:2602.12675v1 Announce Type: cross. Abstract: Sparse-Linear Attention (SLA) combines sparse and linear attention to accelerate diffusion models and has shown strong performance in video generatio...
  Related: #Machine Learning, #Computational Efficiency (a linear-attention sketch follows this list)
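
Two of the entries above (S2O and HyperMLP) hinge on how standard attention behaves: its cost grows quadratically with sequence length, and it can be read as a probabilistic query-key lookup. The following is a minimal NumPy sketch of plain scaled dot-product attention, not taken from any of the papers listed here; all names and shapes are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dense_attention(Q, K, V):
    """Scaled dot-product attention.

    Q, K, V: (n, d) arrays. The score matrix S is (n, n), so time and
    memory grow quadratically with sequence length n -- the bottleneck
    that sparse-attention methods target.
    """
    d = Q.shape[-1]
    S = Q @ K.T / np.sqrt(d)   # (n, n) pairwise query-key scores
    P = softmax(S, axis=-1)    # each row is a probability distribution over keys
    return P @ V               # probabilistic lookup of values

# Toy usage: 8 tokens with 4-dimensional heads.
rng = np.random.default_rng(0)
n, d = 8, 4
Q, K, V = rng.normal(size=(3, n, d))
print(dense_attention(Q, K, V).shape)  # (8, 4)
```

The full (n, n) score matrix is exactly what block-granularity sparsification methods such as S2O try to avoid materializing.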
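
For contrast with the SLA2 entry, here is a sketch of the linear-attention ingredient that Sparse-Linear Attention combines with a sparse branch. The elu-based feature map and the absence of any routing are assumptions made purely for illustration, not the SLA2 design.

```python
import numpy as np

def phi(x):
    # Non-negative feature map elu(x) + 1, a common choice in linear attention.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """Kernelized linear attention.

    Replacing softmax(Q K^T) with phi(Q) phi(K)^T lets the product be
    reassociated: phi(Q) @ (phi(K)^T V) costs O(n * d^2) rather than
    O(n^2 * d), so cost grows linearly with sequence length n.
    """
    Qf, Kf = phi(Q), phi(K)                      # (n, d) feature-mapped queries/keys
    KV = Kf.T @ V                                # (d, d) key-value summary, computed once
    Z = Qf @ Kf.sum(axis=0, keepdims=True).T     # (n, 1) per-query normalizer
    return (Qf @ KV) / (Z + 1e-9)

rng = np.random.default_rng(1)
n, d = 8, 4
Q, K, V = rng.normal(size=(3, n, d))
print(linear_attention(Q, K, V).shape)  # (8, 4)
```

Because the (d, d) key-value summary is shared across queries, no (n, n) matrix is ever formed, which is what makes the linear branch cheap enough to pair with a sparse one.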