#Adaptive optimizers
Articles (2)
- 🇺🇸 Understanding Transformer Optimization via Gradient Heterogeneity
[USA]
arXiv:2502.00213v4 Announce Type: replace-cross Abstract: Transformers are difficult to optimize with stochastic gradient descent (SGD) and largely rely on adaptive optimizers such as Adam. Despite t...
Related: #Transformers, #Optimization, #Gradient heterogeneity, #Stochastic gradient descent
- 🇺🇸 On Surprising Effectiveness of Masking Updates in Adaptive Optimizers
[USA]
arXiv:2602.15322v1 Announce Type: cross Abstract: Training large language models (LLMs) relies almost exclusively on dense adaptive optimizers with increasingly sophisticated preconditioners. We chal...
Related: #Machine learning optimization, #Large language model training, #Regularization techniques, #Curvature-aware algorithms