#Stochastic gradient descent
Latest news articles tagged with "Stochastic gradient descent". Follow the timeline of events, related topics, and entities.
Articles (2)
-
🇺🇸 Conjugate Learning Theory: Uncovering the Mechanisms of Trainability and Generalization in Deep Neural Networks
[USA]
arXiv:2602.16177v1 Announce Type: cross Abstract: In this work, we propose a notion of practical learnability grounded in finite sample settings, and develop a conjugate learning theoretical framewor...
Related: #Learning theory, #Convex optimization, #Deep neural networks, #Eigenvalue analysis
-
🇺🇸 Understanding Transformer Optimization via Gradient Heterogeneity
[USA]
arXiv:2502.00213v4 Announce Type: replace-cross Abstract: Transformers are difficult to optimize with stochastic gradient descent (SGD) and largely rely on adaptive optimizers such as Adam. Despite t...
Related: #Transformers, #Optimization, #Gradient heterogeneity, #Adaptive optimizers