
From Shallow Bayesian Neural Networks to Gaussian Processes: General Convergence, Identifiability and Scalable Inference

#Bayesian Neural Networks #Gaussian Processes #Statistical Convergence #Machine Learning Theory #Scalable Inference #Identifiability #Covariance Functions #Nyström Approximation

📌 Key Takeaways

  • Established a general convergence result from shallow Bayesian neural networks to Gaussian processes under relaxed assumptions (the classical limiting form is sketched just after this list)
  • Proposed a novel covariance function defined as a convex mixture of components induced by four widely used activation functions
  • Developed a scalable maximum a posteriori training and prediction procedure using a Nyström approximation, with rank and anchor selection controlling the cost-accuracy trade-off
  • Demonstrated stable hyperparameter estimates and competitive predictive performance on controlled simulations and real-world tabular datasets
  • Advances the theoretical foundations while providing practical implementation tools
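
For orientation, the classical Neal-type limit that this line of work builds on can be written in standard textbook form as below. This is only a reference point: the paper's contribution is to relax the assumptions behind such results, and its exact parameterization is not given in this summary.

```latex
% Shallow BNN with H hidden units, activation \phi, and i.i.d. priors:
\[
  f(x) \;=\; b \;+\; \sum_{j=1}^{H} v_j \,\phi\!\left(w_j^\top x + c_j\right),
  \qquad
  v_j \sim \mathcal{N}\!\Big(0, \tfrac{\sigma_v^2}{H}\Big),
  \quad b \sim \mathcal{N}(0, \sigma_b^2).
\]
% As H -> infinity, a central limit argument gives f ~ GP(0, K) with
\[
  K(x, x') \;=\; \sigma_b^2
  \;+\; \sigma_v^2\;
  \mathbb{E}_{w,\,c}\!\left[
    \phi\!\left(w^\top x + c\right)\,\phi\!\left(w^\top x' + c\right)
  \right].
\]
```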

📖 Full Retelling

Researchers Gracielle Antunes de Araújo and Flávio B. Gonçalves published a paper on arXiv on February 26, 2026, exploring the theoretical connection between shallow Bayesian neural networks (BNNs) and Gaussian processes (GPs), with implications for statistical modeling, identifiability, and scalable inference in machine learning.

In their study, the researchers established a general convergence result from BNNs to GPs by relaxing assumptions used in previous formulations, offering new insight into how these seemingly different modeling approaches relate to each other. They compared alternative parameterizations of the limiting GP model and proposed a novel covariance function defined as a convex mixture of components induced by four widely used activation functions. They characterized key properties of this construction, including positive definiteness and both strict and practical identifiability under different input designs.

For computation, they developed a scalable maximum a posteriori training and prediction procedure using a Nyström approximation, showing how the Nyström rank and anchor selection control the cost-accuracy trade-off. Experiments on both controlled simulations and real-world tabular datasets showed stable hyperparameter estimates and competitive predictive performance at realistic computational cost, bridging the gap between theoretical guarantees and practical application.
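To make the mixture-kernel idea concrete, here is a minimal Monte Carlo sketch of activation-induced limiting covariances and a convex mixture of them. This is illustrative only: the paper's exact parameterization and its four activation functions are not specified in this summary, so the choices below (relu, tanh, softplus, sigmoid) and all function names are hypothetical stand-ins.

```python
import numpy as np

def limit_kernel(X1, X2, phi, sigma_v=1.0, sigma_b=1.0, n_mc=4000, seed=0):
    """Estimate K(x,x') = sigma_b^2 + sigma_v^2 * E[phi(w.x+c) phi(w.x'+c)]
    by sampling hidden-layer weights w and biases c from standard priors."""
    rng = np.random.default_rng(seed)
    d = X1.shape[1]
    W = rng.normal(0.0, 1.0 / np.sqrt(d), size=(n_mc, d))  # input weights
    c = rng.normal(0.0, 1.0, size=(n_mc, 1))               # input biases
    H1 = phi(W @ X1.T + c)                                 # (n_mc, n1)
    H2 = phi(W @ X2.T + c)                                 # (n_mc, n2)
    return sigma_b**2 + sigma_v**2 * (H1.T @ H2) / n_mc

ACTIVATIONS = {  # hypothetical stand-ins; the paper's four are not named here
    "relu":     lambda z: np.maximum(z, 0.0),
    "tanh":     np.tanh,
    "softplus": lambda z: np.logaddexp(z, 0.0),
    "sigmoid":  lambda z: 1.0 / (1.0 + np.exp(-z)),
}

def mixture_kernel(X1, X2, weights):
    """Convex mixture k = sum_i pi_i * k_i. A convex combination of positive
    semi-definite kernels is itself positive semi-definite."""
    pi = np.asarray(list(weights.values()), dtype=float)
    assert np.all(pi >= 0) and abs(pi.sum() - 1.0) < 1e-8, "weights must lie on the simplex"
    return sum(w * limit_kernel(X1, X2, ACTIVATIONS[name])
               for name, w in weights.items())

# Example: equal-weight mixture on toy 1-D inputs.
X = np.linspace(-2, 2, 5).reshape(-1, 1)
K = mixture_kernel(X, X, {"relu": 0.25, "tanh": 0.25, "softplus": 0.25, "sigmoid": 0.25})
print(np.round(K, 3))
```

Constraining the weights to the simplex is what makes the mixture convex, which is also why positive definiteness is inherited from the component kernels.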

🏷️ Themes

Machine Learning Theory, Statistical Modeling, Computational Efficiency, Neural Networks

📚 Related People & Topics

Gaussian process (statistical model)

In probability theory and statistics, a Gaussian process is a stochastic process (a collection of random variables indexed by time or space) such that every finite collection of those random variables has a multivariate normal distribution. The distribution of a Gaussian process is the joint distribution of all those (infinitely many) random variables.

Original Source
arXiv:2602.22492 [stat.ML] — Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Submitted on 26 Feb 2026. 29 pages, 4 figures, 8 tables; supplementary material included.
https://doi.org/10.48550/arXiv.2602.22492 (arXiv-issued DOI via DataCite, pending registration)

Title: From Shallow Bayesian Neural Networks to Gaussian Processes: General Convergence, Identifiability and Scalable Inference
Authors: Gracielle Antunes de Araújo, Flávio B. Gonçalves

Abstract: In this work, we study scaling limits of shallow Bayesian neural networks via their connection to Gaussian processes, with an emphasis on statistical modeling, identifiability, and scalable inference. We first establish a general convergence result from BNNs to GPs by relaxing assumptions used in prior formulations, and we compare alternative parameterizations of the limiting GP model. Building on this theory, we propose a new covariance function defined as a convex mixture of components induced by four widely used activation functions, and we characterize key properties including positive definiteness and both strict and practical identifiability under different input designs. For computation, we develop a scalable maximum a posteriori training and prediction procedure using a Nyström approximation, and we show how the Nyström rank and anchor selection control the cost-accuracy trade-off. Experiments on controlled simulations and real-world tabular datasets demonstrate stable hyperparameter estimates and competitive predictive performance at realistic computational cost.
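The abstract's Nyström step admits a short generic sketch. The code below is not the paper's procedure: the kernel is a placeholder (an RBF stand-in; the mixture kernel above would slot in the same way), anchor selection is plain uniform subsampling (one simple choice among those the rank/anchor trade-off covers), and `nystrom_predict` is a hypothetical name.

```python
import numpy as np

def nystrom_predict(X, y, X_star, kernel, m=50, noise=0.1, jitter=1e-8, seed=0):
    """Predictive mean under a Nystrom (subset-of-regressors-style)
    approximation K ~= K_nm K_mm^{-1} K_mn, computed via the Woodbury
    identity so the cost is O(n m^2) instead of O(n^3)."""
    rng = np.random.default_rng(seed)
    anchors = rng.choice(len(X), size=m, replace=False)   # anchor selection
    Z = X[anchors]
    K_mm = kernel(Z, Z) + jitter * np.eye(m)
    K_nm = kernel(X, Z)                                   # (n, m)
    K_sm = kernel(X_star, Z)                              # (n*, m)
    # Woodbury: (sigma^2 I + K_nm K_mm^{-1} K_mn)^{-1} y
    #   = (y - K_nm A^{-1} K_mn y) / sigma^2, with A = sigma^2 K_mm + K_mn K_nm
    A = noise**2 * K_mm + K_nm.T @ K_nm
    alpha = (y - K_nm @ np.linalg.solve(A, K_nm.T @ y)) / noise**2
    # Approximate cross-covariance k(x*, X) ~= K_sm K_mm^{-1} K_mn
    return K_sm @ np.linalg.solve(K_mm, K_nm.T @ alpha)

# Toy usage: noisy sine data with an RBF placeholder kernel.
rbf = lambda A, B: np.exp(-0.5 * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
X_star = np.linspace(-3, 3, 7).reshape(-1, 1)
print(np.round(nystrom_predict(X, y, X_star, rbf, m=20), 2))
```

Raising `m` (the Nyström rank) tightens the approximation at higher cost, which is the cost-accuracy dial the abstract refers to.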

Source

arxiv.org
