Certified Circuits: Stability Guarantees for Mechanistic Circuits

#Certified Circuits #Mechanistic interpretability #Neural networks #Stability guarantees #Circuit discovery #Out-of-distribution #Artificial intelligence

πŸ“Œ Key Takeaways

  • Researchers developed Certified Circuits framework providing provable stability guarantees for neural network circuit discovery
  • Existing circuit discovery methods are brittle and dependent on specific concept datasets
  • Certified Circuits uses randomized data subsampling to ensure consistent component inclusion despite dataset perturbations
  • The approach achieves higher accuracy with fewer neurons compared to existing methods

πŸ“– Full Retelling

In a paper submitted to arXiv on February 26, 2026, researchers Alaa Anani, Tobias Lorenz, Bernt Schiele, Mario Fritz, and Jonas Fischer introduced 'Certified Circuits,' a framework that provides provable stability guarantees for circuit discovery in neural networks. The work addresses a fundamental challenge in artificial intelligence: understanding how neural networks arrive at their predictions, which is essential for debugging, auditing, and deploying AI systems safely. Mechanistic interpretability pursues this goal by identifying 'circuits' - minimal subnetworks within a neural network that are responsible for specific behaviors or predictions.

However, the researchers point out that current circuit discovery methods are brittle: the circuits they find depend strongly on the chosen concept dataset and often fail to transfer out-of-distribution, raising doubts about whether they capture the underlying concept or merely dataset-specific artifacts.

Certified Circuits addresses this by wrapping any black-box discovery algorithm with randomized data subsampling, certifying that decisions about which components to include in the circuit remain invariant under bounded edit-distance perturbations of the concept dataset. Unstable neurons are abstained from rather than included, yielding circuits that are both more compact and more accurate: on ImageNet and out-of-distribution datasets, certified circuits achieve up to 91% higher accuracy while using 45% fewer neurons, and they remain reliable where baseline methods degrade.
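The abstract does not state the exact certificate, but a guarantee of this shape can be derived with a simple coupling argument, in the spirit of randomized-smoothing and randomized-ablation certificates (an illustrative analysis, not necessarily the paper's): each data point lands in a size-k subsample of an n-point dataset with probability k/n, so editing r points shifts the probability of any subsample-based decision by at most rk/n. A component whose inclusion frequency sits further than rk/n from the 1/2 decision boundary therefore keeps the same majority decision for every dataset within edit distance r. A hypothetical check:

```python
def certified_radius(p_hat: float, n: int, k: int) -> int:
    """Largest edit distance r such that the majority inclusion
    decision is provably unchanged, using the bound that editing r of
    n points perturbs any subsample statistic by at most r * k / n.
    (Illustrative analysis, not the paper's stated bound; finite-sample
    estimation error and boundary rounding are ignored for clarity.)"""
    margin = abs(p_hat - 0.5)       # distance from the decision boundary
    return int(margin * n / k)      # need r*k/n < margin, so r < margin*n/k

# e.g. inclusion frequency 0.98 over subsamples of k=200 from n=1000 images
print(certified_radius(0.98, n=1000, k=200))  # -> 2 edits certified
```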

🏷️ Themes

AI interpretability, Neural network reliability, Scientific methodology

πŸ“š Related People & Topics

Neural network

Structure in biology and artificial intelligence

A neural network is a group of interconnected units called neurons that send signals to one another. Neurons can be either biological cells or mathematical models. While individual neurons are simple, many of them together in a network can perform complex tasks.


Mechanistic interpretability

Reverse-engineering neural networks

Mechanistic interpretability (often abbreviated as mech interp, mechinterp, or MI) is a subfield of research within explainable artificial intelligence that aims to understand the internal workings of neural networks by analyzing the mechanisms present in their computations.



Original Source
Computer Science > Artificial Intelligence
arXiv:2602.22968 [cs.AI] (Submitted on 26 Feb 2026)
Title: Certified Circuits: Stability Guarantees for Mechanistic Circuits
Authors: Alaa Anani, Tobias Lorenz, Bernt Schiele, Mario Fritz, Jonas Fischer

Abstract: Understanding how neural networks arrive at their predictions is essential for debugging, auditing, and deployment. Mechanistic interpretability pursues this goal by identifying circuits - minimal subnetworks responsible for specific behaviors. However, existing circuit discovery methods are brittle: circuits depend strongly on the chosen concept dataset and often fail to transfer out-of-distribution, raising doubts whether they capture concept or dataset-specific artifacts. We introduce Certified Circuits, which provide provable stability guarantees for circuit discovery. Our framework wraps any black-box discovery algorithm with randomized data subsampling to certify that circuit component inclusion decisions are invariant to bounded edit-distance perturbations of the concept dataset. Unstable neurons are abstained from, yielding circuits that are more compact and more accurate. On ImageNet and OOD datasets, certified circuits achieve up to 91% higher accuracy while using 45% fewer neurons, and remain reliable where baselines degrade. Certified Circuits puts circuit discovery on formal ground by producing mechanistic explanations that are provably stable and better aligned with the target concept. Code will be released soon!

Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)
DOI: https://doi.org/10.48550/arXiv.2602.22968

Source

arxiv.org
