# Model Interpretability
Latest news articles tagged with "Model Interpretability". Follow the timeline of events, related topics, and entities.
## Articles (7)
- 🇺🇸 MaS-VQA: A Mask-and-Select Framework for Knowledge-Based Visual Question Answering [USA]
  arXiv:2602.15915v1 Announce Type: cross Abstract: Knowledge-based Visual Question Answering (KB-VQA) requires models to answer questions by integrating visual information with external knowledge. How...
  Related: #Visual Question Answering, #Knowledge Integration, #Noise Reduction, #Cross-modal Reasoning
- 🇺🇸 Batch-CAM: Introduction to better reasoning in convolutional deep learning models [USA]
  arXiv:2510.00664v2 Announce Type: replace Abstract: Deep learning opacity often impedes deployment in high-stakes domains. We propose a training framework that aligns model focus with class-represent...
  Related: #AI Transparency, #Deep Learning
  (A hedged class-activation-map sketch appears after the article list.)
- 🇺🇸 CodeCircuit: Toward Inferring LLM-Generated Code Correctness via Attribution Graphs [USA]
  arXiv:2602.07080v1 Announce Type: cross Abstract: Current paradigms for code verification rely heavily on external mechanisms (such as execution-based unit tests or auxiliary LLM judges), which are ofte...
  Related: #Artificial Intelligence, #Software Engineering
- 🇺🇸 Endogenous Resistance to Activation Steering in Language Models [USA]
  arXiv:2602.06941v1 Announce Type: cross Abstract: Large language models can resist task-misaligned activation steering during inference, sometimes recovering mid-generation to produce improved respon...
  Related: #Artificial Intelligence, #AI Safety
  (A minimal activation-steering sketch appears after the article list.)
- 🇺🇸 On the Identifiability of Steering Vectors in Large Language Models [USA]
  arXiv:2602.06801v1 Announce Type: cross Abstract: Activation steering methods, such as persona vectors, are widely used to control large language model behavior and increasingly interpreted as reveal...
  Related: #Artificial Intelligence, #Technology
- 🇺🇸 DLM-Scope: Mechanistic Interpretability of Diffusion Language Models via Sparse Autoencoders [USA]
  arXiv:2602.05859v1 Announce Type: cross Abstract: Sparse autoencoders (SAEs) have become a standard tool for mechanistic interpretability in autoregressive large language models (LLMs), enabling rese...
  Related: #Artificial Intelligence, #Machine Learning
  (A minimal sparse-autoencoder sketch appears after the article list.)
- 🇺🇸 Evaluating chain-of-thought monitorability [USA]
  OpenAI introduces a new framework and evaluation suite for chain-of-thought monitorability, covering 13 evaluations across 24 environments. Our findings show that monitoring a model's internal reasoni...
  Related: #AI Safety, #Technical Innovation
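
For background on the Batch-CAM entry above: the title points at class activation mapping (CAM), but the snippet does not spell out the paper's training framework, so the following is only a minimal Grad-CAM-style sketch of how a class-conditional focus map is typically computed for a convolutional classifier. The ResNet-18 backbone, the hooked layer, and the input shape are illustrative assumptions, not details from the paper.

```python
# Hypothetical Grad-CAM-style sketch (NOT the paper's Batch-CAM objective):
# the heatmap highlights spatial locations that most increase a chosen class logit.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1").eval()  # placeholder backbone
activations, gradients = {}, {}

def save_activation(module, inputs, output):
    activations["feat"] = output.detach()

def save_gradient(module, grad_input, grad_output):
    gradients["feat"] = grad_output[0].detach()

# Hook the last convolutional block (layer4 in ResNet-18).
model.layer4.register_forward_hook(save_activation)
model.layer4.register_full_backward_hook(save_gradient)

def grad_cam(image, class_idx=None):
    """image: [1, 3, H, W] normalized tensor; returns an [H, W] heatmap in [0, 1]."""
    logits = model(image)
    if class_idx is None:
        class_idx = int(logits.argmax(dim=1))
    model.zero_grad()
    logits[0, class_idx].backward()

    feat = activations["feat"]                      # [1, C, h, w] feature maps
    grad = gradients["feat"]                        # [1, C, h, w] their gradients
    weights = grad.mean(dim=(2, 3), keepdim=True)   # channel importance weights
    cam = F.relu((weights * feat).sum(dim=1, keepdim=True))   # [1, 1, h, w]
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear",
                        align_corners=False)[0, 0]
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
```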
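
The two steering entries above (endogenous resistance and identifiability of steering vectors) both concern activation steering: adding a fixed direction to a model's hidden states at inference time. Neither teaser gives implementation details, so this is only a minimal sketch of the underlying mechanism on a small Hugging Face model; the layer index, scale, and the random placeholder vector are assumptions (in practice the direction would come from, e.g., contrastive prompt pairs or a persona dataset).

```python
# Minimal activation-steering sketch: add a fixed vector to one transformer
# block's hidden states during generation. Layer, scale, and vector are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small placeholder model, not one used in the papers
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

layer_idx = 6                                   # which block to steer (assumption)
alpha = 4.0                                     # steering strength (assumption)
steering_vector = torch.randn(model.config.hidden_size)  # placeholder direction

def add_steering(module, inputs, output):
    # GPT-2 blocks return a tuple; the hidden states are the first element.
    hidden = output[0] + alpha * steering_vector.to(output[0].dtype)
    return (hidden,) + output[1:]

handle = model.transformer.h[layer_idx].register_forward_hook(add_steering)

ids = tok("The assistant replied:", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=30, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))

handle.remove()  # detach the hook to restore unsteered behavior
```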
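
Finally, the DLM-Scope entry describes sparse autoencoders as the standard mechanistic-interpretability tool it extends to diffusion language models. As background, here is a minimal sketch of the usual SAE recipe: a one-hidden-layer autoencoder trained to reconstruct cached model activations under an L1 sparsity penalty, so that each input is explained by a small number of learned features. The widths, penalty coefficient, and the random batch in the usage line are illustrative assumptions, not values from the paper.

```python
# Minimal sparse-autoencoder (SAE) sketch over cached activations.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        feats = torch.relu(self.enc(x))   # sparse feature activations
        return self.dec(feats), feats     # reconstruction + features

d_model, d_hidden, l1_coeff = 768, 8 * 768, 1e-3   # assumed sizes and penalty
sae = SparseAutoencoder(d_model, d_hidden)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)

def train_step(acts):
    """acts: [batch, d_model] activations cached from the model under study."""
    recon, feats = sae(acts)
    loss = (recon - acts).pow(2).mean() + l1_coeff * feats.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Usage with placeholder data; real training would feed residual-stream activations.
print(train_step(torch.randn(64, d_model)))
```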