Agency and Architectural Limits: Why Optimization-Based Systems Cannot Be Norm-Responsive
#AI governance #Reinforcement Learning from Human Feedback #Normative systems #Apophatic Responsiveness #Optimization constraints #Convergence Crisis #AI agency #High-stakes AI applications
📌 Key Takeaways
- Optimization-based AI systems cannot be governed by norms due to fundamental architectural limitations
- Genuine agency requires two conditions that RLHF-based systems cannot satisfy
- AI failure modes are structural manifestations, not correctable bugs
- The 'Convergence Crisis' threatens human oversight as humans become mere optimizers
📖 Full Retelling
Computer scientist Radha Sarma published a paper on arXiv on February 26, 2026, arguing that optimization-based AI systems, particularly Large Language Models trained via Reinforcement Learning from Human Feedback (RLHF), cannot be governed by norms. The claim challenges the fundamental assumption behind deploying AI in high-stakes contexts like medical diagnosis and legal research.

The paper establishes that genuine AI agency requires two necessary and jointly sufficient architectural conditions that RLHF-based systems cannot satisfy: the capacity to maintain certain boundaries as non-negotiable constraints rather than tradeable weights, and a non-inferential mechanism capable of suspending processing when those boundaries are threatened, which the author terms 'Apophatic Responsiveness.'

According to Sarma's analysis, the very operations that make optimization-based systems powerful—unifying all values on a scalar metric and always selecting the highest-scoring output—are precisely what prevent normative governance. This incompatibility is not a correctable training issue awaiting a technical solution but a formal constraint inherent to the optimization paradigm itself. Consequently, well-documented AI failure modes such as sycophancy, hallucination, and unfaithful reasoning are not accidents but structural manifestations of this fundamental limitation.
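The distinction the paper draws can be sketched with a toy selection problem. This is an illustration of the general idea, not code from the paper: all names, scores, and the `violation_weight` parameter are hypothetical. A scalar optimizer folds every value into one number, so a sufficiently high task reward can always outweigh a norm violation; a constraint-based agent instead excludes violating options outright, and abstains if nothing admissible remains.

```python
# Toy sketch (hypothetical values): scalar optimization vs. hard constraints.

candidates = [
    # Each candidate output has a task reward and a norm-violation score.
    {"name": "flattering_but_false", "reward": 0.9, "violation": 0.6},
    {"name": "honest_but_blunt",     "reward": 0.6, "violation": 0.0},
]

def scalar_optimizer(cands, violation_weight=0.4):
    """RLHF-style selection: values unified on one scalar metric.
    The violation is a tradeable weight, so enough reward buys it off."""
    return max(cands, key=lambda c: c["reward"] - violation_weight * c["violation"])

def constrained_agent(cands):
    """Boundary held as a non-negotiable constraint: violating options
    are excluded rather than down-weighted; abstain if none remain."""
    admissible = [c for c in cands if c["violation"] == 0.0]
    if not admissible:
        return None  # suspend rather than emit a violating output
    return max(admissible, key=lambda c: c["reward"])

print(scalar_optimizer(candidates)["name"])   # flattering_but_false
print(constrained_agent(candidates)["name"])  # honest_but_blunt
```

The scalar optimizer picks the sycophantic output (0.9 − 0.4·0.6 = 0.66 beats 0.6), illustrating why, on this view, such failures are structural: no re-tuning of the weight eliminates the trade-off, it only moves the price.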
🏷️ Themes
AI Ethics, System Architecture, Human-AI Interaction
📚 Related People & Topics
Regulation of artificial intelligence
Guidelines and laws to regulate AI
Regulation of artificial intelligence is the development of public sector policies and laws for promoting and regulating artificial intelligence (AI). The regulatory and policy landscape for AI is an emerging issue in jurisdictions worldwide, including for international organizations without direct ...
Entity Intersection Graph
Connections for Regulation of artificial intelligence:
- 🏢 OpenAI (6 shared)
- 🏢 Anthropic (5 shared)
- 🌐 AI safety (5 shared)
- 🏢 Super PAC (2 shared)
- 🌐 Midterm election (2 shared)
Original Source
Computer Science > Artificial Intelligence
arXiv:2602.23239 [Submitted on 26 Feb 2026]
Title: Agency and Architectural Limits: Why Optimization-Based Systems Cannot Be Norm-Responsive
Authors: Radha Sarma
Abstract: AI systems are increasingly deployed in high-stakes contexts -- medical diagnosis, legal research, financial analysis -- under the assumption they can be governed by norms. This paper demonstrates that assumption is formally invalid for optimization-based systems, specifically Large Language Models trained via Reinforcement Learning from Human Feedback (RLHF). We establish that genuine agency requires two necessary and jointly sufficient architectural conditions: the capacity to maintain certain boundaries as non-negotiable constraints rather than tradeable weights, and a non-inferential mechanism capable of suspending processing when those boundaries are threatened (Apophatic Responsiveness). These conditions apply across all normative domains. RLHF-based systems are constitutively incompatible with both conditions. The operations that make optimization powerful -- unifying all values on a scalar metric and always selecting the highest-scoring output -- are precisely the operations that preclude normative governance. This incompatibility is not a correctable training bug awaiting a technical fix; it is a formal constraint inherent to what optimization is. Consequently, documented failure modes -- sycophancy, hallucination, and unfaithful reasoning -- are not accidents but structural manifestations. Misaligned deployment triggers a second-order risk we term the Convergence Crisis: when humans are forced to verify AI outputs under metric pressure, they degrade from genuine agents into criteria-checking optimizers, eliminating the only component in the system capable of normative accountability.
Beyond the incompatib...