AI alignment
Conformance of AI to intended objectives
📊 Rating
8 news mentions
📌 Topics
- Machine Learning (5)
- AI Safety (4)
- Artificial Intelligence (4)
- Cybersecurity (3)
- Machine Ethics (1)
- Computer Science (1)
- Technology (1)
- Security (1)
- Mathematics (1)
- Digital Sovereignty (1)
- Linguistics (1)
- Social Choice Theory (1)
🏷️ Keywords
AI alignment (8) · arXiv (4) · arXiv research (2) · Large Language Models (2) · value-object (1) · Hume's is-ought gap (1) · specification trap (1) · capability scaling (1) · autonomous systems (1) · Vision-Language Models (1) · VLM safety (1) · multimodal jailbreak (1) · Risk Awareness Injection (1) · LLM security (1) · Regime leakage (1) · Situational awareness (1) · Sleeper agents (1) · Safety evaluation (1) · Machine learning (1) · LLM reasoning (1)
📰 Related News (8)
- 🇺🇸 The Specification Trap: Why Content-Based AI Value Alignment Cannot Produce Robust Alignment
  arXiv:2512.03048v2 Announce Type: replace Abstract: I argue that content-based AI value alignment--any approach that treats alignment as optimizing t...
- 🇺🇸 Risk Awareness Injection: Calibrating Vision-Language Models for Safety without Compromising Utility
  arXiv:2602.03402v2 Announce Type: replace Abstract: Vision language models (VLMs) extend the reasoning capabilities of large language models (LLMs) t...
- 🇺🇸 When Evaluation Becomes a Side Channel: Regime Leakage and Structural Mitigations for Alignment Assessment
  arXiv:2602.08449v1 Announce Type: new Abstract: Safety evaluation for advanced AI systems implicitly assumes that behavior observed under evaluation ...
- 🇺🇸 iGRPO: Self-Feedback-Driven LLM Reasoning
  arXiv:2602.09000v1 Announce Type: new Abstract: Large Language Models (LLMs) have shown promise in solving complex mathematical problems, yet they st...
- 🇺🇸 compar:IA: The French Government's LLM arena to collect French-language human prompts and preference data
  arXiv:2602.06669v1 Announce Type: cross Abstract: Large Language Models (LLMs) often show reduced performance, cultural alignment, and safety robustn...
- 🇺🇸 TamperBench: Systematically Stress-Testing LLM Safety Under Fine-Tuning and Tampering
  arXiv:2602.06911v1 Announce Type: cross Abstract: As increasingly capable open-weight large language models (LLMs) are deployed, improving their tamp...
- 🇺🇸 How does information access affect LLM monitors' ability to detect sabotage?
  arXiv:2601.21112v2 Announce Type: replace Abstract: Frontier language model agents can exhibit misaligned behaviors, including deception, exploiting ...
- 🇺🇸 Methods and Open Problems in Differentiable Social Choice: Learning Mechanisms, Decisions, and Alignment
  arXiv:2602.03003v2 Announce Type: replace Abstract: Social choice is no longer a peripheral concern of political theory or economics--it has become a ...
🔗 Entity Intersection Graph
Entities frequently mentioned alongside AI alignment:
- 🌐 Large language model (2 shared articles)
- 🌐 Reinforcement learning (1 shared article)
- 🌐 PPO (1 shared article)
- 🌐 Sleeper agent (1 shared article)
- 🌐 Situation awareness (1 shared article)
- 🌐 Reinforcement learning from human feedback (1 shared article)
- 🌐 Government of France (1 shared article)