AI alignment
Conformance of AI to intended objectives
📌 Topics
- Machine Learning (4)
- AI Safety (3)
- Artificial Intelligence (3)
- Cybersecurity (2)
- Technology (1)
- Security (1)
- Mathematics (1)
- Digital Sovereignty (1)
- Linguistics (1)
- Social Choice Theory (1)
🏷️ Keywords
AI alignment (6) · arXiv (3) · Large Language Models (2) · Regime leakage (1) · Situational awareness (1) · Sleeper agents (1) · Safety evaluation (1) · Machine learning (1) · LLM reasoning (1) · Reinforcement Learning (1) · iGRPO (1) · mathematical accuracy (1) · PPO (1) · self-feedback (1) · compar:IA (1) · French government (1) · RLHF (1) · Direct Preference Optimization (1) · Dataset (1) · TamperBench (1)
📰 Related News (6)
- 🇺🇸 When Evaluation Becomes a Side Channel: Regime Leakage and Structural Mitigations for Alignment Assessment
  arXiv:2602.08449v1 · Announce Type: new · Abstract: Safety evaluation for advanced AI systems implicitly assumes that behavior observed under evaluation ...
- 🇺🇸 iGRPO: Self-Feedback-Driven LLM Reasoning
  arXiv:2602.09000v1 · Announce Type: new · Abstract: Large Language Models (LLMs) have shown promise in solving complex mathematical problems, yet they st...
- 🇺🇸 compar:IA: The French Government's LLM arena to collect French-language human prompts and preference data
  arXiv:2602.06669v1 · Announce Type: cross · Abstract: Large Language Models (LLMs) often show reduced performance, cultural alignment, and safety robustn...
- 🇺🇸 TamperBench: Systematically Stress-Testing LLM Safety Under Fine-Tuning and Tampering
  arXiv:2602.06911v1 · Announce Type: cross · Abstract: As increasingly capable open-weight large language models (LLMs) are deployed, improving their tamp...
- 🇺🇸 How does information access affect LLM monitors' ability to detect sabotage?
  arXiv:2601.21112v2 · Announce Type: replace · Abstract: Frontier language model agents can exhibit misaligned behaviors, including deception, exploiting ...
- 🇺🇸 Methods and Open Problems in Differentiable Social Choice: Learning Mechanisms, Decisions, and Alignment
  arXiv:2602.03003v2 · Announce Type: replace · Abstract: Social choice is no longer a peripheral concern of political theory or economics-it has become a ...
🔗 Entity Intersection Graph
Entities frequently mentioned alongside AI alignment:
- 🌐 Large language model (2 shared articles)
- 🌐 Reinforcement learning (1 shared article)
- 🌐 PPO (1 shared article)
- 🌐 Sleeper agent (1 shared article)
- 🌐 Situation awareness (1 shared article)
- 🌐 Reinforcement learning from human feedback (1 shared article)
- 🌐 Government of France (1 shared article)