#AI Alignment

Latest news articles tagged with "AI Alignment". Follow the timeline of events, related topics, and entities.

Articles (29)

🇺🇸 HPS: Hard Preference Sampling for Human Preference Alignment — 23/03/2026 [USA]
arXiv:2502.14400v5 Announce Type: replace Abstract: Aligning Large Language Model (LLM) responses with human preferences is vital for building safe and controllable AI systems. While preference optim...
Related: #Preference Learning
🇺🇸 Detection Is Cheap, Routing Is Learned: Why Refusal-Based Alignment Evaluation Fails — 20/03/2026 [USA]
arXiv:2603.18280v1 Announce Type: cross Abstract: Current alignment evaluation mostly measures whether models encode dangerous concepts and whether they refuse harmful requests. Both miss the layer w...
Related: #Evaluation Methods
🇺🇸 TARo: Token-level Adaptive Routing for LLM Test-time Alignment — 20/03/2026 [USA]
arXiv:2603.18411v1 Announce Type: cross Abstract: Large language models (LLMs) exhibit strong reasoning capabilities but typically require expensive post-training to reach high performance. Recent te...
Related: #Model Optimization
🇺🇸 VC-Soup: Value-Consistency Guided Multi-Value Alignment for Large Language Models — 20/03/2026 [USA]
arXiv:2603.18113v1 Announce Type: cross Abstract: As large language models (LLMs) increasingly shape content generation, interaction, and decision-making across the Web, aligning them with human valu...
Related: #Ethical AI
🇺🇸 CausalRM: Causal-Theoretic Reward Modeling for RLHF from Observational User Feedbacks — 20/03/2026 [USA]
arXiv:2603.18736v1 Announce Type: cross Abstract: Despite the success of reinforcement learning from human feedback (RLHF) in aligning language models, current reward modeling heavily relies on exper...
Related: #Causal Inference
🇺🇸 EVA: Aligning Video World Models with Executable Robot Actions via Inverse Dynamics Rewards — 19/03/2026 [USA]
arXiv:2603.17808v1 Announce Type: cross Abstract: Video generative models are increasingly used as world models for robotics, where a model generates a future visual rollout conditioned on the curren...
Related: #Robotics
🇺🇸 Contrastive Reasoning Alignment: Reinforcement Learning from Hidden Representations — 19/03/2026 [USA]
arXiv:2603.17305v1 Announce Type: new Abstract: We propose CRAFT, a red-teaming alignment framework that leverages model reasoning capabilities and hidden representations to improve robustness agains...
Related: #Reinforcement Learning
🇺🇸 Via Negativa for AI Alignment: Why Negative Constraints Are Structurally Superior to Positive Preferences — 18/03/2026 [USA]
arXiv:2603.16417v1 Announce Type: new Abstract: Recent empirical results have demonstrated that training large language models (LLMs) with negative-only feedback can match or exceed standard reinforc...
Related: #Constraint Design
🇺🇸 Evidence-based Distributional Alignment for Large Language Models — 17/03/2026 [USA]
arXiv:2603.13305v1 Announce Type: cross Abstract: Distributional alignment enables large language models (LLMs) to predict how a target population distributes its responses across answer options, rat...
Related: #Model Reliability
🇺🇸 Are Dilemmas and Conflicts in LLM Alignment Solvable? A View from Priority Graph — 17/03/2026 [USA]
arXiv:2603.15527v1 Announce Type: new Abstract: As Large Language Models (LLMs) become more powerful and autonomous, they increasingly face conflicts and dilemmas in many scenarios. We first summariz...
Related: #Conflict Resolution
🇺🇸 AdaBoN: Adaptive Best-of-N Alignment — 16/03/2026 [USA]
arXiv:2505.12050v3 Announce Type: replace-cross Abstract: Recent advances in test-time alignment methods, such as Best-of-N sampling, offer a simple and effective way to steer language models (LMs) t...
Related: #Machine Learning
🇺🇸 Information-Consistent Language Model Recommendations through Group Relative Policy Optimization — 16/03/2026 [USA]
arXiv:2512.12858v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) are increasingly deployed in business-critical domains such as finance, education, healthcare, and customer supp...
Related: #Language Models
🇺🇸 Aligning Language Models from User Interactions — 16/03/2026 [USA]
arXiv:2603.12273v1 Announce Type: cross Abstract: Multi-turn user interactions are among the most abundant data produced by language models, yet we lack effective methods to learn from them. While ty...
Related: #Machine Learning
🇺🇸 Aligning Large Language Models with Searcher Preferences — 12/03/2026 [USA]
arXiv:2603.10473v1 Announce Type: cross Abstract: The paradigm shift from item-centric ranking to answer-centric synthesis is redefining the role of search engines. While recent industrial progress h...
Related: #Search Optimization
🇺🇸 Personalized Group Relative Policy Optimization for Heterogenous Preference Alignment — 12/03/2026 [USA]
arXiv:2603.10009v1 Announce Type: cross Abstract: Despite their sophisticated general-purpose capabilities, Large Language Models (LLMs) often fail to align with diverse individual preferences becaus...
Related: #Personalization
🇺🇸 Does LLM Alignment Really Need Diversity? An Empirical Study of Adapting RLVR Methods for Moral Reasoning — 12/03/2026 [USA]
arXiv:2603.10588v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) has achieved remarkable success in logical reasoning tasks, yet whether large language model (LLM...
Related: #Moral Reasoning
🇺🇸 Alignment Is the Disease: Censorship Visibility and Alignment Constraint Complexity as Determinants of Collective Pathology in Multi-Agent LLM Systems — 11/03/2026 [USA]
arXiv:2603.08723v1 Announce Type: cross Abstract: Alignment techniques in large language models (LLMs) are designed to constrain model outputs toward human values. We present preliminary evidence tha...
Related: #Collective Pathology
🇺🇸 Efficiently Aligning Draft Models via Parameter- and Data-Efficient Adaptation — 11/03/2026 [USA]
arXiv:2603.09527v1 Announce Type: cross Abstract: Speculative decoding accelerates LLM inference but suffers from performance degradation when target models are fine-tuned for specific domains. A nai...
Related: #Model Efficiency
🇺🇸 Best-of-Tails: Bridging Optimism and Pessimism in Inference-Time Alignment — 10/03/2026 [USA]
arXiv:2603.06797v1 Announce Type: new Abstract: Inference-time alignment effectively steers large language models (LLMs) by generating multiple candidates from a reference model and selecting among t...
Related: #Inference Optimization
🇺🇸 CDRRM: Contrast-Driven Rubric Generation for Reliable and Interpretable Reward Modeling — 10/03/2026 [USA]
arXiv:2603.08035v1 Announce Type: new Abstract: Reward modeling is essential for aligning Large Language Models (LLMs) with human preferences, yet conventional reward models suffer from poor interpret...
Related: #Reward Modeling
🇺🇸 Revisiting the (Sub)Optimality of Best-of-N for Inference-Time Alignment — 09/03/2026 [USA]
arXiv:2603.05739v1 Announce Type: cross Abstract: Best-of-N (BoN) sampling is a widely used inference-time alignment method for language models, whereby N candidate responses are sampled from a refer...
Related: #Inference Optimization
🇺🇸 RM-R1: Reward Modeling as Reasoning — 09/03/2026 [USA]
arXiv:2505.02387v4 Announce Type: replace-cross Abstract: Reward modeling is essential for aligning large language models with human preferences through reinforcement learning. To provide accurate re...
Related: #Reward Modeling
🇺🇸 Aligning Compound AI Systems via System-level DPO — 09/03/2026 [USA]
arXiv:2502.17721v4 Announce Type: replace-cross Abstract: Compound AI systems, comprising multiple interacting components such as LLMs, foundation models, and external tools, have demonstrated remark...
Related: #System Optimization
🇺🇸 VISA: Value Injection via Shielded Adaptation for Personalized LLM Alignment — 06/03/2026 [USA]
arXiv:2603.04822v1 Announce Type: new Abstract: Aligning Large Language Models (LLMs) with nuanced human values remains a critical challenge, as existing methods like Reinforcement Learning from Huma...
Related: #Personalization
🇺🇸 Causally Robust Reward Learning from Reason-Augmented Preference Feedback — 06/03/2026 [USA]
arXiv:2603.04861v1 Announce Type: new Abstract: Preference-based reward learning is widely used for shaping agent behavior to match a user's preference, yet its sparse binary feedback makes it especi...
Related: #Causal Inference
🇺🇸 When Weak LLMs Speak with Confidence, Preference Alignment Gets Stronger — 06/03/2026 [USA]
arXiv:2603.04968v1 Announce Type: cross Abstract: Preference alignment is an essential step in adapting large language models (LLMs) to human values, but existing approaches typically depend on costl...
Related: #LLM Training
🇺🇸 VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models — 19/02/2026 [USA]
arXiv:2505.15801v4 Announce Type: replace-cross Abstract: Large reasoning models such as OpenAI o1 and DeepSeek-R1 have demonstrated remarkable performance in complex reasoning tasks. A critical comp...
Related: #Large Language Models, #Reinforcement Learning, #Reference-Based Reward Systems, #Benchmarking and Evaluation
🇺🇸 The Geometry of Alignment Collapse: When Fine-Tuning Breaks Safety — 18/02/2026 [USA]
arXiv:2602.15799v1 Announce Type: cross Abstract: Fine-tuning aligned language models on benign tasks unpredictably degrades safety guardrails, even when training data contains no harmful content and...
Related: #Safety Guardrails, #Fine-tuning in Language Models, #High-Dimensional Parameter Space, #Structural Instability
🇺🇸 Artificial Organisations — 17/02/2026 [USA]
arXiv:2602.13275v1 Announce Type: new Abstract: Alignment research focuses on making individual AI systems reliable. Human institutions achieve reliable collective behaviour differently: they mitigat...
Related: #Organisational Design, #Multi-Agent Systems, #Reliability Engineering, #Institutional Approach

Key Entities (9)

AI alignment (5 news)
Ethics of artificial intelligence (3 news)
Large language model (3 news)
Reinforcement learning from human feedback (2 news)
Policy gradient method (1 news)
Visa Inc. (1 news)
HPS (1 news)
Reinforcement learning (1 news)
AI safety (1 news)

About the topic: AI Alignment

The topic "AI Alignment" aggregates 29+ news articles from various countries.