# AI Alignment
Latest news articles tagged with "AI Alignment". Follow the timeline of events, related topics, and entities.
Articles (29)
- HPS: Hard Preference Sampling for Human Preference Alignment [USA]
  arXiv:2502.14400v5 (Announce Type: replace). Abstract: Aligning Large Language Model (LLM) responses with human preferences is vital for building safe and controllable AI systems. While preference optim...
  Related: #Preference Learning
- Detection Is Cheap, Routing Is Learned: Why Refusal-Based Alignment Evaluation Fails [USA]
  arXiv:2603.18280v1 (Announce Type: cross). Abstract: Current alignment evaluation mostly measures whether models encode dangerous concepts and whether they refuse harmful requests. Both miss the layer w...
  Related: #Evaluation Methods
- TARo: Token-level Adaptive Routing for LLM Test-time Alignment [USA]
  arXiv:2603.18411v1 (Announce Type: cross). Abstract: Large language models (LLMs) exhibit strong reasoning capabilities but typically require expensive post-training to reach high performance. Recent te...
  Related: #Model Optimization
- VC-Soup: Value-Consistency Guided Multi-Value Alignment for Large Language Models [USA]
  arXiv:2603.18113v1 (Announce Type: cross). Abstract: As large language models (LLMs) increasingly shape content generation, interaction, and decision-making across the Web, aligning them with human valu...
  Related: #Ethical AI
- CausalRM: Causal-Theoretic Reward Modeling for RLHF from Observational User Feedbacks [USA]
  arXiv:2603.18736v1 (Announce Type: cross). Abstract: Despite the success of reinforcement learning from human feedback (RLHF) in aligning language models, current reward modeling heavily relies on exper...
  Related: #Causal Inference
- EVA: Aligning Video World Models with Executable Robot Actions via Inverse Dynamics Rewards [USA]
  arXiv:2603.17808v1 (Announce Type: cross). Abstract: Video generative models are increasingly used as world models for robotics, where a model generates a future visual rollout conditioned on the curren...
  Related: #Robotics
- Contrastive Reasoning Alignment: Reinforcement Learning from Hidden Representations [USA]
  arXiv:2603.17305v1 (Announce Type: new). Abstract: We propose CRAFT, a red-teaming alignment framework that leverages model reasoning capabilities and hidden representations to improve robustness agains...
  Related: #Reinforcement Learning
- Via Negativa for AI Alignment: Why Negative Constraints Are Structurally Superior to Positive Preferences [USA]
  arXiv:2603.16417v1 (Announce Type: new). Abstract: Recent empirical results have demonstrated that training large language models (LLMs) with negative-only feedback can match or exceed standard reinforc...
  Related: #Constraint Design
- Evidence-based Distributional Alignment for Large Language Models [USA]
  arXiv:2603.13305v1 (Announce Type: cross). Abstract: Distributional alignment enables large language models (LLMs) to predict how a target population distributes its responses across answer options, rat...
  Related: #Model Reliability
- Are Dilemmas and Conflicts in LLM Alignment Solvable? A View from Priority Graph [USA]
  arXiv:2603.15527v1 (Announce Type: new). Abstract: As Large Language Models (LLMs) become more powerful and autonomous, they increasingly face conflicts and dilemmas in many scenarios. We first summariz...
  Related: #Conflict Resolution
- AdaBoN: Adaptive Best-of-N Alignment [USA]
  arXiv:2505.12050v3 (Announce Type: replace-cross). Abstract: Recent advances in test-time alignment methods, such as Best-of-N sampling, offer a simple and effective way to steer language models (LMs) t...
  Related: #Machine Learning
- Information-Consistent Language Model Recommendations through Group Relative Policy Optimization [USA]
  arXiv:2512.12858v2 (Announce Type: replace-cross). Abstract: Large Language Models (LLMs) are increasingly deployed in business-critical domains such as finance, education, healthcare, and customer supp...
  Related: #Language Models
- Aligning Language Models from User Interactions [USA]
  arXiv:2603.12273v1 (Announce Type: cross). Abstract: Multi-turn user interactions are among the most abundant data produced by language models, yet we lack effective methods to learn from them. While ty...
  Related: #Machine Learning
- Aligning Large Language Models with Searcher Preferences [USA]
  arXiv:2603.10473v1 (Announce Type: cross). Abstract: The paradigm shift from item-centric ranking to answer-centric synthesis is redefining the role of search engines. While recent industrial progress h...
  Related: #Search Optimization
- Personalized Group Relative Policy Optimization for Heterogenous Preference Alignment [USA]
  arXiv:2603.10009v1 (Announce Type: cross). Abstract: Despite their sophisticated general-purpose capabilities, Large Language Models (LLMs) often fail to align with diverse individual preferences becaus...
  Related: #Personalization
- Does LLM Alignment Really Need Diversity? An Empirical Study of Adapting RLVR Methods for Moral Reasoning [USA]
  arXiv:2603.10588v1 (Announce Type: new). Abstract: Reinforcement learning with verifiable rewards (RLVR) has achieved remarkable success in logical reasoning tasks, yet whether large language model (LLM...
  Related: #Moral Reasoning
- Alignment Is the Disease: Censorship Visibility and Alignment Constraint Complexity as Determinants of Collective Pathology in Multi-Agent LLM Systems [USA]
  arXiv:2603.08723v1 (Announce Type: cross). Abstract: Alignment techniques in large language models (LLMs) are designed to constrain model outputs toward human values. We present preliminary evidence tha...
  Related: #Collective Pathology
- Efficiently Aligning Draft Models via Parameter- and Data-Efficient Adaptation [USA]
  arXiv:2603.09527v1 (Announce Type: cross). Abstract: Speculative decoding accelerates LLM inference but suffers from performance degradation when target models are fine-tuned for specific domains. A nai...
  Related: #Model Efficiency
- Best-of-Tails: Bridging Optimism and Pessimism in Inference-Time Alignment [USA]
  arXiv:2603.06797v1 (Announce Type: new). Abstract: Inference-time alignment effectively steers large language models (LLMs) by generating multiple candidates from a reference model and selecting among t...
  Related: #Inference Optimization
- CDRRM: Contrast-Driven Rubric Generation for Reliable and Interpretable Reward Modeling [USA]
  arXiv:2603.08035v1 (Announce Type: new). Abstract: Reward modeling is essential for aligning Large Language Models (LLMs) with human preferences, yet conventional reward models suffer from poor interpret...
  Related: #Reward Modeling
- Revisiting the (Sub)Optimality of Best-of-N for Inference-Time Alignment [USA]
  arXiv:2603.05739v1 (Announce Type: cross). Abstract: Best-of-N (BoN) sampling is a widely used inference-time alignment method for language models, whereby N candidate responses are sampled from a refer...
  Related: #Inference Optimization
- RM-R1: Reward Modeling as Reasoning [USA]
  arXiv:2505.02387v4 (Announce Type: replace-cross). Abstract: Reward modeling is essential for aligning large language models with human preferences through reinforcement learning. To provide accurate re...
  Related: #Reward Modeling
- Aligning Compound AI Systems via System-level DPO [USA]
  arXiv:2502.17721v4 (Announce Type: replace-cross). Abstract: Compound AI systems, comprising multiple interacting components such as LLMs, foundation models, and external tools, have demonstrated remark...
  Related: #System Optimization
- VISA: Value Injection via Shielded Adaptation for Personalized LLM Alignment [USA]
  arXiv:2603.04822v1 (Announce Type: new). Abstract: Aligning Large Language Models (LLMs) with nuanced human values remains a critical challenge, as existing methods like Reinforcement Learning from Huma...
  Related: #Personalization
- Causally Robust Reward Learning from Reason-Augmented Preference Feedback [USA]
  arXiv:2603.04861v1 (Announce Type: new). Abstract: Preference-based reward learning is widely used for shaping agent behavior to match a user's preference, yet its sparse binary feedback makes it especi...
  Related: #Causal Inference
- When Weak LLMs Speak with Confidence, Preference Alignment Gets Stronger [USA]
  arXiv:2603.04968v1 (Announce Type: cross). Abstract: Preference alignment is an essential step in adapting large language models (LLMs) to human values, but existing approaches typically depend on costl...
  Related: #LLM Training
- VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models [USA]
  arXiv:2505.15801v4 (Announce Type: replace-cross). Abstract: Large reasoning models such as OpenAI o1 and DeepSeek-R1 have demonstrated remarkable performance in complex reasoning tasks. A critical comp...
  Related: #Large Language Models, #Reinforcement Learning, #Reference-Based Reward Systems, #Benchmarking and Evaluation
- The Geometry of Alignment Collapse: When Fine-Tuning Breaks Safety [USA]
  arXiv:2602.15799v1 (Announce Type: cross). Abstract: Fine-tuning aligned language models on benign tasks unpredictably degrades safety guardrails, even when training data contains no harmful content and...
  Related: #Safety Guardrails, #Fine-tuning in Language Models, #High-Dimensional Parameter Space, #Structural Instability
- Artificial Organisations [USA]
  arXiv:2602.13275v1 (Announce Type: new). Abstract: Alignment research focuses on making individual AI systems reliable. Human institutions achieve reliable collective behaviour differently: they mitigat...
  Related: #Organisational Design, #Multi-Agent Systems, #Reliability Engineering, #Institutional Approach
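Several of the listed papers (AdaBoN, Best-of-Tails, "Revisiting the (Sub)Optimality of Best-of-N") study Best-of-N sampling, the inference-time alignment method in which N candidate responses are drawn from a reference model and the highest-scoring one under a reward model is returned. As background, a minimal sketch of vanilla Best-of-N; the `generate` and `reward` functions here are hypothetical stand-ins for a real language model and reward model, not any paper's implementation:

```python
import random

def generate(prompt: str, seed: int) -> str:
    # Stand-in for sampling one response from a reference language model.
    # Deterministic per seed so the example is reproducible.
    rng = random.Random(seed)
    return f"{prompt} -> candidate-{rng.randint(0, 999)}"

def reward(response: str) -> float:
    # Stand-in for a learned reward model's scalar score.
    return float(len(response))

def best_of_n(prompt: str, n: int) -> str:
    """Sample n candidates from the reference model and return the one
    the reward model scores highest."""
    candidates = [generate(prompt, seed) for seed in range(n)]
    return max(candidates, key=reward)
```

The appeal, as the abstracts above note, is that this steers the model without any fine-tuning; the papers listed examine its cost (N forward passes), its optimality, and adaptive variants that choose N per prompt.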
Key Entities (9)
- AI alignment (5 articles)
- Ethics of artificial intelligence (3 articles)
- Large language model (3 articles)
- Reinforcement learning from human feedback (2 articles)
- Policy gradient method (1 article)
- Visa Inc. (1 article)
- HPS (1 article)
- Reinforcement learning (1 article)
- AI safety (1 article)
About the topic: AI Alignment
The topic "AI Alignment" aggregates 29+ news articles from various sources.