#AI Reliability

Latest news articles tagged with "AI Reliability". Follow the timeline of events, related topics, and entities.

Articles (17)

🇺🇸 Breaking the Chain: A Causal Analysis of LLM Faithfulness to Intermediate Structures — 18/03/2026 [USA]
arXiv:2603.16475v1 Announce Type: new Abstract: Schema-guided reasoning pipelines ask LLMs to produce explicit intermediate structures -- rubrics, checklists, verification queries -- before committin...
Related: #Causal Analysis
🇺🇸 Grounding the Score: Explicit Visual Premise Verification for Reliable Vision-Language Process Reward Models — 18/03/2026 [USA]
arXiv:2603.16253v1 Announce Type: cross Abstract: Vision-language process reward models (VL-PRMs) are increasingly used to score intermediate reasoning steps and rerank candidates under test-time sca...
Related: #Multimodal Verification
🇺🇸 When Stability Fails: Hidden Failure Modes Of LLMS in Data-Constrained Scientific Decision-Making — 18/03/2026 [USA]
arXiv:2603.15840v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly used as decision-support tools in data-constrained scientific workflows, where correctness and validity...
Related: #Scientific Research
🇺🇸 Attribution-Guided Model Rectification of Unreliable Neural Network Behaviors — 18/03/2026 [USA]
arXiv:2603.15656v1 Announce Type: cross Abstract: The performance of neural network models deteriorates due to their unreliable behavior on non-robust features of corrupted samples. Owing to their op...
Related: #Model Correction
🇺🇸 Nonstandard Errors in AI Agents — 18/03/2026 [USA]
arXiv:2603.16744v1 Announce Type: new Abstract: We study whether state-of-the-art AI coding agents, given the same data and research question, produce the same empirical results. Deploying 150 autono...
Related: #Error Analysis
🇺🇸 Thinking in Uncertainty: Mitigating Hallucinations in MLRMs with Latent Entropy-Aware Decoding — 17/03/2026 [USA]
arXiv:2603.13366v1 Announce Type: cross Abstract: Recent advancements in multimodal large reasoning models (MLRMs) have significantly improved performance in visual question answering. However, we ob...
Related: #Uncertainty Management
🇺🇸 Semantic Invariance in Agentic AI — 16/03/2026 [USA]
arXiv:2603.13173v1 Announce Type: new Abstract: Large Language Models (LLMs) increasingly serve as autonomous reasoning agents in decision support, scientific problem-solving, and multi-agent coordin...
Related: #Natural Language Processing
🇺🇸 Deployment-Time Reliability of Learned Robot Policies — 13/03/2026 [USA]
arXiv:2603.11400v1 Announce Type: cross Abstract: Recent advances in learning-based robot manipulation have produced policies with remarkable capabilities. Yet, reliability at deployment remains a fu...
Related: #Robotics
🇺🇸 Toward Epistemic Stability: Engineering Consistent Procedures for Industrial LLM Hallucination Reduction — 12/03/2026 [USA]
arXiv:2603.10047v1 Announce Type: cross Abstract: Hallucinations in large language models (LLMs) are outputs that are syntactically coherent but factually incorrect or contextually inconsistent. They...
Related: #Industrial AI
🇺🇸 Quantifying Hallucinations in Language Language Models on Medical Textbooks — 12/03/2026 [USA]
arXiv:2603.09986v1 Announce Type: cross Abstract: Hallucinations, the tendency for large language models to provide responses with factually incorrect and unsupported claims, is a serious problem wit...
Related: #Medical AI
🇺🇸 Consistency-based Abductive Reasoning over Perceptual Errors of Multiple Pre-trained Models in Novel Environments — 12/03/2026 [USA]
arXiv:2505.19361v5 Announce Type: replace Abstract: The deployment of pre-trained perception models in novel environments often leads to performance degradation due to distributional shifts. Although...
Related: #Error Detection
🇺🇸 Amazon plans 'deep dive' internal meeting to address AI-related outages — 10/03/2026 [USA]
Amazon said AI-assisted production changes were partly to blame for recent infrastructure issues.
Related: #Corporate Strategy
🇺🇸 Make VLM Recognize Visual Hallucination on Cartoon Character Image with Pose Information — 09/03/2026 [USA]
arXiv:2403.15048v4 Announce Type: replace-cross Abstract: Leveraging large-scale Text-to-Image (TTI) models have become a common technique for generating exemplar or training dataset in the fields of...
Related: #Computer Vision
🇺🇸 On the Reliability of AI Methods in Drug Discovery: Evaluation of Boltz-2 for Structure and Binding Affinity Prediction — 09/03/2026 [USA]
arXiv:2603.05532v1 Announce Type: cross Abstract: Despite continuing hype about the role of AI in drug discovery, no "AI-discovered drugs" have so far received regulatory approval. Here we assess one...
Related: #Drug Discovery, #Protein Prediction
🇺🇸 Induced Numerical Instability: Hidden Costs in Multimodal Large Language Models — 06/03/2026 [USA]
arXiv:2603.04453v1 Announce Type: cross Abstract: The use of multimodal large language models has become widespread, and as such the study of these models and their failure points has become of utmos...
Related: #Computational Costs
🇺🇸 Agent Behavioral Contracts: Formal Specification and Runtime Enforcement for Reliable Autonomous AI Agents — 27/02/2026 [USA]
arXiv:2602.22302v1 Announce Type: new Abstract: Traditional software relies on contracts -- APIs, type systems, assertions -- to specify and enforce correct behavior. AI agents, by contrast, operate ...
Related: #Formal Specification, #Runtime Enforcement
🇺🇸 Oracular Programming: A Modular Foundation for Building LLM-Enabled Software — 25/02/2026 [USA]
arXiv:2502.05310v4 Announce Type: replace-cross Abstract: Large Language Models can solve a wide range of tasks from just a few examples, but they remain difficult to steer and lack a capability esse...
Related: #Programming Paradigms, #LLM Integration, #Software Architecture

Key Entities (6)

Amazon (1 article)
Abductive reasoning (1 article)
AI agent (1 article)
AI safety (1 article)
Formal specification (1 article)
Drug discovery (1 article)

About the topic: AI Reliability

The topic "AI Reliability" aggregates 17+ news articles; every article currently listed is tagged [USA].