#AI Reliability
Latest news articles tagged with "AI Reliability". Follow the timeline of events, related topics, and entities.
Articles (17)
- 🇺🇸 Breaking the Chain: A Causal Analysis of LLM Faithfulness to Intermediate Structures
[USA]
arXiv:2603.16475v1 Announce Type: new Abstract: Schema-guided reasoning pipelines ask LLMs to produce explicit intermediate structures -- rubrics, checklists, verification queries -- before committin...
Related: #Causal Analysis
- 🇺🇸 Grounding the Score: Explicit Visual Premise Verification for Reliable Vision-Language Process Reward Models
[USA]
arXiv:2603.16253v1 Announce Type: cross Abstract: Vision-language process reward models (VL-PRMs) are increasingly used to score intermediate reasoning steps and rerank candidates under test-time sca...
Related: #Multimodal Verification
- 🇺🇸 When Stability Fails: Hidden Failure Modes Of LLMS in Data-Constrained Scientific Decision-Making
[USA]
arXiv:2603.15840v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly used as decision-support tools in data-constrained scientific workflows, where correctness and validity...
Related: #Scientific Research
- 🇺🇸 Attribution-Guided Model Rectification of Unreliable Neural Network Behaviors
[USA]
arXiv:2603.15656v1 Announce Type: cross Abstract: The performance of neural network models deteriorates due to their unreliable behavior on non-robust features of corrupted samples. Owing to their op...
Related: #Model Correction
- 🇺🇸 Nonstandard Errors in AI Agents
[USA]
arXiv:2603.16744v1 Announce Type: new Abstract: We study whether state-of-the-art AI coding agents, given the same data and research question, produce the same empirical results. Deploying 150 autono...
Related: #Error Analysis
- 🇺🇸 Thinking in Uncertainty: Mitigating Hallucinations in MLRMs with Latent Entropy-Aware Decoding
[USA]
arXiv:2603.13366v1 Announce Type: cross Abstract: Recent advancements in multimodal large reasoning models (MLRMs) have significantly improved performance in visual question answering. However, we ob...
Related: #Uncertainty Management
- 🇺🇸 Semantic Invariance in Agentic AI
[USA]
arXiv:2603.13173v1 Announce Type: new Abstract: Large Language Models (LLMs) increasingly serve as autonomous reasoning agents in decision support, scientific problem-solving, and multi-agent coordin...
Related: #Natural Language Processing
- 🇺🇸 Deployment-Time Reliability of Learned Robot Policies
[USA]
arXiv:2603.11400v1 Announce Type: cross Abstract: Recent advances in learning-based robot manipulation have produced policies with remarkable capabilities. Yet, reliability at deployment remains a fu...
Related: #Robotics
- 🇺🇸 Toward Epistemic Stability: Engineering Consistent Procedures for Industrial LLM Hallucination Reduction
[USA]
arXiv:2603.10047v1 Announce Type: cross Abstract: Hallucinations in large language models (LLMs) are outputs that are syntactically coherent but factually incorrect or contextually inconsistent. They...
Related: #Industrial AI
- 🇺🇸 Quantifying Hallucinations in Language Language Models on Medical Textbooks
[USA]
arXiv:2603.09986v1 Announce Type: cross Abstract: Hallucinations, the tendency for large language models to provide responses with factually incorrect and unsupported claims, is a serious problem wit...
Related: #Medical AI
- 🇺🇸 Consistency-based Abductive Reasoning over Perceptual Errors of Multiple Pre-trained Models in Novel Environments
[USA]
arXiv:2505.19361v5 Announce Type: replace Abstract: The deployment of pre-trained perception models in novel environments often leads to performance degradation due to distributional shifts. Although...
Related: #Error Detection
- 🇺🇸 Amazon plans 'deep dive' internal meeting to address AI-related outages
[USA]
Amazon said AI-assisted production changes were partly to blame for recent infrastructure issues.
Related: #Corporate Strategy
- 🇺🇸 Make VLM Recognize Visual Hallucination on Cartoon Character Image with Pose Information
[USA]
arXiv:2403.15048v4 Announce Type: replace-cross Abstract: Leveraging large-scale Text-to-Image (TTI) models have become a common technique for generating exemplar or training dataset in the fields of...
Related: #Computer Vision
- 🇺🇸 On the Reliability of AI Methods in Drug Discovery: Evaluation of Boltz-2 for Structure and Binding Affinity Prediction
[USA]
arXiv:2603.05532v1 Announce Type: cross Abstract: Despite continuing hype about the role of AI in drug discovery, no "AI-discovered drugs" have so far received regulatory approval. Here we assess one...
Related: #Drug Discovery, #Protein Prediction
- 🇺🇸 Induced Numerical Instability: Hidden Costs in Multimodal Large Language Models
[USA]
arXiv:2603.04453v1 Announce Type: cross Abstract: The use of multimodal large language models has become widespread, and as such the study of these models and their failure points has become of utmos...
Related: #Computational Costs
- 🇺🇸 Agent Behavioral Contracts: Formal Specification and Runtime Enforcement for Reliable Autonomous AI Agents
[USA]
arXiv:2602.22302v1 Announce Type: new Abstract: Traditional software relies on contracts -- APIs, type systems, assertions -- to specify and enforce correct behavior. AI agents, by contrast, operate ...
Related: #Formal Specification, #Runtime Enforcement
- 🇺🇸 Oracular Programming: A Modular Foundation for Building LLM-Enabled Software
[USA]
arXiv:2502.05310v4 Announce Type: replace-cross Abstract: Large Language Models can solve a wide range of tasks from just a few examples, but they remain difficult to steer and lack a capability esse...
Related: #Programming Paradigms, #LLM Integration, #Software Architecture
Key Entities (6)
- Amazon (1 news)
- Abductive reasoning (1 news)
- AI agent (1 news)
- AI safety (1 news)
- Formal specification (1 news)
- Drug discovery (1 news)
About the topic: AI Reliability
The topic "AI Reliability" aggregates 17+ news articles from various countries.