Large language model
Type of machine learning model
Rating
369 news mentions · 0 likes · 0 dislikes
Topics
- Artificial Intelligence (95)
- Machine Learning (42)
- AI Safety (20)
- AI Evaluation (20)
- AI Research (16)
- Natural Language Processing (13)
- AI Ethics (10)
- Software Engineering (8)
- AI Security (7)
- AI Bias (7)
- AI Benchmarking (7)
- Cybersecurity (6)
Keywords
Large Language Models (128) · LLM (67) · large language models (45) · LLMs (40) · arXiv (39) · Large language models (14) · machine learning (13) · benchmark (13) · AI safety (11) · AI (11) · AI Research (11) · AI evaluation (10) · Reinforcement Learning (9) · AI Safety (8) · evaluation (8) · reasoning (8) · Retrieval-Augmented Generation (8) · Machine Learning (7) · AI Evaluation (6) · AI reliability (6)
Related News (369)
- Heuristic Classification of Thoughts Prompting (HCoT): Integrating Expert System Heuristics for Structured Reasoning into Large Language Models
arXiv:2604.12390v1 Announce Type: new Abstract: This paper addresses two limitations of large language models (LLMs) in solving complex problems: (1)...
- Preventing Safety Drift in Large Language Models via Coupled Weight and Activation Constraints
arXiv:2604.12384v1 Announce Type: new Abstract: Safety alignment in Large Language Models (LLMs) remains highly fragile during fine-tuning, where eve...
- A Scoping Review of Large Language Model-Based Pedagogical Agents
arXiv:2604.12253v1 Announce Type: new Abstract: This scoping review examines the emerging field of Large Language Model (LLM)-based pedagogical agent...
- Blending Human and LLM Expertise to Detect Hallucinations and Omissions in Mental Health Chatbot Responses
arXiv:2604.06216v1 Announce Type: cross Abstract: As LLM-powered chatbots are increasingly deployed in mental health services, detecting hallucinatio...
- Automating Database-Native Function Code Synthesis with LLMs
arXiv:2604.06231v1 Announce Type: cross Abstract: Database systems incorporate an ever-growing number of functions in their kernels (a.k.a., database...
- Incentive-Aware Multi-Fidelity Optimization for Generative Advertising in Large Language Models
arXiv:2604.06263v1 Announce Type: cross Abstract: Generative advertising in large language model (LLM) responses requires optimizing sponsorship conf...
- Attribution-Driven Explainable Intrusion Detection with Encoder-Based Large Language Models
arXiv:2604.06266v1 Announce Type: cross Abstract: Software-Defined Networking (SDN) improves network flexibility but also increases the need for reli...
- Towards the Development of an LLM-Based Methodology for Automated Security Profiling in Compliance with Ukrainian Cybersecurity Regulations
arXiv:2604.06274v1 Announce Type: cross Abstract: In recent years, the pace of development of information technology in various areas has increased d...
- TalkLoRA: Communication-Aware Mixture of Low-Rank Adaptation for Large Language Models
arXiv:2604.06291v1 Announce Type: cross Abstract: Low-Rank Adaptation (LoRA) enables parameter-efficient fine-tuning of Large Language Models (LLMs),...
- FMI@SU ToxHabits: Evaluating LLMs Performance on Toxic Habit Extraction in Spanish Clinical Texts
arXiv:2604.06403v1 Announce Type: cross Abstract: The paper presents an approach for the recognition of toxic habits named entities in Spanish clinic...
- The Depth Ceiling: On the Limits of Large Language Models in Discovering Latent Planning
arXiv:2604.06427v1 Announce Type: cross Abstract: The viability of chain-of-thought (CoT) monitoring hinges on models being unable to reason effectiv...
- Inference-Time Code Selection via Symbolic Equivalence Partitioning
arXiv:2604.06485v1 Announce Type: cross Abstract: "Best-of-N" selection is a popular inference-time scaling method for code generation using Large La...
- Distributed Interpretability and Control for Large Language Models
arXiv:2604.06483v1 Announce Type: cross Abstract: Large language models that require multiple GPU cards to host are usually the most capable models. ...
- Improving Robustness In Sparse Autoencoders via Masked Regularization
arXiv:2604.06495v1 Announce Type: cross Abstract: Sparse autoencoders (SAEs) are widely used in mechanistic interpretability to project LLM activatio...
- Scientific Knowledge-driven Decoding Constraints Improving the Reliability of LLMs
arXiv:2604.06603v1 Announce Type: cross Abstract: Large language models (LLMs) have shown strong knowledge reserves and task-solving capabilities, bu...
- LLM-based Schema-Guided Extraction and Validation of Missing-Person Intelligence from Heterogeneous Data Sources
arXiv:2604.06571v1 Announce Type: cross Abstract: Missing-person and child-safety investigations rely on heterogeneous case documents, including stru...
- SHAPE: Stage-aware Hierarchical Advantage via Potential Estimation for LLM Reasoning
arXiv:2604.06636v1 Announce Type: cross Abstract: Process supervision has emerged as a promising approach for enhancing LLM reasoning, yet existing m...
- Fine-grained Approaches for Confidence Calibration of LLMs in Automated Code Revision
arXiv:2604.06723v1 Announce Type: cross Abstract: In today's AI-assisted software engineering landscape, developers increasingly depend on LLMs that ...
- Restoring Heterogeneity in LLM-based Social Simulation: An Audience Segmentation Approach
arXiv:2604.06663v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly used to simulate social attitudes and behaviors, offe...
- Evaluating LLM-Based 0-to-1 Software Generation in End-to-End CLI Tool Scenarios
arXiv:2604.06742v1 Announce Type: cross Abstract: Large Language Models (LLMs) are driving a shift towards intent-driven development, where agents bu...
- TeamLLM: A Human-Like Team-Oriented Collaboration Framework for Multi-Step Contextualized Tasks
arXiv:2604.06765v1 Announce Type: cross Abstract: Recently, multi-Large Language Model (LLM) frameworks have been proposed to solve contextualized ta...
- On the Step Length Confounding in LLM Reasoning Data Selection
arXiv:2604.06834v1 Announce Type: cross Abstract: Large reasoning models have recently demonstrated strong performance on complex tasks that require ...
- Digital Skin, Digital Bias: Uncovering Tone-Based Biases in LLMs and Emoji Embeddings
arXiv:2604.06863v1 Announce Type: cross Abstract: Skin-toned emojis are crucial for fostering personal identity and social inclusion in online commun...
- MedDialBench: Benchmarking LLM Diagnostic Robustness under Parametric Adversarial Patient Behaviors
arXiv:2604.06846v1 Announce Type: cross Abstract: Interactive medical dialogue benchmarks have shown that LLM diagnostic accuracy degrades significan...
- SentinelSphere: Integrating AI-Powered Real-Time Threat Detection with Cybersecurity Awareness Training
arXiv:2604.06900v1 Announce Type: cross Abstract: The field of cybersecurity is confronted with two interrelated challenges: a worldwide deficit of q...
- The AI Skills Shift: Mapping Skill Obsolescence, Emergence, and Transition Pathways in the LLM Era
arXiv:2604.06906v1 Announce Type: cross Abstract: As Large Language Models reshape the global labor market, policymakers and workers need empirical d...
- Self-Preference Bias in Rubric-Based Evaluation of Large Language Models
arXiv:2604.06996v1 Announce Type: cross Abstract: LLM-as-a-judge has become the de facto approach for evaluating LLM outputs. However, judges are kno...
- Attribution Bias in Large Language Models
arXiv:2604.05224v1 Announce Type: new Abstract: As Large Language Models (LLMs) are increasingly used to support search and information retrieval, it...
- Adversarial Moral Stress Testing of Large Language Models
arXiv:2604.01108v1 Announce Type: new Abstract: Evaluating the ethical robustness of large language models (LLMs) deployed in software systems remain...
- Eyla: Toward an Identity-Anchored LLM Architecture with Integrated Biological Priors -- Vision, Implementation Attempt, and Lessons from AI-Assisted Development
arXiv:2604.00009v1 Announce Type: cross Abstract: We present the design rationale, implementation attempt, and failure analysis of Eyla, a proposed i...
- Quantifying Gender Bias in Large Language Models: When ChatGPT Becomes a Hiring Manager
arXiv:2604.00011v1 Announce Type: cross Abstract: The growing prominence of large language models (LLMs) in daily life has heightened concerns that L...
- Think Twice Before You Write -- an Entropy-based Decoding Strategy to Enhance LLM Reasoning
arXiv:2604.00018v1 Announce Type: cross Abstract: Decoding strategies play a central role in shaping the reasoning ability of large language models (...
- Can LLMs Perceive Time? An Empirical Investigation
arXiv:2604.00010v1 Announce Type: cross Abstract: Large language models cannot estimate how long their own tasks take. We investigate this limitation...
- Dual Optimal: Make Your LLM Peer-like with Dignity
arXiv:2604.00979v1 Announce Type: cross Abstract: Current aligned language models exhibit a dual failure mode we term the Evasive Servant: they sycop...
- Automated Framework to Evaluate and Harden LLM System Instructions against Encoding Attacks
arXiv:2604.01039v1 Announce Type: cross Abstract: System Instructions in Large Language Models (LLMs) are commonly used to enforce safety policies, d...
- Fast and Accurate Probing of In-Training LLMs' Downstream Performances
arXiv:2604.01025v1 Announce Type: cross Abstract: The paradigm of scaling Large Language Models (LLMs) in both parameter size and test time has pushe...
- The Energy Footprint of LLM-Based Environmental Analysis: LLMs and Domain Products
arXiv:2604.00053v1 Announce Type: cross Abstract: As large language models (LLMs) are increasingly used in domain-specific applications, including cl...
- Hierarchical Pre-Training of Vision Encoders with Large Language Models
arXiv:2604.00086v1 Announce Type: cross Abstract: The field of computer vision has experienced significant advancements through scalable vision encod...
- Genesis: Evolving Attack Strategies for LLM Web Agent Red-Teaming
arXiv:2510.18314v2 Announce Type: replace Abstract: As large language model (LLM) agents increasingly automate complex web tasks, they boost producti...
- Auto-Formulating Dynamic Programming Problems with Large Language Models
arXiv:2507.11737v2 Announce Type: replace Abstract: Dynamic programming (DP) is a fundamental method in operations research, but formulating DP model...
- Bethe Ansatz with a Large Language Model
arXiv:2603.29932v1 Announce Type: cross Abstract: We explore the capability of a Large Language Model (LLM) to perform specific computations in mathe...
- GISTBench: Evaluating LLM User Understanding via Evidence-Based Interest Verification
arXiv:2603.29112v1 Announce Type: new Abstract: We introduce GISTBench, a benchmark for evaluating Large Language Models' (LLMs) ability to understan...
- Theory of Mind and Self-Attributions of Mentality are Dissociable in LLMs
arXiv:2603.28925v1 Announce Type: cross Abstract: Safety fine-tuning in Large Language Models (LLMs) seeks to suppress potentially harmful forms of m...
- Understand and Accelerate Memory Processing Pipeline for Disaggregated LLM Inference
arXiv:2603.29002v1 Announce Type: cross Abstract: Modern large language models (LLMs) increasingly depends on efficient long-context processing and g...
- KEditVis: A Visual Analytics System for Knowledge Editing of Large Language Models
arXiv:2603.29689v1 Announce Type: cross Abstract: Large Language Models (LLMs) demonstrate exceptional capabilities in factual question answering, ye...
- Agenda-based Narrative Extraction: Steering Pathfinding Algorithms with Large Language Models
arXiv:2603.29661v1 Announce Type: cross Abstract: Existing narrative extraction methods face a trade-off between coherence, interactivity, and multi-...
- Enhancing Structural Mapping with LLM-derived Abstractions for Analogical Reasoning in Narratives
arXiv:2603.29997v1 Announce Type: cross Abstract: Analogical reasoning is a key driver of human generalization in problem-solving and argumentation. ...
- Hybrid Framework for Robotic Manipulation: Integrating Reinforcement Learning and Large Language Models
arXiv:2603.30022v1 Announce Type: cross Abstract: This paper introduces a new hybrid framework that combines Reinforcement Learning (RL) and Large La...
- AI benchmarks are broken. Here's what we need instead.
For decades, artificial intelligence has been evaluated through the question of whether machines outperform humans. From chess to advanced math, from ...
- Training a Large Language Model for Medical Coding Using Privacy-Preserving Synthetic Clinical Data
arXiv:2603.23515v1 Announce Type: cross Abstract: Improving the accuracy and reliability of medical coding reduces clinician burnout and supports rev...
- MedMT-Bench: Can LLMs Memorize and Understand Long Multi-Turn Conversations in Medical Scenarios?
arXiv:2603.23519v1 Announce Type: cross Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities across various specialist do...
- Generating Hierarchical JSON Representations of Scientific Sentences Using LLMs
arXiv:2603.23532v1 Announce Type: cross Abstract: This paper investigates whether structured representations can preserve the meaning of scientific s...
- LLMORPH: Automated Metamorphic Testing of Large Language Models
arXiv:2603.23611v1 Announce Type: cross Abstract: Automated testing is essential for evaluating and improving the reliability of Large Language Model...
- Probing Ethical Framework Representations in Large Language Models: Structure, Entanglement, and Methodological Challenges
arXiv:2603.23659v1 Announce Type: cross Abstract: When large language models make ethical judgments, do their internal representations distinguish be...
- PoliticsBench: Benchmarking Political Values in Large Language Models with Multi-Turn Roleplay
arXiv:2603.23841v1 Announce Type: cross Abstract: While Large Language Models (LLMs) are increasingly used as primary sources of information, their p...
- Policy-Guided Threat Hunting: An LLM enabled Framework with Splunk SOC Triage
arXiv:2603.23966v1 Announce Type: cross Abstract: With frequently evolving Advanced Persistent Threats (APTs) in cyberspace, traditional security sol...
- When AI Meets Early Childhood Education: Large Language Models as Assessment Teammates in Chinese Preschools
arXiv:2603.24389v1 Announce Type: cross Abstract: High-quality teacher-child interaction (TCI) is fundamental to early childhood development, yet tra...
- Are LLMs Smarter Than Chimpanzees? An Evaluation on Perspective Taking and Knowledge State Estimation
arXiv:2601.12410v2 Announce Type: replace Abstract: Cognitive anthropology suggests that the distinction of human intelligence lies in the ability to...
- Evaluation of Large Language Models via Coupled Token Generation
arXiv:2502.01754v3 Announce Type: replace-cross Abstract: State of the art large language models rely on randomization to respond to a prompt. As an ...
- A Theory of LLM Information Susceptibility
arXiv:2603.23626v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly deployed as optimization modules in agentic systems, ...
- LLMLOOP: Improving LLM-Generated Code and Tests through Automated Iterative Feedback Loops
arXiv:2603.23613v1 Announce Type: cross Abstract: Large Language Models (LLMs) are showing remarkable performance in generating source code, yet the ...
- Object Search in Partially-Known Environments via LLM-informed Model-based Planning and Prompt Selection
arXiv:2603.23800v1 Announce Type: cross Abstract: We present a novel LLM-informed model-based planning framework, and a novel prompt selection method...
- From Guidelines to Guarantees: A Graph-Based Evaluation Harness for Domain-Specific Evaluation of LLMs
arXiv:2508.20810v2 Announce Type: replace Abstract: Rigorous evaluation of domain-specific language models requires benchmarks that are comprehensive...
- A Comprehensive Survey on Enterprise Financial Risk Analysis from Big Data and LLMs Perspective
arXiv:2211.14997v5 Announce Type: replace-cross Abstract: Enterprise financial risk analysis aims at predicting the future financial risk of enterpri...
- Ran Score: a LLM-based Evaluation Score for Radiology Report Generation
arXiv:2603.22935v1 Announce Type: new Abstract: Chest X-ray report generation and automated evaluation are limited by poor recognition of low-prevale...
- Between Rules and Reality: On the Context Sensitivity of LLM Moral Judgment
arXiv:2603.23114v1 Announce Type: new Abstract: A human's moral decision depends heavily on the context. Yet research on LLM morality has largely stu...
- LLM Olympiad: Why Model Evaluation Needs a Sealed Exam
arXiv:2603.23292v1 Announce Type: new Abstract: Benchmarks and leaderboards are how NLP most often communicates progress, but in the LLM era they are...
- Automated Microservice Pattern Instance Detection Using Infrastructure-as-Code Artifacts and Large Language Models
arXiv:2502.04188v1 Announce Type: cross Abstract: Documenting software architecture is essential to preserve architecture knowledge, even though it i...
- TIPS: Turn-Level Information-Potential Reward Shaping for Search-Augmented LLMs
arXiv:2603.22293v1 Announce Type: cross Abstract: Search-augmented large language models (LLMs) trained with reinforcement learning (RL) have achieve...
- Latent Semantic Manifolds in Large Language Models
arXiv:2603.22301v1 Announce Type: cross Abstract: Large Language Models (LLMs) perform internal computations in continuous vector spaces yet produce ...
- Sample Transform Cost-Based Training-Free Hallucination Detector for Large Language Models
arXiv:2603.22303v1 Announce Type: cross Abstract: Hallucinations in large language models (LLMs) remain a central obstacle to trustworthy deployment,...
- DAQ: Delta-Aware Quantization for Post-Training LLM Weight Compression
arXiv:2603.22324v1 Announce Type: cross Abstract: We introduce Delta-Aware Quantization (DAQ), a data-free post-training quantization framework that ...
- Sparse but Critical: A Token-Level Analysis of Distributional Shifts in RLVR Fine-Tuning of LLMs
arXiv:2603.22446v1 Announce Type: cross Abstract: Reinforcement learning with verifiable rewards (RLVR) has significantly improved reasoning in large...
- DALDALL: Data Augmentation for Lexical and Semantic Diverse in Legal Domain by leveraging LLM-Persona
arXiv:2603.22765v1 Announce Type: cross Abstract: Data scarcity remains a persistent challenge in low-resource domains. While existing data augmentat...
- Set-Valued Prediction for Large Language Models with Feasibility-Aware Coverage Guarantees
arXiv:2603.22966v1 Announce Type: cross Abstract: Large language models (LLMs) inherently operate over a large generation space, yet conventional usa...
- Can an LLM Detect Instances of Microservice Infrastructure Patterns?
arXiv:2603.23073v1 Announce Type: cross Abstract: Architectural patterns are frequently found in various software artifacts. The wide variety of patt...
- LLM-guided headline rewriting for clickability enhancement without clickbait
arXiv:2603.22459v1 Announce Type: cross Abstract: Enhancing reader engagement while preserving informational fidelity is a central challenge in contr...
- Emergence of Fragility in LLM-based Social Networks: the Case of Moltbook
arXiv:2603.23279v1 Announce Type: cross Abstract: The rapid diffusion of large language models and the growth in their capability has enabled the eme...
- Leveraging LLMs and Social Media to Understand User Perception of Smartphone-Based Earthquake Early Warnings
arXiv:2603.23322v1 Announce Type: cross Abstract: Android's Earthquake Alert (AEA) system provided timely early warnings to millions during the Mw 6....
- ORACLE: Optimizing Reasoning Abilities of Large Language Models via Constraint-Led Synthetic Data Elicitation
arXiv:2603.21140v1 Announce Type: new Abstract: Training large language models (LLMs) with synthetic reasoning data has become a popular approach to ...
- A Framework for Low-Latency, LLM-driven Multimodal Interaction on the Pepper Robot
arXiv:2603.21013v1 Announce Type: new Abstract: Despite recent advances in integrating Large Language Models (LLMs) into social robotics, two weaknes...
- Knowledge Boundary Discovery for Large Language Models
arXiv:2603.21022v1 Announce Type: new Abstract: We propose Knowledge Boundary Discovery (KBD), a reinforcement learning based framework to explore th...
- Characterizing the ability of LLMs to recapitulate Americans' distributional responses to public opinion polling questions across political issues
arXiv:2603.20229v1 Announce Type: cross Abstract: Traditional survey-based political issue polling is becoming less tractable due to increasing costs...
- ReBOL: Retrieval via Bayesian Optimization with Batched LLM Relevance Observations and Query Reformulation
arXiv:2603.20513v1 Announce Type: cross Abstract: LLM-reranking is limited by the top-k documents retrieved by vector similarity, which neither enabl...
- Reasoning or Rhetoric? An Empirical Analysis of Moral Reasoning Explanations in Large Language Models
arXiv:2603.21854v1 Announce Type: new Abstract: Do large language models reason morally, or do they merely sound like they do? We investigate whether...
- Enhancing Safety of Large Language Models via Embedding Space Separation
arXiv:2603.20206v1 Announce Type: cross Abstract: Large language models (LLMs) have achieved impressive capabilities, yet ensuring their safety again...
- Policies Permitting LLM Use for Polishing Peer Reviews Are Currently Not Enforceable
arXiv:2603.20450v1 Announce Type: cross Abstract: A number of scientific conferences and journals have recently enacted policies that prohibit LLM us...
- Evaluating Large Language Models on Historical Health Crisis Knowledge in Resource-Limited Settings: A Hybrid Multi-Metric Study
arXiv:2603.20514v1 Announce Type: cross Abstract: Large Language Models (LLMs) offer significant potential for delivering health information. However...
- PAVE: Premise-Aware Validation and Editing for Retrieval-Augmented LLMs
arXiv:2603.20673v1 Announce Type: cross Abstract: Retrieval-augmented language models can retrieve relevant evidence yet still commit to answers befo...
- LLM-Enhanced Semantic Data Integration of Electronic Component Qualifications in the Aerospace Domain
arXiv:2603.20094v1 Announce Type: cross Abstract: Large manufacturing companies face challenges in information retrieval due to data silos maintained...
- Semantic Token Clustering for Efficient Uncertainty Quantification in Large Language Models
arXiv:2603.20161v1 Announce Type: cross Abstract: Large language models (LLMs) have demonstrated remarkable capabilities across diverse tasks. Howeve...
- Measuring Faithfulness Depends on How You Measure: Classifier Sensitivity in LLM Chain-of-Thought Evaluation
arXiv:2603.20172v1 Announce Type: cross Abstract: Recent work on chain-of-thought (CoT) faithfulness reports single aggregate numbers (e.g., DeepSeek...
- Memory-Driven Role-Playing: Evaluation and Enhancement of Persona Knowledge Utilization in LLMs
arXiv:2603.19313v1 Announce Type: cross Abstract: A core challenge for faithful LLM role-playing is sustaining consistent characterization throughout...
- MemReward: Graph-Based Experience Memory for LLM Reward Prediction with Limited Labels
arXiv:2603.19310v1 Announce Type: cross Abstract: Training large language models (LLMs) for complex reasoning via reinforcement learning requires rew...
- Inducing Sustained Creativity and Diversity in Large Language Models
arXiv:2603.19519v1 Announce Type: cross Abstract: We address a not-widely-recognized subset of exploratory search, where a user sets out on a typical...
- LISAA: A Framework for Large Language Model Information Security Awareness Assessment
arXiv:2411.13207v3 Announce Type: replace-cross Abstract: The popularity of large language models (LLMs) continues to grow, and LLM-based assistants ...
- Can LLM generate interesting mathematical research problems?
arXiv:2603.18813v1 Announce Type: new Abstract: This paper is the second one in a series of work on the mathematical creativity of LLM. In the first ...
- Secure Linear Alignment of Large Language Models
arXiv:2603.18908v1 Announce Type: new Abstract: Language models increasingly appear to learn similar representations, despite differences in training...
- Do Large Language Models Possess a Theory of Mind? A Comparative Evaluation Using the Strange Stories Paradigm
arXiv:2603.18007v1 Announce Type: cross Abstract: The study explores whether current Large Language Models (LLMs) exhibit Theory of Mind (ToM) capabi...
- Can LLMs Reason Like Automated Theorem Provers for Rust Verification? VCoT-Bench: Evaluating via Verification Chain of Thought
arXiv:2603.18334v1 Announce Type: cross Abstract: As Large Language Models (LLMs) increasingly assist secure software development, their ability to m...
- PowerFlow: Unlocking the Dual Nature of LLMs via Principled Distribution Matching
arXiv:2603.18363v1 Announce Type: cross Abstract: Unsupervised Reinforcement Learning from Internal Feedback (RLIF) has emerged as a promising paradi...
- Measuring and Exploiting Confirmation Bias in LLM-Assisted Security Code Review
arXiv:2603.18740v1 Announce Type: cross Abstract: Security code reviews increasingly rely on systems integrating Large Language Models (LLMs), rangin...
- Are complicated loss functions necessary for teaching LLMs to reason?
arXiv:2603.18756v1 Announce Type: cross Abstract: Recent advances in large language models (LLMs) highlight the importance of post training technique...
- Lightweight Adaptation for LLM-based Technical Service Agent: Latent Logic Augmentation and Robust Noise Reduction
arXiv:2603.18074v1 Announce Type: cross Abstract: Adapting Large Language Models in complex technical service domains is constrained by the absence o...
- VC-Soup: Value-Consistency Guided Multi-Value Alignment for Large Language Models
arXiv:2603.18113v1 Announce Type: cross Abstract: As large language models (LLMs) increasingly shape content generation, interaction, and decision-ma...
- AutORAN: LLM-driven Natural Language Programming for Agile xApp Development
arXiv:2603.18604v1 Announce Type: cross Abstract: Traditional RAN systems are closed and monolithic, stifling innovation. The openness and programmab...
- Functional Subspace Watermarking for Large Language Models
arXiv:2603.18793v1 Announce Type: cross Abstract: Model watermarking utilizes internal representations to protect the ownership of large language mod...
- Evaluating LLM-Generated Lessons from the Language Learning Students' Perspective: A Short Case Study on Duolingo
arXiv:2603.18873v1 Announce Type: cross Abstract: Popular language learning applications such as Duolingo use large language models (LLMs) to generat...
- Security Assessment and Mitigation Strategies for Large Language Models: A Comprehensive Defensive Framework
arXiv:2603.17123v1 Announce Type: cross Abstract: Large Language Models increasingly power critical infrastructure from healthcare to finance, yet th...
- Detecting Data Poisoning in Code Generation LLMs via Black-Box, Vulnerability-Oriented Scanning
arXiv:2603.17174v1 Announce Type: cross Abstract: Code generation large language models (LLMs) are increasingly integrated into modern software devel...
- Can Blindfolded LLMs Still Trade? An Anonymization-First Framework for Portfolio Optimization
arXiv:2603.17692v1 Announce Type: cross Abstract: For LLM trading agents to be genuinely trustworthy, they must demonstrate understanding of market d...
- Facts as First Class Objects: Knowledge Objects for Persistent LLM Memory
arXiv:2603.17781v1 Announce Type: new Abstract: Large language models increasingly serve as persistent knowledge workers, with in-context memory - fa...
- Evaluating Ill-Defined Tasks in Large Language Models
arXiv:2603.17067v1 Announce Type: cross Abstract: Many evaluations of Large Language Models (LLMs) target tasks that are inherently ill-defined, with...
- Knowledge Localization in Mixture-of-Experts LLMs Using Cross-Lingual Inconsistency
arXiv:2603.17102v1 Announce Type: cross Abstract: Modern LLMs continue to exhibit significant variance in behavior across languages, such as being ab...
- Anonymous-by-Construction: An LLM-Driven Framework for Privacy-Preserving Text
arXiv:2603.17217v1 Announce Type: cross Abstract: Responsible use of AI demands that we protect sensitive information without undermining the usefuln...
- Deployment and Evaluation of an EHR-integrated, Large Language Model-Powered Tool to Triage Surgical Patients
arXiv:2603.17234v1 Announce Type: cross Abstract: Surgical co-management (SCM) is an evidence-based model in which hospitalists jointly manage medica...
- How do LLMs Compute Verbal Confidence
arXiv:2603.17839v1 Announce Type: cross Abstract: Verbal confidence -- prompting LLMs to state their confidence as a number or category -- is widely ...
- InPhyRe Discovers: Large Multimodal Models Struggle in Inductive Physical Reasoning
arXiv:2509.12263v2 Announce Type: replace Abstract: Large multimodal models (LMMs) encode physical laws observed during training, such as momentum co...
- Adaptive Theory of Mind for LLM-based Multi-Agent Coordination
arXiv:2603.16264v1 Announce Type: new Abstract: Theory of Mind (ToM) refers to the ability to reason about others' mental states, and higher-order To...
- BenchPreS: A Benchmark for Context-Aware Personalized Preference Selectivity of Persistent-Memory LLMs
arXiv:2603.16557v1 Announce Type: new Abstract: Large language models (LLMs) increasingly store user preferences in persistent memory to support pers...
- Designing for Disagreement: Front-End Guardrails for Assistance Allocation in LLM-Enabled Robots
arXiv:2603.16537v1 Announce Type: new Abstract: LLM-enabled robots prioritizing scarce assistance in social settings face pluralistic values and LLM ...
- This Is Taking Too Long -- Investigating Time as a Proxy for Energy Consumption of LLMs
arXiv:2603.15699v1 Announce Type: cross Abstract: The energy consumption of Large Language Models (LLMs) is raising growing concerns due to their adv...
- Data-Local Autonomous LLM-Guided Neural Architecture Search for Multiclass Multimodal Time-Series Classification
arXiv:2603.15939v1 Announce Type: cross Abstract: Applying machine learning to sensitive time-series data is often bottlenecked by the iteration loop...
- Understanding Moral Reasoning Trajectories in Large Language Models: Toward Probing-Based Explainability
arXiv:2603.16017v1 Announce Type: cross Abstract: Large language models (LLMs) increasingly participate in morally sensitive decision-making, yet how...
- A Human-Centred Architecture for Large Language Models-Cognitive Assistants in Manufacturing within Quality Management Systems
arXiv:2603.16325v1 Announce Type: cross Abstract: Large Language Models-Cognitive Assistants (LLM-CAs) can enhance Quality Management Systems (QMS) i...
- Detecting Sentiment Steering Attacks on RAG-enabled Large Language Models
arXiv:2603.16342v1 Announce Type: cross Abstract: The proliferation of large-scale IoT networks has been both a blessing and a curse. Not only has it...
- Is Conformal Factuality for RAG-based LLMs Robust? Novel Metrics and Systematic Insights
arXiv:2603.16817v1 Announce Type: new Abstract: Large language models (LLMs) frequently hallucinate, limiting their reliability in knowledge-intensiv...
-
🇺🇸 Prompt Programming for Cultural Bias and Alignment of Large Language Models
arXiv:2603.16827v1 Announce Type: new Abstract: Culture shapes reasoning, values, prioritization, and strategic decision-making, yet large language m...
-
🇺🇸 Dynamic Theory of Mind as a Temporal Memory Problem: Evidence from Large Language Models
arXiv:2603.14646v1 Announce Type: new Abstract: Theory of Mind (ToM) is central to social cognition and human-AI interaction, and Large Language Mode...
-
🇺🇸 Learning to Forget: Sleep-Inspired Memory Consolidation for Resolving Proactive Interference in Large Language Models
arXiv:2603.14517v1 Announce Type: new Abstract: Large language models (LLMs) suffer from proactive interference (PI): outdated information in the con...
-
🇺🇸 OpenHospital: A Thing-in-itself Arena for Evolving and Benchmarking LLM-based Collective Intelligence
arXiv:2603.14771v1 Announce Type: new Abstract: Large Language Model (LLM)-based Collective Intelligence (CI) presents a promising approach to overco...
-
🇺🇸 BrainBench: Exposing the Commonsense Reasoning Gap in Large Language Models
arXiv:2603.14761v1 Announce Type: new Abstract: Large language models (LLMs) achieve impressive scores on standard benchmarks yet routinely fail ques...
-
🇺🇸 GameUIAgent: An LLM-Powered Framework for Automated Game UI Design with Structured Intermediate Representation
arXiv:2603.14724v1 Announce Type: new Abstract: Game UI design requires consistent visual assets across rarity tiers yet remains a predominantly manu...
-
🇺🇸 Why the Valuable Capabilities of LLMs Are Precisely the Unexplainable Ones
arXiv:2603.15238v1 Announce Type: new Abstract: This paper proposes and argues for a counterintuitive thesis: the truly valuable capabilities of larg...
-
🇺🇸 GPrune-LLM: Generalization-Aware Structured Pruning for Large Language Models
arXiv:2603.13418v1 Announce Type: cross Abstract: Structured pruning is widely used to compress large language models (LLMs), yet its effectiveness d...
-
🇺🇸 LLM-MINE: Large Language Model based Alzheimer's Disease and Related Dementias Phenotypes Mining from Clinical Notes
arXiv:2603.13673v1 Announce Type: new Abstract: Accurate extraction of Alzheimer's Disease and Related Dementias (ADRD) phenotypes from electronic he...
-
🇺🇸 Supervised Fine-Tuning versus Reinforcement Learning: A Study of Post-Training Methods for Large Language Models
arXiv:2603.13985v1 Announce Type: new Abstract: Pre-trained Large Language Model (LLM) exhibits broad capabilities, yet, for specific tasks or domain...
-
🇺🇸 Why Do LLM-based Web Agents Fail? A Hierarchical Planning Perspective
arXiv:2603.14248v1 Announce Type: new Abstract: Large language model (LLM) web agents are increasingly used for web navigation but remain far from hu...
-
🇺🇸 Argumentation for Explainable and Globally Contestable Decision Support with LLMs
arXiv:2603.14643v1 Announce Type: new Abstract: Large language models (LLMs) exhibit strong general capabilities, but their deployment in high-stakes...
-
🇺🇸 Continual Learning in Large Language Models: Methods, Challenges, and Opportunities
arXiv:2603.12658v1 Announce Type: cross Abstract: Continual learning (CL) has emerged as a pivotal paradigm to enable large language models (LLMs) to...
-
🇺🇸 OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always!
arXiv:2509.26495v3 Announce Type: replace Abstract: Large Language Model (LLM) safety is one of the most pressing challenges for enabling wide-scale ...
-
🇺🇸 Do LLMs Share Human-Like Biases? Causal Reasoning Under Prior Knowledge, Irrelevant Context, and Varying Compute Budgets
arXiv:2602.02983v2 Announce Type: replace Abstract: Large language models (LLMs) are increasingly used in domains where causal reasoning matters, yet...
-
🇺🇸 Do LLMs have a Gender (Entropy) Bias?
arXiv:2505.20343v2 Announce Type: replace-cross Abstract: We investigate the existence and persistence of a specific type of gender bias in some of t...
-
🇺🇸 Diagnosing Retrieval Bias Under Multiple In-Context Knowledge Updates in Large Language Models
arXiv:2603.12271v1 Announce Type: cross Abstract: LLMs are widely used in knowledge-intensive tasks where the same fact may be revised multiple times...
-
🇺🇸 TaoBench: Do Automated Theorem Prover LLMs Generalize Beyond MathLib?
arXiv:2603.12744v1 Announce Type: cross Abstract: Automated theorem proving (ATP) benchmarks largely consist of problems formalized in MathLib, so cu...
-
🇺🇸 Human-Centered Evaluation of an LLM-Based Process Modeling Copilot: A Mixed-Methods Study with Domain Experts
arXiv:2603.12895v1 Announce Type: cross Abstract: Integrating Large Language Models (LLMs) into business process management tools promises to democra...
-
🇺🇸 Aligning Large Language Model Agents with Rational and Moral Preferences: A Supervised Fine-Tuning Approach
arXiv:2507.20796v2 Announce Type: replace-cross Abstract: As large language models (LLMs) increasingly act as autonomous agents in markets and organi...
-
🇺🇸 LLM-Augmented Digital Twin for Policy Evaluation in Short-Video Platforms
arXiv:2603.11333v1 Announce Type: new Abstract: Short-video platforms are closed-loop, human-in-the-loop ecosystems where platform policy, creator in...
-
🇺🇸 Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training
arXiv:2603.12246v1 Announce Type: new Abstract: Reasoning LLMs-as-Judges, which can benefit from inference-time scaling, provide a promising path for...
-
🇺🇸 From Entity-Centric to Goal-Oriented Graphs: Enhancing LLM Knowledge Retrieval in Minecraft
arXiv:2505.18607v2 Announce Type: replace Abstract: Large Language Models (LLMs) demonstrate impressive general capabilities but often struggle with ...
-
🇺🇸 Exploring Collatz Dynamics with Human-LLM Collaboration
arXiv:2603.11066v1 Announce Type: cross Abstract: We investigate structural properties of the Collatz iteration through two phenomena observed in lar...
-
🇺🇸 TopoBench: Benchmarking LLMs on Hard Topological Reasoning
arXiv:2603.12133v1 Announce Type: new Abstract: Solving topological grid puzzles requires reasoning over global spatial invariants such as connectivi...
-
🇺🇸 OpenSanctions Pairs: Large-Scale Entity Matching with LLMs
arXiv:2603.11051v1 Announce Type: cross Abstract: We release OpenSanctions Pairs, a large-scale entity matching benchmark derived from real-world int...
-
🇺🇸 AI Psychometrics: Evaluating the Psychological Reasoning of Large Language Models with Psychometric Validities
arXiv:2603.11279v1 Announce Type: new Abstract: The immense number of parameters and deep neural networks make large language models (LLMs) rival the...
-
🇺🇸 WebWeaver: Breaking Topology Confidentiality in LLM Multi-Agent Systems with Stealthy Context-Based Inference
arXiv:2603.11132v1 Announce Type: cross Abstract: Communication topology is a critical factor in the utility and safety of LLM-based multi-agent syst...
-
🇺🇸 Evaluation and LLM-Guided Learning of ICD Coding Rationales
arXiv:2508.16777v2 Announce Type: replace Abstract: ICD coding is the process of mapping unstructured text from Electronic Health Records (EHRs) to s...
-
🇺🇸 Summarize Before You Speak with ARACH: A Training-Free Inference-Time Plug-In for Enhancing LLMs via Global Attention Reallocation
arXiv:2603.11067v1 Announce Type: cross Abstract: Large language models (LLMs) achieve remarkable performance, yet further gains often require costly...
-
🇺🇸 RewardHackingAgents: Benchmarking Evaluation Integrity for LLM ML-Engineering Agents
arXiv:2603.11337v1 Announce Type: new Abstract: LLM agents increasingly perform end-to-end ML engineering tasks where success is judged by a single s...
-
🇺🇸 Improving LLM Performance Through Black-Box Online Tuning: A Case for Adding System Specs to Factsheets for Trusted AI
arXiv:2603.11340v1 Announce Type: new Abstract: In this paper, we present a novel black-box online controller that uses only end-to-end measurements ...
-
🇺🇸 AI Knows What's Wrong But Cannot Fix It: Helicoid Dynamics in Frontier LLMs Under High-Stakes Decisions
arXiv:2603.11559v1 Announce Type: new Abstract: Large language models perform reliably when their outputs can be checked: solving equations, writing ...
-
🇺🇸 Markovian Generation Chains in Large Language Models
arXiv:2603.11228v1 Announce Type: cross Abstract: The widespread use of large language models (LLMs) raises an important question: how do texts evolv...
-
🇺🇸 Jailbreak Scaling Laws for Large Language Models: Polynomial-Exponential Crossover
arXiv:2603.11331v1 Announce Type: cross Abstract: Adversarial attacks can reliably steer safety-aligned large language models toward unsafe behavior....
-
🇺🇸 UtilityMax Prompting: A Formal Framework for Multi-Objective Large Language Model Optimization
arXiv:2603.11583v1 Announce Type: cross Abstract: The success of a Large Language Model (LLM) task depends heavily on its prompt. Most use-cases spec...
-
🇺🇸 MobileKernelBench: Can LLMs Write Efficient Kernels for Mobile Devices?
arXiv:2603.11935v1 Announce Type: cross Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in code generation, yet thei...
-
🇺🇸 BTZSC: A Benchmark for Zero-Shot Text Classification Across Cross-Encoders, Embedding Models, Rerankers and LLMs
arXiv:2603.11991v1 Announce Type: cross Abstract: Zero-shot text classification (ZSC) offers the promise of eliminating costly task-specific annotati...
-
🇺🇸 Resource-Efficient Iterative LLM-Based NAS with Feedback Memory
arXiv:2603.12091v1 Announce Type: cross Abstract: Neural Architecture Search (NAS) automates network design, but conventional methods demand substant...
-
🇺🇸 LLMs can construct powerful representations and streamline sample-efficient supervised learning
arXiv:2603.11679v1 Announce Type: new Abstract: As real-world datasets become increasingly complex and heterogeneous, supervised learning is often bo...
-
🇺🇸 IsoCompute Playbook: Optimally Scaling Sampling Compute for LLM RL
arXiv:2603.12151v1 Announce Type: cross Abstract: While scaling laws guide compute allocation for LLM pre-training, analogous prescriptions for reinf...
-
🇺🇸 DeliberationBench: A Normative Benchmark for the Influence of Large Language Models on Users' Views
arXiv:2603.10018v1 Announce Type: cross Abstract: As large language models (LLMs) become pervasive as assistants and thought partners, it is importan...
-
🇺🇸 Amnesia: Adversarial Semantic Layer Specific Activation Steering in Large Language Models
arXiv:2603.10080v1 Announce Type: cross Abstract: Warning: This article includes red-teaming experiments, which contain examples of compromised LLM r...
-
🇺🇸 Code-Space Response Oracles: Generating Interpretable Multi-Agent Policies with Large Language Models
arXiv:2603.10098v1 Announce Type: cross Abstract: Recent advances in multi-agent reinforcement learning, particularly Policy-Space Response Oracles (...
-
🇺🇸 Utility Function is All You Need: LLM-based Congestion Control
arXiv:2603.10357v1 Announce Type: cross Abstract: Congestion is a critical and challenging problem in communication networks. Congestion control prot...
-
🇺🇸 Risk-Adjusted Harm Scoring for Automated Red Teaming for LLMs in Financial Services
arXiv:2603.10807v1 Announce Type: cross Abstract: The rapid adoption of large language models (LLMs) in financial services introduces new operational...
-
🇺🇸 When Fine-Tuning Fails and when it Generalises: Role of Data Diversity and Mixed Training in LLM-based TTS
arXiv:2603.10904v1 Announce Type: cross Abstract: Large language models are increasingly adopted as semantic backbones for neural text-to-speech syst...
-
🇺🇸 BiasBusters: Uncovering and Mitigating Tool Selection Bias in Large Language Models
arXiv:2510.00307v2 Announce Type: replace Abstract: Agents backed by large language models (LLMs) increasingly rely on external tools drawn from mark...
-
🇺🇸 Adaptive Activation Cancellation for Hallucination Mitigation in Large Language Models
arXiv:2603.10195v1 Announce Type: cross Abstract: Large Language Models frequently generate fluent but factually incorrect text. We propose Adaptive ...
-
🇺🇸 DUCTILE: Agentic LLM Orchestration of Engineering Analysis in Product Development Practice
arXiv:2603.10249v1 Announce Type: cross Abstract: Engineering analysis automation in product development relies on rigid interfaces between tools, da...
-
🇺🇸 RADAR: Reasoning-Ability and Difficulty-Aware Routing for Reasoning LLMs
arXiv:2509.25426v3 Announce Type: replace Abstract: Reasoning language models have demonstrated remarkable performance on many challenging tasks in m...
-
🇺🇸 EsoLang-Bench: Evaluating Genuine Reasoning in Large Language Models via Esoteric Programming Languages
arXiv:2603.09678v1 Announce Type: new Abstract: Large language models achieve near-ceiling performance on code generation benchmarks, yet these resul...
-
🇺🇸 Influencing LLM Multi-Agent Dialogue via Policy-Parameterized Prompts
arXiv:2603.09890v1 Announce Type: new Abstract: Large Language Models (LLMs) have emerged as a new paradigm for multi-agent systems. However, existin...
-
🇺🇸 DuplexCascade: Full-Duplex Speech-to-Speech Dialogue with VAD-Free Cascaded ASR-LLM-TTS Pipeline and Micro-Turn Optimization
arXiv:2603.09180v1 Announce Type: cross Abstract: Spoken dialog systems with cascaded ASR-LLM-TTS modules retain strong LLM intelligence, but VAD seg...
-
🇺🇸 Beyond Scaling: Assessing Strategic Reasoning and Rapid Decision-Making Capability of LLMs in Zero-sum Environments
arXiv:2603.09337v1 Announce Type: cross Abstract: Large Language Models (LLMs) have achieved strong performance on static reasoning benchmarks, yet t...
-
🇺🇸 Emotion is Not Just a Label: Latent Emotional Factors in LLM Processing
arXiv:2603.09205v1 Announce Type: cross Abstract: Large language models are routinely deployed on text that varies widely in emotional tone, yet thei...
-
🇺🇸 Investigating Gender Stereotypes in Large Language Models via Social Determinants of Health
arXiv:2603.09416v1 Announce Type: cross Abstract: Large Language Models (LLMs) excel in Natural Language Processing (NLP) tasks, but they often propa...
-
🇺🇸 Alignment Is the Disease: Censorship Visibility and Alignment Constraint Complexity as Determinants of Collective Pathology in Multi-Agent LLM Systems
arXiv:2603.08723v1 Announce Type: cross Abstract: Alignment techniques in large language models (LLMs) are designed to constrain model outputs toward...
-
🇺🇸 Large Language Model-Assisted Superconducting Qubit Experiments
arXiv:2603.08801v1 Announce Type: cross Abstract: Superconducting circuits have demonstrated significant potential in quantum information processing ...
-
🇺🇸 Understanding the Use of a Large Language Model-Powered Guide to Make Virtual Reality Accessible for Blind and Low Vision People
arXiv:2603.09964v1 Announce Type: cross Abstract: As social virtual reality (VR) grows more popular, addressing accessibility for blind and low visio...
-
🇺🇸 Improving instruction hierarchy in frontier LLMs
IH-Challenge trains models to prioritize trusted instructions, improving instruction hierarchy, safety steerability, and resistance to prompt injectio...
-
🇺🇸 FinSheet-Bench: From Simple Lookups to Complex Reasoning, Where LLMs Break on Financial Spreadsheets
arXiv:2603.07316v1 Announce Type: new Abstract: While Large Language Models (LLMs) can accelerate text-heavy tasks in alternative investment due dili...
-
🇺🇸 In-Context Reinforcement Learning for Tool Use in Large Language Models
arXiv:2603.08068v1 Announce Type: new Abstract: While large language models (LLMs) exhibit strong reasoning abilities, their performance on complex t...
-
🇺🇸 Consensus is Not Verification: Why Crowd Wisdom Strategies Fail for LLM Truthfulness
arXiv:2603.06612v1 Announce Type: cross Abstract: Pass@k and other methods of scaling inference compute can improve language model performance in dom...
-
🇺🇸 RACER: Risk-Aware Calibrated Efficient Routing for Large Language Models
arXiv:2603.06616v1 Announce Type: cross Abstract: Efficiently routing queries to the optimal large language model (LLM) is crucial for optimizing the...
-
🇺🇸 A Novel Multi-Agent Architecture to Reduce Hallucinations of Large Language Models in Multi-Step Structural Modeling
arXiv:2603.07728v1 Announce Type: new Abstract: Large language models (LLMs) such as GPT and Gemini have demonstrated remarkable capabilities in cont...
-
🇺🇸 Rigidity in LLM Bandits with Implications for Human-AI Dyads
arXiv:2603.07717v1 Announce Type: new Abstract: We test whether LLMs show robust decision biases. Treating models as participants in two-arm bandits,...
-
🇺🇸 Grounding Machine Creativity in Game Design Knowledge Representations: Empirical Probing of LLM-Based Executable Synthesis of Goal Playable Patterns under Structural Constraints
arXiv:2603.07101v1 Announce Type: new Abstract: Creatively translating complex gameplay ideas into executable artifacts (e.g., games as Unity project...
-
🇺🇸 Evaluating Financial Intelligence in Large Language Models: Benchmarking SuperInvesting AI with LLM Engines
arXiv:2603.08704v1 Announce Type: new Abstract: Large language models are increasingly used for financial analysis and investment research, yet syste...
-
🇺🇸 SmartBench: Evaluating LLMs in Smart Homes with Anomalous Device States and Behavioral Contexts
arXiv:2603.06636v1 Announce Type: cross Abstract: Due to the strong context-awareness capabilities demonstrated by large language models (LLMs), rece...
-
🇺🇸 Enhancing Instruction Following of LLMs via Activation Steering with Dynamic Rejection
arXiv:2603.06745v1 Announce Type: cross Abstract: Large Language Models (LLMs), despite advances in instruction tuning, often fail to follow complex ...
-
🇺🇸 Not Too Short, Not Too Long: How LLM Response Length Shapes People's Critical Thinking in Error Detection
arXiv:2603.06878v1 Announce Type: new Abstract: Large language models (LLMs) have become common decision-support tools across educational and profess...
-
🇺🇸 $\textbf{Re}^{2}$: Unlocking LLM Reasoning via Reinforcement Learning with Re-solving
arXiv:2603.07197v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) has shown promise in enhancing the reasoning pe...
-
🇺🇸 Large Language Model for Discrete Optimization Problems: Evaluation and Step-by-step Reasoning
arXiv:2603.07733v1 Announce Type: new Abstract: This work investigated the capabilities of different models, including the Llama-3 series of models a...
-
🇺🇸 Advancing Automated Algorithm Design via Evolutionary Stagewise Design with LLMs
arXiv:2603.07970v1 Announce Type: new Abstract: With the rapid advancement of human science and technology, problems in industrial scenarios are beco...
-
🇺🇸 HEARTS: Benchmarking LLM Reasoning on Health Time Series
arXiv:2603.06638v1 Announce Type: cross Abstract: The rise of large language models (LLMs) has shifted time series analysis from narrow analytics to ...
-
🇺🇸 Performance Comparison of IBN orchestration using LLM and SLMs
arXiv:2603.06647v1 Announce Type: cross Abstract: The evolution of both 5G and 6G networks is driving the advancement of fully autonomous network man...
-
🇺🇸 Supporting Artifact Evaluation with LLMs: A Study with Published Security Research Papers
arXiv:2603.06862v1 Announce Type: cross Abstract: Artifact Evaluation (AE) is essential for ensuring the transparency and reliability of research, cl...
-
🇺🇸 Knowing without Acting: The Disentangled Geometry of Safety Mechanisms in Large Language Models
arXiv:2603.05773v1 Announce Type: cross Abstract: Safety alignment is often conceptualized as a monolithic process wherein harmfulness detection auto...
-
🇺🇸 Ambiguity Collapse by LLMs: A Taxonomy of Epistemic Risks
arXiv:2603.05801v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly used to make sense of ambiguous, open-textured, value...
-
🇺🇸 Abductive Reasoning with Syllogistic Forms in Large Language Models
arXiv:2603.06428v1 Announce Type: cross Abstract: Research in AI using Large-Language Models (LLMs) is rapidly evolving, and the comparison of their ...
-
🇺🇸 Agentic LLM Planning via Step-Wise PDDL Simulation: An Empirical Characterisation
arXiv:2603.06064v1 Announce Type: new Abstract: Task planning, the problem of sequencing actions to reach a goal from an initial state, is a core cap...
-
🇺🇸 Who We Are, Where We Are: Mental Health at the Intersection of Person, Situation, and Large Language Models
arXiv:2603.05953v1 Announce Type: cross Abstract: Mental health is not a fixed trait but a dynamic process shaped by the interplay between individual...
-
🇺🇸 MoEless: Efficient MoE LLM Serving via Serverless Computing
arXiv:2603.06350v1 Announce Type: cross Abstract: Large Language Models (LLMs) have become a cornerstone of AI, driving progress across diverse domai...
-
🇺🇸 Structured Exploration vs. Generative Flexibility: A Field Study Comparing Bandit and LLM Architectures for Personalised Health Behaviour Interventions
arXiv:2603.06330v1 Announce Type: cross Abstract: Behaviour Change Techniques (BCTs) are central to digital health interventions, yet selecting and d...
-
🇺🇸 CRIMSON: A Clinically-Grounded LLM-Based Metric for Generative Radiology Report Evaluation
arXiv:2603.06183v1 Announce Type: cross Abstract: We introduce CRIMSON, a clinically grounded evaluation framework for chest X-ray report generation ...
-
🇺🇸 PVminerLLM: Structured Extraction of Patient Voice from Patient-Generated Text using Large Language Models
arXiv:2603.05776v1 Announce Type: cross Abstract: Motivation: Patient-generated text contains critical information about patients' lived experiences,...
-
🇺🇸 Can LLM Aid in Solving Constraints with Inductive Definitions?
arXiv:2603.03668v1 Announce Type: cross Abstract: Solving constraints involving inductive (aka recursive) definitions is challenging. State-of-the-ar...
-
🇺🇸 Lost in Stories: Consistency Bugs in Long Story Generation by LLMs
arXiv:2603.05890v1 Announce Type: cross Abstract: What happens when a storyteller forgets its own story? Large Language Models (LLMs) can now generat...
-
🇺🇸 LUMINA: LLM-Guided GPU Architecture Exploration via Bottleneck Analysis
arXiv:2603.05904v1 Announce Type: cross Abstract: GPU design space exploration (DSE) for modern AI workloads, such as Large-Language Model (LLM) infe...
-
🇺🇸 MASFactory: A Graph-centric Framework for Orchestrating LLM-Based Multi-Agent Systems with Vibe Graphing
arXiv:2603.06007v1 Announce Type: cross Abstract: Large language model-based (LLM-based) multi-agent systems (MAS) are increasingly used to extend ag...
-
🇺🇸 COLD-Steer: Steering Large Language Models via In-Context One-step Learning Dynamics
arXiv:2603.06495v1 Announce Type: cross Abstract: Activation steering methods enable inference-time control of large language model (LLM) behavior wi...
-
🇺🇸 VisioMath: Benchmarking Figure-based Mathematical Reasoning in LMMs
arXiv:2506.06727v4 Announce Type: replace Abstract: Large Multimodal Models have achieved remarkable progress in integrating vision and language, ena...
-
🇺🇸 Transforming Agency. On the mode of existence of Large Language Models
arXiv:2407.10735v3 Announce Type: replace Abstract: This paper investigates the ontological characterization of Large Language Models (LLMs) like Cha...
-
🇺🇸 Localizing and Correcting Errors for LLM-based Planners
arXiv:2602.00276v2 Announce Type: replace Abstract: Large language models (LLMs) have demonstrated strong reasoning capabilities on math and coding, ...
-
🇺🇸 Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation
arXiv:2502.05151v3 Announce Type: replace-cross Abstract: With the advent of large multimodal language models, science is now at a threshold of an AI...
-
🇺🇸 Evaluating Austrian A-Level German Essays with Large Language Models for Automated Essay Scoring
arXiv:2603.06066v1 Announce Type: cross Abstract: Automated Essay Scoring (AES) has been explored for decades with the goal to support teachers by re...
-
🇺🇸 Experiences Build Characters: The Linguistic Origins and Functional Impact of LLM Personality
arXiv:2603.06088v1 Announce Type: cross Abstract: Human problem-solving is enriched by a diversity of styles and personality traits, yet the developm...
-
🇺🇸 Algorithmic Collusion by Large Language Models
arXiv:2404.00806v5 Announce Type: replace-cross Abstract: We conduct experiments with algorithmic pricing agents based on Large Language Models (LLMs...
-
🇺🇸 From Tokenizer Bias to Backbone Capability: A Controlled Study of LLMs for Time Series Forecasting
arXiv:2504.08818v2 Announce Type: replace-cross Abstract: Using pre-trained large language models (LLMs) as a backbone for time series prediction has...
-
🇺🇸 When Agents Persuade: Propaganda Generation and Mitigation in LLMs
arXiv:2603.04636v1 Announce Type: new Abstract: Despite their wide-ranging benefits, LLM-based agents deployed in open environments can be exploited ...
-
🇺🇸 Design Behaviour Codes (DBCs): A Taxonomy-Driven Layered Governance Benchmark for Large Language Models
arXiv:2603.04837v1 Announce Type: new Abstract: We introduce the Dynamic Behavioral Constraint (DBC) benchmark, the first empirical framework for eva...
-
🇺🇸 LLM-Grounded Explainability for Port Congestion Prediction via Temporal Graph Attention Networks
arXiv:2603.04818v1 Announce Type: new Abstract: Port congestion at major maritime hubs disrupts global supply chains, yet existing prediction systems...
-
🇺🇸 Alignment Backfire: Language-Dependent Reversal of Safety Interventions Across 16 Languages in LLM Multi-Agent Systems
arXiv:2603.04904v1 Announce Type: new Abstract: In perpetrator treatment, a recurring observation is the dissociation between insight and action: off...
-
🇺🇸 Survive at All Costs: Exploring LLM's Risky Behaviors under Survival Pressure
arXiv:2603.05028v1 Announce Type: new Abstract: As Large Language Models (LLMs) evolve from chatbots to agentic assistants, they are increasingly obs...
-
🇺🇸 X-RAY: Mapping LLM Reasoning Capability via Formalized and Calibrated Probes
arXiv:2603.05290v1 Announce Type: new Abstract: Large language models (LLMs) achieve promising performance, yet their ability to reason remains poorl...
-
🇺🇸 Legal interpretation and AI: from expert systems to argumentation and LLMs
arXiv:2603.05392v1 Announce Type: new Abstract: AI and Law research has encountered legal interpretation in different ways, in the context of its evo...
-
🇺🇸 Detection of Illicit Content on Online Marketplaces using Large Language Models
arXiv:2603.04707v1 Announce Type: cross Abstract: Online marketplaces, while revolutionizing global commerce, have inadvertently facilitated the prol...
-
🇺🇸 BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning
arXiv:2603.04918v1 Announce Type: cross Abstract: Proximal constraints are fundamental to the stability of the Large Language Model reinforcement lea...
-
🇺🇸 What Is Missing: Interpretable Ratings for Large Language Model Outputs
arXiv:2603.04429v1 Announce Type: cross Abstract: Current Large Language Model (LLM) preference learning methods such as Proximal Policy Optimization...
-
🇺🇸 A unified foundational framework for knowledge injection and evaluation of Large Language Models in Combustion Science
arXiv:2603.04452v1 Announce Type: cross Abstract: To advance foundation Large Language Models (LLMs) for combustion science, this study presents the ...
-
🇺🇸 Large Language Models as Bidding Agents in Repeated HetNet Auction
arXiv:2603.04455v1 Announce Type: cross Abstract: This paper investigates the integration of large language models (LLMs) as reasoning agents in repe...
-
🇺🇸 From Spark to Fire: Modeling and Mitigating Error Cascades in LLM-Based Multi-Agent Collaboration
arXiv:2603.04474v1 Announce Type: cross Abstract: Large Language Model-based Multi-Agent Systems (LLM-MAS) are increasingly applied to complex collab...
-
🇺🇸 Hate Speech Detection using Large Language Models with Data Augmentation and Feature Enhancement
arXiv:2603.04698v1 Announce Type: cross Abstract: This paper evaluates data augmentation and feature enhancement techniques for hate speech detection...
-
🇺🇸 When Weak LLMs Speak with Confidence, Preference Alignment Gets Stronger
arXiv:2603.04968v1 Announce Type: cross Abstract: Preference alignment is an essential step in adapting large language models (LLMs) to human values,...
-
🇺🇸 Yann LeCun’s new venture is a contrarian bet against large language models
Yann LeCun is a Turing Award recipient and a top AI researcher, but he has long been a contrarian figure in the tech world. He believes that the indus...
-
🇺🇸 “Dr. Google” had its issues. Can ChatGPT Health do better?
For the past two decades, there’s been a clear first step for anyone who starts experiencing new medical symptoms: Look them up online. The practice w...
-
🇺🇸 Multi-Sourced, Multi-Agent Evidence Retrieval for Fact-Checking
arXiv:2603.00267v1 Announce Type: new Abstract: Misinformation spreading over the Internet poses a significant threat to both societies and individua...
-
🇺🇸 An Agentic LLM Framework for Adverse Media Screening in AML Compliance
arXiv:2602.23373v1 Announce Type: new Abstract: Adverse media screening is a critical component of anti-money laundering (AML) and know-your-customer...
-
🇺🇸 Transformers converge to invariant algorithmic cores
arXiv:2602.22600v1 Announce Type: cross Abstract: Large language models exhibit sophisticated capabilities, yet understanding how they work internall...
-
🇺🇸 Ruyi2 Technical Report
arXiv:2602.22543v1 Announce Type: cross Abstract: Large Language Models (LLMs) face significant challenges regarding deployment costs and latency, ne...
-
🇺🇸 Generative Agents Navigating Digital Libraries
arXiv:2602.22529v1 Announce Type: cross Abstract: In the rapidly evolving field of digital libraries, the development of large language models (LLMs)...
-
🇺🇸 Reinforcement-aware Knowledge Distillation for LLM Reasoning
arXiv:2602.22495v1 Announce Type: cross Abstract: Reinforcement learning (RL) post-training has recently driven major gains in long chain-of-thought ...
-
🇺🇸 Sydney Telling Fables on AI and Humans: A Corpus Tracing Memetic Transfer of Persona between LLMs
arXiv:2602.22481v1 Announce Type: cross Abstract: The way LLM-based entities conceive of the relationship between AI and humans is an important topic...
-
🇺🇸 Automating the Detection of Requirement Dependencies Using Large Language Models
arXiv:2602.22456v1 Announce Type: cross Abstract: Requirements are inherently interconnected through various types of dependencies. Identifying these...
-
🇺🇸 Contextual Memory Virtualisation: DAG-Based State Management and Structurally Lossless Trimming for LLM Agents
arXiv:2602.22402v1 Announce Type: cross Abstract: As large language models engage in extended reasoning tasks, they accumulate significant state -- a...
-
🇺🇸 Scaling In, Not Up? Testing Thick Citation Context Analysis with GPT-5 and Fragile Prompts
arXiv:2602.22359v1 Announce Type: cross Abstract: This paper tests whether large language models (LLMs) can support interpretative citation context a...
-
🇺🇸 Structure and Redundancy in Large Language Models: A Spectral Study via Random Matrix Theory
arXiv:2602.22345v1 Announce Type: cross Abstract: This thesis addresses two persistent and closely related challenges in modern deep learning, reliab...
-
🇺🇸 UpSkill: Mutual Information Skill Learning for Structured Response Diversity in LLMs
arXiv:2602.22296v1 Announce Type: cross Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has improved the reasoning abilities of large...
-
🇺🇸 Manifold of Failure: Behavioral Attraction Basins in Language Models
arXiv:2602.22291v1 Announce Type: cross Abstract: While prior work has focused on projecting adversarial examples back onto the manifold of natural d...
-
🇺🇸 Integrating Machine Learning Ensembles and Large Language Models for Heart Disease Prediction Using Voting Fusion
arXiv:2602.22280v1 Announce Type: cross Abstract: Cardiovascular disease is the primary cause of death globally, necessitating early identification, ...
-
🇺🇸 Analysis of LLMs Against Prompt Injection and Jailbreak Attacks
arXiv:2602.22242v1 Announce Type: cross Abstract: Large Language Models (LLMs) are widely deployed in real-world systems. Given their broader applica...
-
🇺🇸 From Prompts to Performance: Evaluating LLMs for Task-based Parallel Code Generation
arXiv:2602.22240v1 Announce Type: cross Abstract: Large Language Models (LLM) show strong abilities in code generation, but their skill in creating e...
-
🇺🇸 Misinformation Exposure in the Chinese Web: A Cross-System Evaluation of Search Engines, LLMs, and AI Overviews
arXiv:2602.22221v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly integrated into search services, providing direct ans...
-
🇺🇸 Comparative Analysis of Neural Retriever-Reranker Pipelines for Retrieval-Augmented Generation over Knowledge Graphs in E-commerce Applications
arXiv:2602.22219v1 Announce Type: cross Abstract: Recent advancements in Large Language Models (LLMs) have transformed Natural Language Processing (N...
-
🇺🇸 Enriching Taxonomies Using Large Language Models
arXiv:2602.22213v1 Announce Type: cross Abstract: Taxonomies play a vital role in structuring and categorizing information across domains. However, m...
-
🇺🇸 Duel-Evolve: Reward-Free Test-Time Scaling via LLM Self-Preferences
arXiv:2602.21585v1 Announce Type: cross Abstract: Many applications seek to optimize LLM outputs at test time by iteratively proposing, scoring, and ...
-
🇺🇸 Toward Expert Investment Teams: A Multi-Agent LLM System with Fine-Grained Trading Tasks
arXiv:2602.23330v1 Announce Type: new Abstract: The advancement of large language models (LLMs) has accelerated the development of autonomous financi...
-
🇺🇸 LLM Novice Uplift on Dual-Use, In Silico Biology Tasks
arXiv:2602.23329v1 Announce Type: new Abstract: Large language models (LLMs) perform increasingly well on biology benchmarks, but it remains unclear ...
-
🇺🇸 Mitigating Legibility Tax with Decoupled Prover-Verifier Games
arXiv:2602.23248v1 Announce Type: new Abstract: As large language models become increasingly capable, it is critical that their outputs can be easily...
-
🇺🇸 SC-Arena: A Natural Language Benchmark for Single-Cell Reasoning with Knowledge-Augmented Evaluation
arXiv:2602.23199v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly applied in scientific research, offering new capabiliti...
-
🇺🇸 A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring
arXiv:2602.23163v1 Announce Type: new Abstract: Large language models are beginning to show steganographic capabilities. Such capabilities could allo...
-
🇺🇸 Enhancing CVRP Solver through LLM-driven Automatic Heuristic Design
arXiv:2602.23092v1 Announce Type: new Abstract: The Capacitated Vehicle Routing Problem (CVRP), a fundamental combinatorial optimization challenge, f...
-
🇺🇸 Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search
arXiv:2602.22983v1 Announce Type: new Abstract: As Large Language Models (LLMs) are increasingly used, their security risks have drawn increasing att...
-
🇺🇸 SPM-Bench: Benchmarking Large Language Models for Scanning Probe Microscopy
arXiv:2602.22971v1 Announce Type: new Abstract: As LLMs achieved breakthroughs in general reasoning, their proficiency in specialized scientific doma...
-
🇺🇸 General Agent Evaluation
arXiv:2602.22953v1 Announce Type: new Abstract: The promise of general-purpose agents - systems that perform tasks in unfamiliar environments without...
-
🇺🇸 Towards LLM-Empowered Knowledge Tracing via LLM-Student Hierarchical Behavior Alignment in Hyperbolic Space
arXiv:2602.22879v1 Announce Type: new Abstract: Knowledge Tracing (KT) diagnoses students' concept mastery through continuous learning state monitori...
-
🇺🇸 MiroFlow: Towards High-Performance and Robust Open-Source Agent Framework for General Deep Research Tasks
arXiv:2602.22808v1 Announce Type: new Abstract: Despite the remarkable progress of large language models (LLMs), the capabilities of standalone LLMs ...
-
🇺🇸 ClinDet-Bench: Beyond Abstention, Evaluating Judgment Determinability of LLMs in Clinical Decision-Making
arXiv:2602.22771v1 Announce Type: new Abstract: Clinical decisions are often required under incomplete information. Clinical experts must identify wh...
-
🇺🇸 AMA-Bench: Evaluating Long-Horizon Memory for Agentic Applications
arXiv:2602.22769v1 Announce Type: new Abstract: Large Language Models (LLMs) are deployed as autonomous agents in increasingly complex applications, ...
-
🇺🇸 RLHFless: Serverless Computing for Efficient RLHF
arXiv:2602.22718v1 Announce Type: new Abstract: Reinforcement Learning from Human Feedback (RLHF) has been widely applied to Large Language Model (LL...
-
🇺🇸 MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios
arXiv:2602.22638v1 Announce Type: new Abstract: Route-planning agents powered by large language models (LLMs) have emerged as a promising paradigm fo...
-
🇺🇸 SideQuest: Model-Driven KV Cache Management for Long-Horizon Agentic Reasoning
arXiv:2602.22603v1 Announce Type: new Abstract: Long-running agentic tasks, such as deep research, require multi-hop reasoning over information distr...
-
🇺🇸 Strategy Executability in Mathematical Reasoning: Leveraging Human-Model Differences for Effective Guidance
arXiv:2602.22583v1 Announce Type: new Abstract: Example-based guidance is widely used to improve mathematical reasoning at inference time, yet its ef...
-
🇺🇸 CourtGuard: A Model-Agnostic Framework for Zero-Shot Policy Adaptation in LLM Safety
arXiv:2602.22557v1 Announce Type: new Abstract: Current safety mechanisms for Large Language Models (LLMs) rely heavily on static, fine-tuned classif...
-
🇺🇸 Agentic AI for Intent-driven Optimization in Cell-free O-RAN
arXiv:2602.22539v1 Announce Type: new Abstract: Agentic artificial intelligence (AI) is emerging as a key enabler for autonomous radio access network...
-
🇺🇸 Cognitive Models and AI Algorithms Provide Templates for Designing Language Agents
arXiv:2602.22523v1 Announce Type: new Abstract: While contemporary large language models (LLMs) are increasingly capable in isolation, there are stil...
-
🇺🇸 Mapping the Landscape of Artificial Intelligence in Life Cycle Assessment Using Large Language Models
arXiv:2602.22500v1 Announce Type: new Abstract: Integration of artificial intelligence (AI) into life cycle assessment (LCA) has accelerated in recen...
-
🇺🇸 ConstraintBench: Benchmarking LLM Constraint Reasoning on Direct Optimization
arXiv:2602.22465v1 Announce Type: new Abstract: Large language models are increasingly applied to operational decision-making where the underlying st...
-
🇺🇸 A Framework for Assessing AI Agent Decisions and Outcomes in AutoML Pipelines
arXiv:2602.22442v1 Announce Type: new Abstract: Agent-based AutoML systems rely on large language models to make complex, multi-stage decisions acros...
-
🇺🇸 FIRE: A Comprehensive Benchmark for Financial Intelligence and Reasoning Evaluation
arXiv:2602.22273v1 Announce Type: new Abstract: We introduce FIRE, a comprehensive benchmark designed to evaluate both the theoretical financial know...
-
🇺🇸 Graph Your Way to Inspiration: Integrating Co-Author Graphs with Retrieval-Augmented Generation for Large Language Model Based Scientific Idea Generation
arXiv:2602.22215v1 Announce Type: new Abstract: Large Language Models (LLMs) demonstrate potential in the field of scientific idea generation. Howeve...
-
🇺🇸 LLMs Process Lists With General Filter Heads
arXiv:2510.26784v2 Announce Type: replace Abstract: We investigate the mechanisms underlying a range of list-processing tasks in LLMs, and we find th...
-
🇺🇸 Wireless Federated Multi-Task LLM Fine-Tuning via Sparse-and-Orthogonal LoRA
arXiv:2602.20492v1 Announce Type: cross Abstract: Decentralized federated learning (DFL) based on low-rank adaptation (LoRA) enables mobile devices w...
-
🇺🇸 What Makes a Good Query? Measuring the Impact of Human-Confusing Linguistic Features on LLM Performance
arXiv:2602.20300v1 Announce Type: cross Abstract: Large Language Model (LLM) hallucinations are usually treated as defects of the model or its decodi...
-
🇺🇸 Evaluating the Reliability of Digital Forensic Evidence Discovered by Large Language Model: A Case Study
arXiv:2602.20202v1 Announce Type: cross Abstract: The growing reliance on AI-identified digital evidence raises significant concerns about its reliab...
-
🇺🇸 Closing the Expertise Gap in Residential Building Energy Retrofits: A Domain-Specific LLM for Informed Decision-Making
arXiv:2602.20181v1 Announce Type: cross Abstract: Residential energy retrofit decision-making is constrained by an expertise gap, as homeowners lack ...
-
🇺🇸 Tool Building as a Path to "Superintelligence"
arXiv:2602.21061v1 Announce Type: new Abstract: The Diligent Learner framework suggests LLMs can achieve superintelligence via test-time search, prov...
-
🇺🇸 Physics-based phenomenological characterization of cross-modal bias in multimodal models
arXiv:2602.20624v1 Announce Type: new Abstract: The term 'algorithmic fairness' is used to evaluate whether AI models operate fairly in both comparat...
-
🇺🇸 A Problem-Oriented Perspective and Anchor Verification for Code Optimization
arXiv:2406.11935v3 Announce Type: replace-cross Abstract: Large Language Models (LLMs) have shown remarkable capabilities in solving various programm...
-
🇺🇸 From Moderation to Mediation: Can LLMs Serve as Mediators in Online Flame Wars?
arXiv:2512.03005v4 Announce Type: replace Abstract: The rapid advancement of large language models (LLMs) has opened new possibilities for AI for goo...
-
🇺🇸 "Don't Do That!": Guiding Embodied Systems through Large Language Model-based Constraint Generation
arXiv:2506.04500v2 Announce Type: replace Abstract: Recent advancements in large language models (LLMs) have spurred interest in robotic navigation t...
-
🇺🇸 Why Pass@k Optimization Can Degrade Pass@1: Prompt Interference in LLM Post-training
arXiv:2602.21189v1 Announce Type: cross Abstract: Pass@k is a widely used performance metric for verifiable large language model tasks, including mat...
-
🇺🇸 CAMEL: Confidence-Gated Reflection for Reward Modeling
arXiv:2602.20670v1 Announce Type: cross Abstract: Reward models play a fundamental role in aligning large language models with human preferences. Exi...
-
🇺🇸 CodeHacker: Automated Test Case Generation for Detecting Vulnerabilities in Competitive Programming Solutions
arXiv:2602.20213v1 Announce Type: cross Abstract: The evaluation of Large Language Models (LLMs) for code generation relies heavily on the quality an...
-
🇺🇸 MoBiQuant: Mixture-of-Bits Quantization for Token-Adaptive Elastic LLMs
arXiv:2602.20191v1 Announce Type: cross Abstract: Changing runtime complexity on cloud and edge devices necessitates elastic large language model (LL...
-
🇺🇸 Talking to Yourself: Defying Forgetting in Large Language Models
arXiv:2602.20162v1 Announce Type: cross Abstract: Catastrophic forgetting remains a major challenge when fine-tuning large language models (LLMs) on ...
-
🇺🇸 Architecting AgentOS: From Token-Level Context to Emergent System-Level Intelligence
arXiv:2602.20934v1 Announce Type: new Abstract: The paradigm of Large Language Models is undergoing a fundamental transition from static inference en...
-
🇺🇸 Buffer Matters: Unleashing the Power of Off-Policy Reinforcement Learning in Large Language Model Reasoning
arXiv:2602.20722v1 Announce Type: new Abstract: Traditional on-policy Reinforcement Learning with Verifiable Rewards (RLVR) frameworks suffer from ex...
-
🇺🇸 PromptCD: Test-Time Behavior Enhancement via Polarity-Prompt Contrastive Decoding
arXiv:2602.20696v1 Announce Type: new Abstract: Reliable AI systems require large language models (LLMs) to exhibit behaviors aligned with human pref...
-
🇺🇸 From Logs to Language: Learning Optimal Verbalization for LLM-Based Recommendation in Production
arXiv:2602.20558v1 Announce Type: new Abstract: Large language models (LLMs) are promising backbones for generative recommender systems, yet a key ch...
-
🇺🇸 DMCD: Semantic-Statistical Framework for Causal Discovery
arXiv:2602.20333v1 Announce Type: new Abstract: We present DMCD (DataMap Causal Discovery), a two-phase causal discovery framework that integrates LL...
-
🇺🇸 Diffusion Generative Recommendation with Continuous Tokens
arXiv:2504.12007v5 Announce Type: replace-cross Abstract: Recent advances in generative artificial intelligence, particularly large language models (...
-
🇺🇸 DS-STAR: Data Science Agent for Solving Diverse Tasks across Heterogeneous Formats and Open-Ended Queries
arXiv:2509.21825v4 Announce Type: replace Abstract: While large language models (LLMs) have shown promise in automating data science, existing agents...
-
🇺🇸 Programming by Backprop: An Instruction is Worth 100 Examples When Finetuning LLMs
arXiv:2506.18777v2 Announce Type: replace Abstract: Large language models (LLMs) are typically trained to acquire behaviours from demonstrations or e...
-
🇺🇸 Sensory-Motor Control with Large Language Models via Iterative Policy Refinement
arXiv:2506.04867v4 Announce Type: replace Abstract: We propose a method that enables large language models (LLMs) to control embodied agents through ...
-
🇺🇸 A Survey on the Optimization of Large Language Model-based Agents
arXiv:2503.12434v2 Announce Type: replace Abstract: With the rapid development of Large Language Models (LLMs), LLM-based agents have been widely ado...
-
🇺🇸 The Art of Efficient Reasoning: Data, Reward, and Optimization
arXiv:2602.20945v1 Announce Type: cross Abstract: Large Language Models (LLMs) consistently benefit from scaled Chain-of-Thought (CoT) reasoning, but...
-
🇺🇸 Hybrid LLM-Embedded Dialogue Agents for Learner Reflection: Designing Responsive and Theory-Driven Interactions
arXiv:2602.20486v1 Announce Type: cross Abstract: Dialogue systems have long supported learner reflections, with theoretically grounded, rule-based d...
-
🇺🇸 No One Size Fits All: QueryBandits for Hallucination Mitigation
arXiv:2602.20332v1 Announce Type: cross Abstract: Advanced reasoning capabilities in Large Language Models (LLMs) have led to more frequent hallucina...
-
🇺🇸 InterviewSim: A Scalable Framework for Interview-Grounded Personality Simulation
arXiv:2602.20294v1 Announce Type: cross Abstract: Simulating real personalities with large language models requires grounding generation in authentic...
-
🇺🇸 Golden Layers and Where to Find Them: Improved Knowledge Editing for Large Language Models Via Layer Gradient Analysis
arXiv:2602.20207v1 Announce Type: cross Abstract: Knowledge editing in Large Language Models (LLMs) aims to update the model's prediction for a speci...
-
🇺🇸 A Benchmark for Deep Information Synthesis
arXiv:2602.21143v1 Announce Type: new Abstract: Large language model (LLM)-based agents are increasingly used to solve complex tasks involving tool u...
-
🇺🇸 LogicGraph: Benchmarking Multi-Path Logical Reasoning via Neuro-Symbolic Generation and Verification
arXiv:2602.21044v1 Announce Type: new Abstract: Evaluations of large language models (LLMs) primarily emphasize convergent logical reasoning, where s...
-
🇺🇸 HELP: HyperNode Expansion and Logical Path-Guided Evidence Localization for Accurate and Efficient GraphRAG
arXiv:2602.20926v1 Announce Type: new Abstract: Large Language Models (LLMs) often struggle with inherent knowledge boundaries and hallucinations, li...
-
🇺🇸 Qwen-BIM: developing large language model for BIM-based design with domain-specific benchmark and dataset
arXiv:2602.20812v1 Announce Type: new Abstract: As the construction industry advances toward digital transformation, BIM (Building Information Modeli...
-
🇺🇸 Counterfactual Simulation Training for Chain-of-Thought Faithfulness
arXiv:2602.20710v1 Announce Type: new Abstract: Inspecting Chain-of-Thought reasoning is among the most common means of understanding why an LLM prod...
-
🇺🇸 Grounding LLMs in Scientific Discovery via Embodied Actions
arXiv:2602.20639v1 Announce Type: new Abstract: Large Language Models (LLMs) have shown significant potential in scientific discovery but struggle to...
-
🇺🇸 An artificial intelligence framework for end-to-end rare disease phenotyping from clinical notes using large language models
arXiv:2602.20324v1 Announce Type: new Abstract: Phenotyping is fundamental to rare disease diagnosis, but manual curation of structured phenotypes fr...
-
🇺🇸 SparkMe: Adaptive Semi-Structured Interviewing for Qualitative Insight Discovery
arXiv:2602.21136v1 Announce Type: cross Abstract: Qualitative insights from user experiences are critical for informing product and policy decisions,...
-
🇺🇸 El Agente Gráfico: Structured Execution Graphs for Scientific Agents
arXiv:2602.17902v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used to automate scientific workflows, yet their integr...
-
🇺🇸 The Token Games: Evaluating Language Model Reasoning with Puzzle Duels
arXiv:2602.17831v1 Announce Type: new Abstract: Evaluating the reasoning capabilities of Large Language Models is increasingly challenging as models ...
-
🇺🇸 Retrieval Augmented (Knowledge Graph), and Large Language Model-Driven Design Structure Matrix (DSM) Generation of Cyber-Physical Systems
arXiv:2602.16715v1 Announce Type: new Abstract: We explore the potential of Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), and G...
-
🇺🇸 OpenAI’s latest product lets you vibe code science
OpenAI just revealed what its new in-house team, OpenAI for Science, has been up to. The firm has released a free LLM-powered tool for scientists call...
-
🇺🇸 Advancing science and math with GPT-5.2
GPT-5.2 is OpenAI’s strongest model yet for math and science, setting new state-of-the-art results on benchmarks like GPQA Diamond and FrontierMath. T...
-
🇺🇸 GISA: A Benchmark for General Information-Seeking Assistant
arXiv:2602.08543v2 Announce Type: replace-cross Abstract: The advancement of large language models (LLMs) has significantly accelerated the developme...
-
🇺🇸 Provable Training Data Identification for Large Language Models
arXiv:2510.09717v2 Announce Type: replace-cross Abstract: Identifying training data of large-scale models is critical for copyright litigation, priva...
-
🇺🇸 Principled Synthetic Data Enables the First Scaling Laws for LLMs in Recommendation
arXiv:2602.07298v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) represent a promising frontier for recommender systems, yet th...
-
🇺🇸 Exploring AI-Augmented Sensemaking of Patient-Generated Health Data: A Mixed-Method Study with Healthcare Professionals in Cardiac Risk Reduction
arXiv:2602.05687v4 Announce Type: replace-cross Abstract: Individuals are increasingly generating substantial personal health and lifestyle data, e.g...
-
🇺🇸 Beyond Static Question Banks: Dynamic Knowledge Expansion via LLM-Automated Graph Construction and Adaptive Generation
arXiv:2602.00020v2 Announce Type: replace-cross Abstract: Personalized education systems increasingly rely on structured knowledge representations to...
-
🇺🇸 ATLAS: Adaptive Self-Evolutionary Research Agent with Task-Distributed Multi-LLM Supporters
arXiv:2602.02709v2 Announce Type: replace Abstract: Recent multi-LLM agent systems perform well in prompt optimization and automated problem-solving,...
-
🇺🇸 Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges
arXiv:2510.23883v2 Announce Type: replace Abstract: Agentic AI systems powered by large language models (LLMs) and endowed with planning, tool use, m...
-
🇺🇸 Know More, Know Clearer: A Meta-Cognitive Framework for Knowledge Augmentation in Large Language Models
arXiv:2602.12996v1 Announce Type: cross Abstract: Knowledge augmentation has significantly enhanced the performance of Large Language Models (LLMs) i...
-
🇺🇸 Knowledge-Based Design Requirements for Generative Social Robots in Higher Education
arXiv:2602.12873v1 Announce Type: cross Abstract: Generative social robots (GSRs) powered by large language models enable adaptive, conversational tu...
-
🇺🇸 CacheMind: From Miss Rates to Why -- Natural-Language, Trace-Grounded Reasoning for Cache Replacement
arXiv:2602.12422v1 Announce Type: cross Abstract: Cache replacement remains a challenging problem in CPU microarchitecture, often addressed using han...
-
🇺🇸 Reasoning about Intent for Ambiguous Requests
arXiv:2511.10453v2 Announce Type: replace-cross Abstract: Large language models often respond to ambiguous requests by implicitly committing to one i...
-
🇺🇸 Low-Resource Dialect Adaptation of Large Language Models: A French Dialect Case-Study
arXiv:2510.22747v2 Announce Type: replace-cross Abstract: Despite the widespread adoption of large language models (LLMs), their strongest capabiliti...
-
🇺🇸 Eliminating stability hallucinations in llm-based tts models via attention guidance
arXiv:2509.19852v2 Announce Type: replace-cross Abstract: This paper focuses on resolving stability hallucinations (e.g., repetitive or omitted speec...
-
🇺🇸 ToolACE-MT: Non-Autoregressive Generation for Agentic Multi-Turn Interaction
arXiv:2508.12685v3 Announce Type: replace-cross Abstract: Agentic task-solving with Large Language Models (LLMs) requires multi-turn, multi-step inte...
-
🇺🇸 R-Zero: Self-Evolving Reasoning LLM from Zero Data
arXiv:2508.05004v4 Announce Type: replace-cross Abstract: Self-evolving Large Language Models (LLMs) offer a scalable path toward super-intelligence ...
-
🇺🇸 PlanetServe: A Decentralized, Scalable, and Privacy-Preserving Overlay for Democratizing Large Language Model Serving
arXiv:2504.20101v5 Announce Type: replace-cross Abstract: While significant progress has been made in research and development on open-source and cos...
-
🇺🇸 LTSM-Bundle: A Toolbox and Benchmark on Large Language Models for Time Series Forecasting
arXiv:2406.14045v3 Announce Type: replace-cross Abstract: Time Series Forecasting (TSF) has long been a challenge in time series analysis. Inspired b...
-
🇺🇸 WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning
arXiv:2602.04634v2 Announce Type: replace Abstract: Recent advancements in Large Language Models (LLMs) have largely focused on depth scaling, where ...
-
🇺🇸 TRACE: Temporal Reasoning via Agentic Context Evolution for Streaming Electronic Health Records (EHRs)
arXiv:2602.12833v1 Announce Type: cross Abstract: Large Language Models (LLMs) encode extensive medical knowledge but struggle to apply it reliably t...
-
🇺🇸 Understanding Chain-of-Thought in Large Language Models via Topological Data Analysis
arXiv:2512.19135v2 Announce Type: replace Abstract: With the development of large language models (LLMs), particularly with the introduction of the l...
-
🇺🇸 RLIE: Rule Generation with Logistic Regression, Iterative Refinement, and Evaluation for Large Language Models
arXiv:2510.19698v2 Announce Type: replace Abstract: Large Language Models (LLMs) can propose rules in natural language, sidestepping the need for a p...
-
🇺🇸 Difficulty-Aware Agentic Orchestration for Query-Specific Multi-Agent Workflows
arXiv:2509.11079v5 Announce Type: replace Abstract: Large Language Model (LLM)-based agentic systems have shown strong capabilities across various ta...
-
🇺🇸 SCAN: Semantic Document Layout Analysis for Textual and Visual Retrieval-Augmented Generation
arXiv:2505.14381v3 Announce Type: replace Abstract: With the increasing adoption of Large Language Models (LLMs) and Vision-Language Models (VLMs), r...
-
🇺🇸 Asynchronous Verified Semantic Caching for Tiered LLM Architectures
arXiv:2602.13165v1 Announce Type: cross Abstract: Large language models (LLMs) now sit in the critical path of search, assistance, and agentic workfl...
-
🇺🇸 Buy versus Build an LLM: A Decision Framework for Governments
arXiv:2602.13033v1 Announce Type: cross Abstract: Large Language Models (LLMs) represent a new frontier of digital infrastructure that can support a ...
-
🇺🇸 Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward
arXiv:2602.12430v1 Announce Type: cross Abstract: The transition from monolithic language models to modular, skill-equipped agents marks a defining s...
-
🇺🇸 From Biased Chatbots to Biased Agents: Examining Role Assignment Effects on LLM Agent Robustness
arXiv:2602.12285v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly deployed as autonomous agents capable of actions with...
-
🇺🇸 TriGen: NPU Architecture for End-to-End Acceleration of Large Language Models based on SW-HW Co-Design
arXiv:2602.12962v1 Announce Type: cross Abstract: Recent studies have extensively explored NPU architectures for accelerating AI inference in on-devi...
-
🇺🇸 Left-right asymmetry in predicting brain activity from LLMs' representations emerges with their formal linguistic competence
arXiv:2602.12811v1 Announce Type: cross Abstract: When humans and large language models (LLMs) process the same text, activations in the LLMs correla...
-
🇺🇸 "Not Human, Funnier": How Machine Identity Shapes Humor Perception in Online AI Stand-up Comedy
arXiv:2602.12763v1 Announce Type: cross Abstract: Chatbots are increasingly applied to domains previously reserved for human actors. One such domain ...
-
🇺🇸 VI-CuRL: Stabilizing Verifier-Independent RL Reasoning via Confidence-Guided Variance Reduction
arXiv:2602.12579v1 Announce Type: cross Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a dominant paradigm for enhanc...
-
🇺🇸 SD-MoE: Spectral Decomposition for Effective Expert Specialization
arXiv:2602.12556v1 Announce Type: cross Abstract: Mixture-of-Experts (MoE) architectures scale Large Language Models via expert specialization induce...
-
🇺🇸 RankLLM: Weighted Ranking of LLMs by Quantifying Question Difficulty
arXiv:2602.12424v1 Announce Type: cross Abstract: Benchmarks establish a standardized evaluation framework to systematically assess the performance o...
-
🇺🇸 Soft Contamination Means Benchmarks Test Shallow Generalization
arXiv:2602.12413v1 Announce Type: cross Abstract: If LLM training data is polluted with benchmark test data, then benchmark performance gives biased ...
-
🇺🇸 OptiML: An End-to-End Framework for Program Synthesis and CUDA Kernel Optimization
arXiv:2602.12305v1 Announce Type: cross Abstract: Generating high-performance CUDA kernels remains challenging due to the need to navigate a combinat...
-
🇺🇸 To Mix or To Merge: Toward Multi-Domain Reinforcement Learning for Large Language Models
arXiv:2602.12566v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) plays a key role in stimulating the explicit re...
-
🇺🇸 Intent-Driven Smart Manufacturing Integrating Knowledge Graphs and Large Language Models
arXiv:2602.12419v1 Announce Type: new Abstract: The increasing complexity of smart manufacturing environments demands interfaces that can translate h...
Entity Intersection Graph
People and organizations frequently mentioned alongside Large language model:
- Artificial intelligence · 20 shared articles
- Reinforcement learning · 11 shared articles
- Ethics of artificial intelligence · 11 shared articles
- AI safety · 11 shared articles
- AI alignment · 7 shared articles
- Machine learning · 6 shared articles
- AI agent · 5 shared articles
- Educational technology · 4 shared articles
- Theory of mind · 4 shared articles
- Benchmark · 3 shared articles