Large language model

Type of machine learning model

📌 Topics

Artificial Intelligence (95)
Machine Learning (42)
AI Safety (20)
AI Evaluation (20)
AI Research (16)
Natural Language Processing (13)
AI Ethics (10)
Software Engineering (8)
AI Security (7)
AI Bias (7)
AI Benchmarking (7)
Cybersecurity (6)

🏷️ Keywords

Large Language Models (128) · LLM (67) · large language models (45) · LLMs (40) · arXiv (39) · Large language models (14) · machine learning (13) · benchmark (13) · AI safety (11) · AI (11) · AI Research (11) · AI evaluation (10) · Reinforcement Learning (9) · AI Safety (8) · evaluation (8) · reasoning (8) · Retrieval-Augmented Generation (8) · Machine Learning (7) · AI Evaluation (6) · AI reliability (6)

📖 Key Information

A large language model (LLM) is a language model trained with self-supervised machine learning on a vast amount of text, designed for natural language processing tasks, especially language generation. The largest and most capable LLMs are generative pre-trained transformers (GPTs) that provide the core capabilities of modern chatbots. LLMs can be fine-tuned for specific tasks or guided by prompt engineering.
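
To make "self-supervised" concrete: the training targets come from the text itself, with each token serving as the label for the token before it. The following is a minimal sketch of that idea using a toy bigram counter, not a transformer. The function names (`next_token_pairs`, `train_bigram`, `predict_next`) and the sample corpus are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def next_token_pairs(text):
    """Derive (context, target) pairs from raw text; no human labels needed."""
    tokens = text.split()  # assumption: whitespace tokenization, for simplicity
    return list(zip(tokens[:-1], tokens[1:]))


def train_bigram(text):
    """Count how often each token follows each context token."""
    counts = defaultdict(Counter)
    for context, target in next_token_pairs(text):
        counts[context][target] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequent follower of `token`, or None if unseen."""
    if token not in counts:
        return None
    return counts[token].most_common(1)[0][0]


corpus = "the model reads text and the model predicts text"
model = train_bigram(corpus)
print(next_token_pairs("a b c"))
print(predict_next(model, "the"))
```

A real LLM replaces the count table with a neural network and the single-token context with a long window, but the supervision signal is derived from unlabeled text in the same way.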

📰 Related News (369)

🇺🇸 Heuristic Classification of Thoughts Prompting (HCoT): Integrating Expert System Heuristics for Structured Reasoning into Large Language Models (2026-04-15)
arXiv:2604.12390v1 Announce Type: new Abstract: This paper addresses two limitations of large language models (LLMs) in solving complex problems: (1)...
🇺🇸 Preventing Safety Drift in Large Language Models via Coupled Weight and Activation Constraints (2026-04-15)
arXiv:2604.12384v1 Announce Type: new Abstract: Safety alignment in Large Language Models (LLMs) remains highly fragile during fine-tuning, where eve...
🇺🇸 A Scoping Review of Large Language Model-Based Pedagogical Agents (2026-04-15)
arXiv:2604.12253v1 Announce Type: new Abstract: This scoping review examines the emerging field of Large Language Model (LLM)-based pedagogical agent...
🇺🇸 Blending Human and LLM Expertise to Detect Hallucinations and Omissions in Mental Health Chatbot Responses (2026-04-09)
arXiv:2604.06216v1 Announce Type: cross Abstract: As LLM-powered chatbots are increasingly deployed in mental health services, detecting hallucinatio...
🇺🇸 Automating Database-Native Function Code Synthesis with LLMs (2026-04-09)
arXiv:2604.06231v1 Announce Type: cross Abstract: Database systems incorporate an ever-growing number of functions in their kernels (a.k.a., database...
🇺🇸 Incentive-Aware Multi-Fidelity Optimization for Generative Advertising in Large Language Models (2026-04-09)
arXiv:2604.06263v1 Announce Type: cross Abstract: Generative advertising in large language model (LLM) responses requires optimizing sponsorship conf...
🇺🇸 Attribution-Driven Explainable Intrusion Detection with Encoder-Based Large Language Models (2026-04-09)
arXiv:2604.06266v1 Announce Type: cross Abstract: Software-Defined Networking (SDN) improves network flexibility but also increases the need for reli...
🇺🇸 Towards the Development of an LLM-Based Methodology for Automated Security Profiling in Compliance with Ukrainian Cybersecurity Regulations (2026-04-09)
arXiv:2604.06274v1 Announce Type: cross Abstract: In recent years, the pace of development of information technology in various areas has increased d...
🇺🇸 TalkLoRA: Communication-Aware Mixture of Low-Rank Adaptation for Large Language Models (2026-04-09)
arXiv:2604.06291v1 Announce Type: cross Abstract: Low-Rank Adaptation (LoRA) enables parameter-efficient fine-tuning of Large Language Models (LLMs),...
🇺🇸 FMI@SU ToxHabits: Evaluating LLMs Performance on Toxic Habit Extraction in Spanish Clinical Texts (2026-04-09)
arXiv:2604.06403v1 Announce Type: cross Abstract: The paper presents an approach for the recognition of toxic habits named entities in Spanish clinic...
🇺🇸 The Depth Ceiling: On the Limits of Large Language Models in Discovering Latent Planning (2026-04-09)
arXiv:2604.06427v1 Announce Type: cross Abstract: The viability of chain-of-thought (CoT) monitoring hinges on models being unable to reason effectiv...
🇺🇸 Inference-Time Code Selection via Symbolic Equivalence Partitioning (2026-04-09)
arXiv:2604.06485v1 Announce Type: cross Abstract: "Best-of-N" selection is a popular inference-time scaling method for code generation using Large La...
🇺🇸 Distributed Interpretability and Control for Large Language Models (2026-04-09)
arXiv:2604.06483v1 Announce Type: cross Abstract: Large language models that require multiple GPU cards to host are usually the most capable models. ...
🇺🇸 Improving Robustness In Sparse Autoencoders via Masked Regularization (2026-04-09)
arXiv:2604.06495v1 Announce Type: cross Abstract: Sparse autoencoders (SAEs) are widely used in mechanistic interpretability to project LLM activatio...
🇺🇸 Scientific Knowledge-driven Decoding Constraints Improving the Reliability of LLMs (2026-04-09)
arXiv:2604.06603v1 Announce Type: cross Abstract: Large language models (LLMs) have shown strong knowledge reserves and task-solving capabilities, bu...
🇺🇸 LLM-based Schema-Guided Extraction and Validation of Missing-Person Intelligence from Heterogeneous Data Sources (2026-04-09)
arXiv:2604.06571v1 Announce Type: cross Abstract: Missing-person and child-safety investigations rely on heterogeneous case documents, including stru...
🇺🇸 SHAPE: Stage-aware Hierarchical Advantage via Potential Estimation for LLM Reasoning (2026-04-09)
arXiv:2604.06636v1 Announce Type: cross Abstract: Process supervision has emerged as a promising approach for enhancing LLM reasoning, yet existing m...
🇺🇸 Fine-grained Approaches for Confidence Calibration of LLMs in Automated Code Revision (2026-04-09)
arXiv:2604.06723v1 Announce Type: cross Abstract: In today's AI-assisted software engineering landscape, developers increasingly depend on LLMs that ...
🇺🇸 Restoring Heterogeneity in LLM-based Social Simulation: An Audience Segmentation Approach (2026-04-09)
arXiv:2604.06663v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly used to simulate social attitudes and behaviors, offe...
🇺🇸 Evaluating LLM-Based 0-to-1 Software Generation in End-to-End CLI Tool Scenarios (2026-04-09)
arXiv:2604.06742v1 Announce Type: cross Abstract: Large Language Models (LLMs) are driving a shift towards intent-driven development, where agents bu...
🇺🇸 TeamLLM: A Human-Like Team-Oriented Collaboration Framework for Multi-Step Contextualized Tasks (2026-04-09)
arXiv:2604.06765v1 Announce Type: cross Abstract: Recently, multi-Large Language Model (LLM) frameworks have been proposed to solve contextualized ta...
🇺🇸 On the Step Length Confounding in LLM Reasoning Data Selection (2026-04-09)
arXiv:2604.06834v1 Announce Type: cross Abstract: Large reasoning models have recently demonstrated strong performance on complex tasks that require ...
🇺🇸 Digital Skin, Digital Bias: Uncovering Tone-Based Biases in LLMs and Emoji Embeddings (2026-04-09)
arXiv:2604.06863v1 Announce Type: cross Abstract: Skin-toned emojis are crucial for fostering personal identity and social inclusion in online commun...
🇺🇸 MedDialBench: Benchmarking LLM Diagnostic Robustness under Parametric Adversarial Patient Behaviors (2026-04-09)
arXiv:2604.06846v1 Announce Type: cross Abstract: Interactive medical dialogue benchmarks have shown that LLM diagnostic accuracy degrades significan...
🇺🇸 SentinelSphere: Integrating AI-Powered Real-Time Threat Detection with Cybersecurity Awareness Training (2026-04-09)
arXiv:2604.06900v1 Announce Type: cross Abstract: The field of cybersecurity is confronted with two interrelated challenges: a worldwide deficit of q...
🇺🇸 The AI Skills Shift: Mapping Skill Obsolescence, Emergence, and Transition Pathways in the LLM Era (2026-04-09)
arXiv:2604.06906v1 Announce Type: cross Abstract: As Large Language Models reshape the global labor market, policymakers and workers need empirical d...
🇺🇸 Self-Preference Bias in Rubric-Based Evaluation of Large Language Models (2026-04-09)
arXiv:2604.06996v1 Announce Type: cross Abstract: LLM-as-a-judge has become the de facto approach for evaluating LLM outputs. However, judges are kno...
🇺🇸 Attribution Bias in Large Language Models (2026-04-08)
arXiv:2604.05224v1 Announce Type: new Abstract: As Large Language Models (LLMs) are increasingly used to support search and information retrieval, it...
🇺🇸 Adversarial Moral Stress Testing of Large Language Models (2026-04-02)
arXiv:2604.01108v1 Announce Type: new Abstract: Evaluating the ethical robustness of large language models (LLMs) deployed in software systems remain...
🇺🇸 Eyla: Toward an Identity-Anchored LLM Architecture with Integrated Biological Priors -- Vision, Implementation Attempt, and Lessons from AI-Assisted Development (2026-04-02)
arXiv:2604.00009v1 Announce Type: cross Abstract: We present the design rationale, implementation attempt, and failure analysis of Eyla, a proposed i...
🇺🇸 Quantifying Gender Bias in Large Language Models: When ChatGPT Becomes a Hiring Manager (2026-04-02)
arXiv:2604.00011v1 Announce Type: cross Abstract: The growing prominence of large language models (LLMs) in daily life has heightened concerns that L...
🇺🇸 Think Twice Before You Write -- an Entropy-based Decoding Strategy to Enhance LLM Reasoning (2026-04-02)
arXiv:2604.00018v1 Announce Type: cross Abstract: Decoding strategies play a central role in shaping the reasoning ability of large language models (...
🇺🇸 Can LLMs Perceive Time? An Empirical Investigation (2026-04-02)
arXiv:2604.00010v1 Announce Type: cross Abstract: Large language models cannot estimate how long their own tasks take. We investigate this limitation...
🇺🇸 Dual Optimal: Make Your LLM Peer-like with Dignity (2026-04-02)
arXiv:2604.00979v1 Announce Type: cross Abstract: Current aligned language models exhibit a dual failure mode we term the Evasive Servant: they sycop...
🇺🇸 Automated Framework to Evaluate and Harden LLM System Instructions against Encoding Attacks (2026-04-02)
arXiv:2604.01039v1 Announce Type: cross Abstract: System Instructions in Large Language Models (LLMs) are commonly used to enforce safety policies, d...
🇺🇸 Fast and Accurate Probing of In-Training LLMs' Downstream Performances (2026-04-02)
arXiv:2604.01025v1 Announce Type: cross Abstract: The paradigm of scaling Large Language Models (LLMs) in both parameter size and test time has pushe...
🇺🇸 The Energy Footprint of LLM-Based Environmental Analysis: LLMs and Domain Products (2026-04-02)
arXiv:2604.00053v1 Announce Type: cross Abstract: As large language models (LLMs) are increasingly used in domain-specific applications, including cl...
🇺🇸 Hierarchical Pre-Training of Vision Encoders with Large Language Models (2026-04-02)
arXiv:2604.00086v1 Announce Type: cross Abstract: The field of computer vision has experienced significant advancements through scalable vision encod...
🇺🇸 Genesis: Evolving Attack Strategies for LLM Web Agent Red-Teaming (2026-04-02)
arXiv:2510.18314v2 Announce Type: replace Abstract: As large language model (LLM) agents increasingly automate complex web tasks, they boost producti...
🇺🇸 Auto-Formulating Dynamic Programming Problems with Large Language Models (2026-04-02)
arXiv:2507.11737v2 Announce Type: replace Abstract: Dynamic programming (DP) is a fundamental method in operations research, but formulating DP model...
🇺🇸 Bethe Ansatz with a Large Language Model (2026-04-01)
arXiv:2603.29932v1 Announce Type: cross Abstract: We explore the capability of a Large Language Model (LLM) to perform specific computations in mathe...
🇺🇸 GISTBench: Evaluating LLM User Understanding via Evidence-Based Interest Verification (2026-04-01)
arXiv:2603.29112v1 Announce Type: new Abstract: We introduce GISTBench, a benchmark for evaluating Large Language Models' (LLMs) ability to understan...
🇺🇸 Theory of Mind and Self-Attributions of Mentality are Dissociable in LLMs (2026-04-01)
arXiv:2603.28925v1 Announce Type: cross Abstract: Safety fine-tuning in Large Language Models (LLMs) seeks to suppress potentially harmful forms of m...
🇺🇸 Understand and Accelerate Memory Processing Pipeline for Disaggregated LLM Inference (2026-04-01)
arXiv:2603.29002v1 Announce Type: cross Abstract: Modern large language models (LLMs) increasingly depend on efficient long-context processing and g...
🇺🇸 KEditVis: A Visual Analytics System for Knowledge Editing of Large Language Models (2026-04-01)
arXiv:2603.29689v1 Announce Type: cross Abstract: Large Language Models (LLMs) demonstrate exceptional capabilities in factual question answering, ye...
🇺🇸 Agenda-based Narrative Extraction: Steering Pathfinding Algorithms with Large Language Models (2026-04-01)
arXiv:2603.29661v1 Announce Type: cross Abstract: Existing narrative extraction methods face a trade-off between coherence, interactivity, and multi-...
🇺🇸 Enhancing Structural Mapping with LLM-derived Abstractions for Analogical Reasoning in Narratives (2026-04-01)
arXiv:2603.29997v1 Announce Type: cross Abstract: Analogical reasoning is a key driver of human generalization in problem-solving and argumentation. ...
🇺🇸 Hybrid Framework for Robotic Manipulation: Integrating Reinforcement Learning and Large Language Models (2026-04-01)
arXiv:2603.30022v1 Announce Type: cross Abstract: This paper introduces a new hybrid framework that combines Reinforcement Learning (RL) and Large La...
🇺🇸 AI benchmarks are broken. Here’s what we need instead. (2026-03-31)
For decades, artificial intelligence has been evaluated through the question of whether machines outperform humans. From chess to advanced math, from ...
🇺🇸 Training a Large Language Model for Medical Coding Using Privacy-Preserving Synthetic Clinical Data (2026-03-26)
arXiv:2603.23515v1 Announce Type: cross Abstract: Improving the accuracy and reliability of medical coding reduces clinician burnout and supports rev...
🇺🇸 MedMT-Bench: Can LLMs Memorize and Understand Long Multi-Turn Conversations in Medical Scenarios? (2026-03-26)
arXiv:2603.23519v1 Announce Type: cross Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities across various specialist do...
🇺🇸 Generating Hierarchical JSON Representations of Scientific Sentences Using LLMs (2026-03-26)
arXiv:2603.23532v1 Announce Type: cross Abstract: This paper investigates whether structured representations can preserve the meaning of scientific s...
🇺🇸 LLMORPH: Automated Metamorphic Testing of Large Language Models (2026-03-26)
arXiv:2603.23611v1 Announce Type: cross Abstract: Automated testing is essential for evaluating and improving the reliability of Large Language Model...
🇺🇸 Probing Ethical Framework Representations in Large Language Models: Structure, Entanglement, and Methodological Challenges (2026-03-26)
arXiv:2603.23659v1 Announce Type: cross Abstract: When large language models make ethical judgments, do their internal representations distinguish be...
🇺🇸 PoliticsBench: Benchmarking Political Values in Large Language Models with Multi-Turn Roleplay (2026-03-26)
arXiv:2603.23841v1 Announce Type: cross Abstract: While Large Language Models (LLMs) are increasingly used as primary sources of information, their p...
🇺🇸 Policy-Guided Threat Hunting: An LLM enabled Framework with Splunk SOC Triage (2026-03-26)
arXiv:2603.23966v1 Announce Type: cross Abstract: With frequently evolving Advanced Persistent Threats (APTs) in cyberspace, traditional security sol...
🇺🇸 When AI Meets Early Childhood Education: Large Language Models as Assessment Teammates in Chinese Preschools (2026-03-26)
arXiv:2603.24389v1 Announce Type: cross Abstract: High-quality teacher-child interaction (TCI) is fundamental to early childhood development, yet tra...
🇺🇸 Are LLMs Smarter Than Chimpanzees? An Evaluation on Perspective Taking and Knowledge State Estimation (2026-03-26)
arXiv:2601.12410v2 Announce Type: replace Abstract: Cognitive anthropology suggests that the distinction of human intelligence lies in the ability to...
🇺🇸 Evaluation of Large Language Models via Coupled Token Generation (2026-03-26)
arXiv:2502.01754v3 Announce Type: replace-cross Abstract: State of the art large language models rely on randomization to respond to a prompt. As an ...
🇺🇸 A Theory of LLM Information Susceptibility (2026-03-26)
arXiv:2603.23626v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly deployed as optimization modules in agentic systems, ...
🇺🇸 LLMLOOP: Improving LLM-Generated Code and Tests through Automated Iterative Feedback Loops (2026-03-26)
arXiv:2603.23613v1 Announce Type: cross Abstract: Large Language Models (LLMs) are showing remarkable performance in generating source code, yet the ...
🇺🇸 Object Search in Partially-Known Environments via LLM-informed Model-based Planning and Prompt Selection (2026-03-26)
arXiv:2603.23800v1 Announce Type: cross Abstract: We present a novel LLM-informed model-based planning framework, and a novel prompt selection method...
🇺🇸 From Guidelines to Guarantees: A Graph-Based Evaluation Harness for Domain-Specific Evaluation of LLMs (2026-03-26)
arXiv:2508.20810v2 Announce Type: replace Abstract: Rigorous evaluation of domain-specific language models requires benchmarks that are comprehensive...
🇺🇸 A Comprehensive Survey on Enterprise Financial Risk Analysis from Big Data and LLMs Perspective (2026-03-26)
arXiv:2211.14997v5 Announce Type: replace-cross Abstract: Enterprise financial risk analysis aims at predicting the future financial risk of enterpri...
🇺🇸 Ran Score: a LLM-based Evaluation Score for Radiology Report Generation (2026-03-25)
arXiv:2603.22935v1 Announce Type: new Abstract: Chest X-ray report generation and automated evaluation are limited by poor recognition of low-prevale...
🇺🇸 Between Rules and Reality: On the Context Sensitivity of LLM Moral Judgment (2026-03-25)
arXiv:2603.23114v1 Announce Type: new Abstract: A human's moral decision depends heavily on the context. Yet research on LLM morality has largely stu...
🇺🇸 LLM Olympiad: Why Model Evaluation Needs a Sealed Exam (2026-03-25)
arXiv:2603.23292v1 Announce Type: new Abstract: Benchmarks and leaderboards are how NLP most often communicates progress, but in the LLM era they are...
🇺🇸 Automated Microservice Pattern Instance Detection Using Infrastructure-as-Code Artifacts and Large Language Models (2026-03-25)
arXiv:2502.04188v1 Announce Type: cross Abstract: Documenting software architecture is essential to preserve architecture knowledge, even though it i...
🇺🇸 TIPS: Turn-Level Information-Potential Reward Shaping for Search-Augmented LLMs (2026-03-25)
arXiv:2603.22293v1 Announce Type: cross Abstract: Search-augmented large language models (LLMs) trained with reinforcement learning (RL) have achieve...
🇺🇸 Latent Semantic Manifolds in Large Language Models (2026-03-25)
arXiv:2603.22301v1 Announce Type: cross Abstract: Large Language Models (LLMs) perform internal computations in continuous vector spaces yet produce ...
🇺🇸 Sample Transform Cost-Based Training-Free Hallucination Detector for Large Language Models (2026-03-25)
arXiv:2603.22303v1 Announce Type: cross Abstract: Hallucinations in large language models (LLMs) remain a central obstacle to trustworthy deployment,...
🇺🇸 DAQ: Delta-Aware Quantization for Post-Training LLM Weight Compression (2026-03-25)
arXiv:2603.22324v1 Announce Type: cross Abstract: We introduce Delta-Aware Quantization (DAQ), a data-free post-training quantization framework that ...
🇺🇸 Sparse but Critical: A Token-Level Analysis of Distributional Shifts in RLVR Fine-Tuning of LLMs (2026-03-25)
arXiv:2603.22446v1 Announce Type: cross Abstract: Reinforcement learning with verifiable rewards (RLVR) has significantly improved reasoning in large...
🇺🇸 DALDALL: Data Augmentation for Lexical and Semantic Diverse in Legal Domain by leveraging LLM-Persona (2026-03-25)
arXiv:2603.22765v1 Announce Type: cross Abstract: Data scarcity remains a persistent challenge in low-resource domains. While existing data augmentat...
🇺🇸 Set-Valued Prediction for Large Language Models with Feasibility-Aware Coverage Guarantees (2026-03-25)
arXiv:2603.22966v1 Announce Type: cross Abstract: Large language models (LLMs) inherently operate over a large generation space, yet conventional usa...
🇺🇸 Can an LLM Detect Instances of Microservice Infrastructure Patterns? (2026-03-25)
arXiv:2603.23073v1 Announce Type: cross Abstract: Architectural patterns are frequently found in various software artifacts. The wide variety of patt...
🇺🇸 LLM-guided headline rewriting for clickability enhancement without clickbait (2026-03-25)
arXiv:2603.22459v1 Announce Type: cross Abstract: Enhancing reader engagement while preserving informational fidelity is a central challenge in contr...
🇺🇸 Emergence of Fragility in LLM-based Social Networks: the Case of Moltbook (2026-03-25)
arXiv:2603.23279v1 Announce Type: cross Abstract: The rapid diffusion of large language models and the growth in their capability has enabled the eme...
🇺🇸 Leveraging LLMs and Social Media to Understand User Perception of Smartphone-Based Earthquake Early Warnings (2026-03-25)
arXiv:2603.23322v1 Announce Type: cross Abstract: Android's Earthquake Alert (AEA) system provided timely early warnings to millions during the Mw 6....
🇺🇸 ORACLE: Optimizing Reasoning Abilities of Large Language Models via Constraint-Led Synthetic Data Elicitation (2026-03-24)
arXiv:2603.21140v1 Announce Type: new Abstract: Training large language models (LLMs) with synthetic reasoning data has become a popular approach to ...
🇺🇸 A Framework for Low-Latency, LLM-driven Multimodal Interaction on the Pepper Robot (2026-03-24)
arXiv:2603.21013v1 Announce Type: new Abstract: Despite recent advances in integrating Large Language Models (LLMs) into social robotics, two weaknes...
🇺🇸 Knowledge Boundary Discovery for Large Language Models (2026-03-24)
arXiv:2603.21022v1 Announce Type: new Abstract: We propose Knowledge Boundary Discovery (KBD), a reinforcement learning based framework to explore th...
🇺🇸 Characterizing the ability of LLMs to recapitulate Americans' distributional responses to public opinion polling questions across political issues (2026-03-24)
arXiv:2603.20229v1 Announce Type: cross Abstract: Traditional survey-based political issue polling is becoming less tractable due to increasing costs...
🇺🇸 ReBOL: Retrieval via Bayesian Optimization with Batched LLM Relevance Observations and Query Reformulation (2026-03-24)
arXiv:2603.20513v1 Announce Type: cross Abstract: LLM-reranking is limited by the top-k documents retrieved by vector similarity, which neither enabl...
🇺🇸 Reasoning or Rhetoric? An Empirical Analysis of Moral Reasoning Explanations in Large Language Models (2026-03-24)
arXiv:2603.21854v1 Announce Type: new Abstract: Do large language models reason morally, or do they merely sound like they do? We investigate whether...
🇺🇸 Enhancing Safety of Large Language Models via Embedding Space Separation (2026-03-24)
arXiv:2603.20206v1 Announce Type: cross Abstract: Large language models (LLMs) have achieved impressive capabilities, yet ensuring their safety again...
🇺🇸 Policies Permitting LLM Use for Polishing Peer Reviews Are Currently Not Enforceable (2026-03-24)
arXiv:2603.20450v1 Announce Type: cross Abstract: A number of scientific conferences and journals have recently enacted policies that prohibit LLM us...
🇺🇸 Evaluating Large Language Models on Historical Health Crisis Knowledge in Resource-Limited Settings: A Hybrid Multi-Metric Study (2026-03-24)
arXiv:2603.20514v1 Announce Type: cross Abstract: Large Language Models (LLMs) offer significant potential for delivering health information. However...
🇺🇸 PAVE: Premise-Aware Validation and Editing for Retrieval-Augmented LLMs (2026-03-24)
arXiv:2603.20673v1 Announce Type: cross Abstract: Retrieval-augmented language models can retrieve relevant evidence yet still commit to answers befo...
🇺🇸 LLM-Enhanced Semantic Data Integration of Electronic Component Qualifications in the Aerospace Domain (2026-03-23)
arXiv:2603.20094v1 Announce Type: cross Abstract: Large manufacturing companies face challenges in information retrieval due to data silos maintained...
🇺🇸 Semantic Token Clustering for Efficient Uncertainty Quantification in Large Language Models (2026-03-23)
arXiv:2603.20161v1 Announce Type: cross Abstract: Large language models (LLMs) have demonstrated remarkable capabilities across diverse tasks. Howeve...
🇺🇸 Measuring Faithfulness Depends on How You Measure: Classifier Sensitivity in LLM Chain-of-Thought Evaluation (2026-03-23)
arXiv:2603.20172v1 Announce Type: cross Abstract: Recent work on chain-of-thought (CoT) faithfulness reports single aggregate numbers (e.g., DeepSeek...
🇺🇸 Memory-Driven Role-Playing: Evaluation and Enhancement of Persona Knowledge Utilization in LLMs (2026-03-23)
arXiv:2603.19313v1 Announce Type: cross Abstract: A core challenge for faithful LLM role-playing is sustaining consistent characterization throughout...
🇺🇸 MemReward: Graph-Based Experience Memory for LLM Reward Prediction with Limited Labels (2026-03-23)
arXiv:2603.19310v1 Announce Type: cross Abstract: Training large language models (LLMs) for complex reasoning via reinforcement learning requires rew...
🇺🇸 Inducing Sustained Creativity and Diversity in Large Language Models (2026-03-23)
arXiv:2603.19519v1 Announce Type: cross Abstract: We address a not-widely-recognized subset of exploratory search, where a user sets out on a typical...
🇺🇸 LISAA: A Framework for Large Language Model Information Security Awareness Assessment (2026-03-23)
arXiv:2411.13207v3 Announce Type: replace-cross Abstract: The popularity of large language models (LLMs) continues to grow, and LLM-based assistants ...
🇺🇸 Can LLM generate interesting mathematical research problems? (2026-03-20)
arXiv:2603.18813v1 Announce Type: new Abstract: This paper is the second one in a series of work on the mathematical creativity of LLM. In the first ...
🇺🇸 Secure Linear Alignment of Large Language Models (2026-03-20)
arXiv:2603.18908v1 Announce Type: new Abstract: Language models increasingly appear to learn similar representations, despite differences in training...
🇺🇸 Do Large Language Models Possess a Theory of Mind? A Comparative Evaluation Using the Strange Stories Paradigm (2026-03-20)
arXiv:2603.18007v1 Announce Type: cross Abstract: The study explores whether current Large Language Models (LLMs) exhibit Theory of Mind (ToM) capabi...
🇺🇸 Can LLMs Reason Like Automated Theorem Provers for Rust Verification? VCoT-Bench: Evaluating via Verification Chain of Thought (2026-03-20)
arXiv:2603.18334v1 Announce Type: cross Abstract: As Large Language Models (LLMs) increasingly assist secure software development, their ability to m...
🇺🇸 PowerFlow: Unlocking the Dual Nature of LLMs via Principled Distribution Matching (2026-03-20)
arXiv:2603.18363v1 Announce Type: cross Abstract: Unsupervised Reinforcement Learning from Internal Feedback (RLIF) has emerged as a promising paradi...
🇺🇸 Measuring and Exploiting Confirmation Bias in LLM-Assisted Security Code Review (2026-03-20)
arXiv:2603.18740v1 Announce Type: cross Abstract: Security code reviews increasingly rely on systems integrating Large Language Models (LLMs), rangin...
🇺🇸 Are complicated loss functions necessary for teaching LLMs to reason? (2026-03-20)
arXiv:2603.18756v1 Announce Type: cross Abstract: Recent advances in large language models (LLMs) highlight the importance of post training technique...
🇺🇸 Lightweight Adaptation for LLM-based Technical Service Agent: Latent Logic Augmentation and Robust Noise Reduction (2026-03-20)
arXiv:2603.18074v1 Announce Type: cross Abstract: Adapting Large Language Models in complex technical service domains is constrained by the absence o...
🇺🇸 VC-Soup: Value-Consistency Guided Multi-Value Alignment for Large Language Models (2026-03-20)
arXiv:2603.18113v1 Announce Type: cross Abstract: As large language models (LLMs) increasingly shape content generation, interaction, and decision-ma...
🇺🇸 AutORAN: LLM-driven Natural Language Programming for Agile xApp Development (2026-03-20)
arXiv:2603.18604v1 Announce Type: cross Abstract: Traditional RAN systems are closed and monolithic, stifling innovation. The openness and programmab...
🇺🇸 Functional Subspace Watermarking for Large Language Models (2026-03-20)
arXiv:2603.18793v1 Announce Type: cross Abstract: Model watermarking utilizes internal representations to protect the ownership of large language mod...
🇺🇸 Evaluating LLM-Generated Lessons from the Language Learning Students' Perspective: A Short Case Study on Duolingo (2026-03-20)
arXiv:2603.18873v1 Announce Type: cross Abstract: Popular language learning applications such as Duolingo use large language models (LLMs) to generat...
🇺🇸 Security Assessment and Mitigation Strategies for Large Language Models: A Comprehensive Defensive Framework (2026-03-19)
arXiv:2603.17123v1 Announce Type: cross Abstract: Large Language Models increasingly power critical infrastructure from healthcare to finance, yet th...
🇺🇸 Detecting Data Poisoning in Code Generation LLMs via Black-Box, Vulnerability-Oriented Scanning (2026-03-19)
arXiv:2603.17174v1 Announce Type: cross Abstract: Code generation large language models (LLMs) are increasingly integrated into modern software devel...
🇺🇸 Can Blindfolded LLMs Still Trade? An Anonymization-First Framework for Portfolio Optimization (2026-03-19)
arXiv:2603.17692v1 Announce Type: cross Abstract: For LLM trading agents to be genuinely trustworthy, they must demonstrate understanding of market d...
🇺🇸 Facts as First Class Objects: Knowledge Objects for Persistent LLM Memory (2026-03-19)
arXiv:2603.17781v1 Announce Type: new Abstract: Large language models increasingly serve as persistent knowledge workers, with in-context memory - fa...
🇺🇸 Evaluating Ill-Defined Tasks in Large Language Models (2026-03-19)
arXiv:2603.17067v1 Announce Type: cross Abstract: Many evaluations of Large Language Models (LLMs) target tasks that are inherently ill-defined, with...
🇺🇸 Knowledge Localization in Mixture-of-Experts LLMs Using Cross-Lingual Inconsistency (2026-03-19)
arXiv:2603.17102v1 Announce Type: cross Abstract: Modern LLMs continue to exhibit significant variance in behavior across languages, such as being ab...
🇺🇸 Anonymous-by-Construction: An LLM-Driven Framework for Privacy-Preserving Text (2026-03-19)
arXiv:2603.17217v1 Announce Type: cross Abstract: Responsible use of AI demands that we protect sensitive information without undermining the usefuln...
🇺🇸 Deployment and Evaluation of an EHR-integrated, Large Language Model-Powered Tool to Triage Surgical Patients (2026-03-19)
arXiv:2603.17234v1 Announce Type: cross Abstract: Surgical co-management (SCM) is an evidence-based model in which hospitalists jointly manage medica...
🇺🇸 How do LLMs Compute Verbal Confidence (2026-03-19)
arXiv:2603.17839v1 Announce Type: cross Abstract: Verbal confidence -- prompting LLMs to state their confidence as a number or category -- is widely ...
🇺🇸 InPhyRe Discovers: Large Multimodal Models Struggle in Inductive Physical Reasoning (2026-03-19)
arXiv:2509.12263v2 Announce Type: replace Abstract: Large multimodal models (LMMs) encode physical laws observed during training, such as momentum co...
🇺🇸 Adaptive Theory of Mind for LLM-based Multi-Agent Coordination (2026-03-18)
arXiv:2603.16264v1 Announce Type: new Abstract: Theory of Mind (ToM) refers to the ability to reason about others' mental states, and higher-order To...
🇺🇸 BenchPreS: A Benchmark for Context-Aware Personalized Preference Selectivity of Persistent-Memory LLMs (2026-03-18)
arXiv:2603.16557v1 Announce Type: new Abstract: Large language models (LLMs) increasingly store user preferences in persistent memory to support pers...
🇺🇸 Designing for Disagreement: Front-End Guardrails for Assistance Allocation in LLM-Enabled Robots (2026-03-18)
arXiv:2603.16537v1 Announce Type: new Abstract: LLM-enabled robots prioritizing scarce assistance in social settings face pluralistic values and LLM ...
🇺🇸 This Is Taking Too Long -- Investigating Time as a Proxy for Energy Consumption of LLMs (2026-03-18)
arXiv:2603.15699v1 Announce Type: cross Abstract: The energy consumption of Large Language Models (LLMs) is raising growing concerns due to their adv...
🇺🇸 Data-Local Autonomous LLM-Guided Neural Architecture Search for Multiclass Multimodal Time-Series Classification (2026-03-18)
arXiv:2603.15939v1 Announce Type: cross Abstract: Applying machine learning to sensitive time-series data is often bottlenecked by the iteration loop...
🇺🇸 Understanding Moral Reasoning Trajectories in Large Language Models: Toward Probing-Based Explainability (2026-03-18)
arXiv:2603.16017v1 Announce Type: cross Abstract: Large language models (LLMs) increasingly participate in morally sensitive decision-making, yet how...
🇺🇸 A Human-Centred Architecture for Large Language Models-Cognitive Assistants in Manufacturing within Quality Management Systems (2026-03-18)
arXiv:2603.16325v1 Announce Type: cross Abstract: Large Language Models-Cognitive Assistants (LLM-CAs) can enhance Quality Management Systems (QMS) i...
🇺🇸 Detecting Sentiment Steering Attacks on RAG-enabled Large Language Models (2026-03-18)
arXiv:2603.16342v1 Announce Type: cross Abstract: The proliferation of large-scale IoT networks has been both a blessing and a curse. Not only has it...
🇺🇸 Is Conformal Factuality for RAG-based LLMs Robust? Novel Metrics and Systematic Insights (2026-03-18)
arXiv:2603.16817v1 Announce Type: new Abstract: Large language models (LLMs) frequently hallucinate, limiting their reliability in knowledge-intensiv...
🇺🇸 Prompt Programming for Cultural Bias and Alignment of Large Language Models (2026-03-18)
arXiv:2603.16827v1 Announce Type: new Abstract: Culture shapes reasoning, values, prioritization, and strategic decision-making, yet large language m...
🇺🇸 Dynamic Theory of Mind as a Temporal Memory Problem: Evidence from Large Language Models (2026-03-17)
arXiv:2603.14646v1 Announce Type: new Abstract: Theory of Mind (ToM) is central to social cognition and human-AI interaction, and Large Language Mode...
🇺🇸 Learning to Forget: Sleep-Inspired Memory Consolidation for Resolving Proactive Interference in Large Language Models (2026-03-17)
arXiv:2603.14517v1 Announce Type: new Abstract: Large language models (LLMs) suffer from proactive interference (PI): outdated information in the con...
🇺🇸 OpenHospital: A Thing-in-itself Arena for Evolving and Benchmarking LLM-based Collective Intelligence (2026-03-17)
arXiv:2603.14771v1 Announce Type: new Abstract: Large Language Model (LLM)-based Collective Intelligence (CI) presents a promising approach to overco...
🇺🇸 BrainBench: Exposing the Commonsense Reasoning Gap in Large Language Models (2026-03-17)
arXiv:2603.14761v1 Announce Type: new Abstract: Large language models (LLMs) achieve impressive scores on standard benchmarks yet routinely fail ques...
🇺🇸 GameUIAgent: An LLM-Powered Framework for Automated Game UI Design with Structured Intermediate Representation (2026-03-17)
arXiv:2603.14724v1 Announce Type: new Abstract: Game UI design requires consistent visual assets across rarity tiers yet remains a predominantly manu...
🇺🇸 Why the Valuable Capabilities of LLMs Are Precisely the Unexplainable Ones (2026-03-17)
arXiv:2603.15238v1 Announce Type: new Abstract: This paper proposes and argues for a counterintuitive thesis: the truly valuable capabilities of larg...
🇺🇸 GPrune-LLM: Generalization-Aware Structured Pruning for Large Language Models (2026-03-17)
arXiv:2603.13418v1 Announce Type: cross Abstract: Structured pruning is widely used to compress large language models (LLMs), yet its effectiveness d...
🇺🇸 LLM-MINE: Large Language Model based Alzheimer's Disease and Related Dementias Phenotypes Mining from Clinical Notes (2026-03-17)
arXiv:2603.13673v1 Announce Type: new Abstract: Accurate extraction of Alzheimer's Disease and Related Dementias (ADRD) phenotypes from electronic he...
🇺🇸 Supervised Fine-Tuning versus Reinforcement Learning: A Study of Post-Training Methods for Large Language Models (2026-03-17)
arXiv:2603.13985v1 Announce Type: new Abstract: Pre-trained Large Language Model (LLM) exhibits broad capabilities, yet, for specific tasks or domain...
🇺🇸 Why Do LLM-based Web Agents Fail? A Hierarchical Planning Perspective (2026-03-17)
arXiv:2603.14248v1 Announce Type: new Abstract: Large language model (LLM) web agents are increasingly used for web navigation but remain far from hu...
🇺🇸 Argumentation for Explainable and Globally Contestable Decision Support with LLMs (2026-03-17)
arXiv:2603.14643v1 Announce Type: new Abstract: Large language models (LLMs) exhibit strong general capabilities, but their deployment in high-stakes...
🇺🇸 Continual Learning in Large Language Models: Methods, Challenges, and Opportunities (2026-03-16)
arXiv:2603.12658v1 Announce Type: cross Abstract: Continual learning (CL) has emerged as a pivotal paradigm to enable large language models (LLMs) to...
🇺🇸 OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always! (2026-03-16)
arXiv:2509.26495v3 Announce Type: replace Abstract: Large Language Model (LLM) safety is one of the most pressing challenges for enabling wide-scale ...
🇺🇸 Do LLMs Share Human-Like Biases? Causal Reasoning Under Prior Knowledge, Irrelevant Context, and Varying Compute Budgets (2026-03-16)
arXiv:2602.02983v2 Announce Type: replace Abstract: Large language models (LLMs) are increasingly used in domains where causal reasoning matters, yet...
🇺🇸 Do LLMs have a Gender (Entropy) Bias? (2026-03-16)
arXiv:2505.20343v2 Announce Type: replace-cross Abstract: We investigate the existence and persistence of a specific type of gender bias in some of t...
🇺🇸 Diagnosing Retrieval Bias Under Multiple In-Context Knowledge Updates in Large Language Models (2026-03-16)
arXiv:2603.12271v1 Announce Type: cross Abstract: LLMs are widely used in knowledge-intensive tasks where the same fact may be revised multiple times...
🇺🇸 TaoBench: Do Automated Theorem Prover LLMs Generalize Beyond MathLib? (2026-03-16)
arXiv:2603.12744v1 Announce Type: cross Abstract: Automated theorem proving (ATP) benchmarks largely consist of problems formalized in MathLib, so cu...
🇺🇸 Human-Centered Evaluation of an LLM-Based Process Modeling Copilot: A Mixed-Methods Study with Domain Experts (2026-03-16)
arXiv:2603.12895v1 Announce Type: cross Abstract: Integrating Large Language Models (LLMs) into business process management tools promises to democra...
🇺🇸 Aligning Large Language Model Agents with Rational and Moral Preferences: A Supervised Fine-Tuning Approach (2026-03-16)
arXiv:2507.20796v2 Announce Type: replace-cross Abstract: As large language models (LLMs) increasingly act as autonomous agents in markets and organi...
🇺🇸 LLM-Augmented Digital Twin for Policy Evaluation in Short-Video Platforms (2026-03-13)
arXiv:2603.11333v1 Announce Type: new Abstract: Short-video platforms are closed-loop, human-in-the-loop ecosystems where platform policy, creator in...
🇺🇸 Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training (2026-03-13)
arXiv:2603.12246v1 Announce Type: new Abstract: Reasoning LLMs-as-Judges, which can benefit from inference-time scaling, provide a promising path for...
🇺🇸 From Entity-Centric to Goal-Oriented Graphs: Enhancing LLM Knowledge Retrieval in Minecraft (2026-03-13)
arXiv:2505.18607v2 Announce Type: replace Abstract: Large Language Models (LLMs) demonstrate impressive general capabilities but often struggle with ...
🇺🇸 Exploring Collatz Dynamics with Human-LLM Collaboration (2026-03-13)
arXiv:2603.11066v1 Announce Type: cross Abstract: We investigate structural properties of the Collatz iteration through two phenomena observed in lar...
🇺🇸 TopoBench: Benchmarking LLMs on Hard Topological Reasoning (2026-03-13)
arXiv:2603.12133v1 Announce Type: new Abstract: Solving topological grid puzzles requires reasoning over global spatial invariants such as connectivi...
🇺🇸 OpenSanctions Pairs: Large-Scale Entity Matching with LLMs (2026-03-13)
arXiv:2603.11051v1 Announce Type: cross Abstract: We release OpenSanctions Pairs, a large-scale entity matching benchmark derived from real-world int...
🇺🇸 AI Psychometrics: Evaluating the Psychological Reasoning of Large Language Models with Psychometric Validities (2026-03-13)
arXiv:2603.11279v1 Announce Type: new Abstract: The immense number of parameters and deep neural networks make large language models (LLMs) rival the...
🇺🇸 WebWeaver: Breaking Topology Confidentiality in LLM Multi-Agent Systems with Stealthy Context-Based Inference (2026-03-13)
arXiv:2603.11132v1 Announce Type: cross Abstract: Communication topology is a critical factor in the utility and safety of LLM-based multi-agent syst...
🇺🇸 Evaluation and LLM-Guided Learning of ICD Coding Rationales (2026-03-13)
arXiv:2508.16777v2 Announce Type: replace Abstract: ICD coding is the process of mapping unstructured text from Electronic Health Records (EHRs) to s...
🇺🇸 Summarize Before You Speak with ARACH: A Training-Free Inference-Time Plug-In for Enhancing LLMs via Global Attention Reallocation (2026-03-13)
arXiv:2603.11067v1 Announce Type: cross Abstract: Large language models (LLMs) achieve remarkable performance, yet further gains often require costly...
🇺🇸 RewardHackingAgents: Benchmarking Evaluation Integrity for LLM ML-Engineering Agents (2026-03-13)
arXiv:2603.11337v1 Announce Type: new Abstract: LLM agents increasingly perform end-to-end ML engineering tasks where success is judged by a single s...
🇺🇸 Improving LLM Performance Through Black-Box Online Tuning: A Case for Adding System Specs to Factsheets for Trusted AI (2026-03-13)
arXiv:2603.11340v1 Announce Type: new Abstract: In this paper, we present a novel black-box online controller that uses only end-to-end measurements ...
🇺🇸 AI Knows What's Wrong But Cannot Fix It: Helicoid Dynamics in Frontier LLMs Under High-Stakes Decisions (2026-03-13)
arXiv:2603.11559v1 Announce Type: new Abstract: Large language models perform reliably when their outputs can be checked: solving equations, writing ...
🇺🇸 Markovian Generation Chains in Large Language Models (2026-03-13)
arXiv:2603.11228v1 Announce Type: cross Abstract: The widespread use of large language models (LLMs) raises an important question: how do texts evolv...
🇺🇸 Jailbreak Scaling Laws for Large Language Models: Polynomial-Exponential Crossover (2026-03-13)
arXiv:2603.11331v1 Announce Type: cross Abstract: Adversarial attacks can reliably steer safety-aligned large language models toward unsafe behavior....
🇺🇸 UtilityMax Prompting: A Formal Framework for Multi-Objective Large Language Model Optimization (2026-03-13)
arXiv:2603.11583v1 Announce Type: cross Abstract: The success of a Large Language Model (LLM) task depends heavily on its prompt. Most use-cases spec...
🇺🇸 MobileKernelBench: Can LLMs Write Efficient Kernels for Mobile Devices? (2026-03-13)
arXiv:2603.11935v1 Announce Type: cross Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in code generation, yet thei...
🇺🇸 BTZSC: A Benchmark for Zero-Shot Text Classification Across Cross-Encoders, Embedding Models, Rerankers and LLMs (2026-03-13)
arXiv:2603.11991v1 Announce Type: cross Abstract: Zero-shot text classification (ZSC) offers the promise of eliminating costly task-specific annotati...
🇺🇸 Resource-Efficient Iterative LLM-Based NAS with Feedback Memory (2026-03-13)
arXiv:2603.12091v1 Announce Type: cross Abstract: Neural Architecture Search (NAS) automates network design, but conventional methods demand substant...
🇺🇸 LLMs can construct powerful representations and streamline sample-efficient supervised learning (2026-03-13)
arXiv:2603.11679v1 Announce Type: new Abstract: As real-world datasets become increasingly complex and heterogeneous, supervised learning is often bo...
🇺🇸 IsoCompute Playbook: Optimally Scaling Sampling Compute for LLM RL (2026-03-13)
arXiv:2603.12151v1 Announce Type: cross Abstract: While scaling laws guide compute allocation for LLM pre-training, analogous prescriptions for reinf...
🇺🇸 DeliberationBench: A Normative Benchmark for the Influence of Large Language Models on Users' Views (2026-03-12)
arXiv:2603.10018v1 Announce Type: cross Abstract: As large language models (LLMs) become pervasive as assistants and thought partners, it is importan...
🇺🇸 Amnesia: Adversarial Semantic Layer Specific Activation Steering in Large Language Models (2026-03-12)
arXiv:2603.10080v1 Announce Type: cross Abstract: Warning: This article includes red-teaming experiments, which contain examples of compromised LLM r...
🇺🇸 Code-Space Response Oracles: Generating Interpretable Multi-Agent Policies with Large Language Models (2026-03-12)
arXiv:2603.10098v1 Announce Type: cross Abstract: Recent advances in multi-agent reinforcement learning, particularly Policy-Space Response Oracles (...
🇺🇸 Utility Function is All You Need: LLM-based Congestion Control (2026-03-12)
arXiv:2603.10357v1 Announce Type: cross Abstract: Congestion is a critical and challenging problem in communication networks. Congestion control prot...
🇺🇸 Risk-Adjusted Harm Scoring for Automated Red Teaming for LLMs in Financial Services (2026-03-12)
arXiv:2603.10807v1 Announce Type: cross Abstract: The rapid adoption of large language models (LLMs) in financial services introduces new operational...
🇺🇸 When Fine-Tuning Fails and when it Generalises: Role of Data Diversity and Mixed Training in LLM-based TTS (2026-03-12)
arXiv:2603.10904v1 Announce Type: cross Abstract: Large language models are increasingly adopted as semantic backbones for neural text-to-speech syst...
🇺🇸 BiasBusters: Uncovering and Mitigating Tool Selection Bias in Large Language Models (2026-03-12)
arXiv:2510.00307v2 Announce Type: replace Abstract: Agents backed by large language models (LLMs) increasingly rely on external tools drawn from mark...
🇺🇸 Adaptive Activation Cancellation for Hallucination Mitigation in Large Language Models (2026-03-12)
arXiv:2603.10195v1 Announce Type: cross Abstract: Large Language Models frequently generate fluent but factually incorrect text. We propose Adaptive ...
🇺🇸 DUCTILE: Agentic LLM Orchestration of Engineering Analysis in Product Development Practice (2026-03-12)
arXiv:2603.10249v1 Announce Type: cross Abstract: Engineering analysis automation in product development relies on rigid interfaces between tools, da...
🇺🇸 RADAR: Reasoning-Ability and Difficulty-Aware Routing for Reasoning LLMs (2026-03-12)
arXiv:2509.25426v3 Announce Type: replace Abstract: Reasoning language models have demonstrated remarkable performance on many challenging tasks in m...
🇺🇸 EsoLang-Bench: Evaluating Genuine Reasoning in Large Language Models via Esoteric Programming Languages (2026-03-11)
arXiv:2603.09678v1 Announce Type: new Abstract: Large language models achieve near-ceiling performance on code generation benchmarks, yet these resul...
🇺🇸 Influencing LLM Multi-Agent Dialogue via Policy-Parameterized Prompts (2026-03-11)
arXiv:2603.09890v1 Announce Type: new Abstract: Large Language Models (LLMs) have emerged as a new paradigm for multi-agent systems. However, existin...
🇺🇸 DuplexCascade: Full-Duplex Speech-to-Speech Dialogue with VAD-Free Cascaded ASR-LLM-TTS Pipeline and Micro-Turn Optimization (2026-03-11)
arXiv:2603.09180v1 Announce Type: cross Abstract: Spoken dialog systems with cascaded ASR-LLM-TTS modules retain strong LLM intelligence, but VAD seg...
🇺🇸 Beyond Scaling: Assessing Strategic Reasoning and Rapid Decision-Making Capability of LLMs in Zero-sum Environments (2026-03-11)
arXiv:2603.09337v1 Announce Type: cross Abstract: Large Language Models (LLMs) have achieved strong performance on static reasoning benchmarks, yet t...
🇺🇸 Emotion is Not Just a Label: Latent Emotional Factors in LLM Processing (2026-03-11)
arXiv:2603.09205v1 Announce Type: cross Abstract: Large language models are routinely deployed on text that varies widely in emotional tone, yet thei...
🇺🇸 Investigating Gender Stereotypes in Large Language Models via Social Determinants of Health (2026-03-11)
arXiv:2603.09416v1 Announce Type: cross Abstract: Large Language Models (LLMs) excel in Natural Language Processing (NLP) tasks, but they often propa...
🇺🇸 Alignment Is the Disease: Censorship Visibility and Alignment Constraint Complexity as Determinants of Collective Pathology in Multi-Agent LLM Systems (2026-03-11)
arXiv:2603.08723v1 Announce Type: cross Abstract: Alignment techniques in large language models (LLMs) are designed to constrain model outputs toward...
🇺🇸 Large Language Model-Assisted Superconducting Qubit Experiments (2026-03-11)
arXiv:2603.08801v1 Announce Type: cross Abstract: Superconducting circuits have demonstrated significant potential in quantum information processing ...
🇺🇸 Understanding the Use of a Large Language Model-Powered Guide to Make Virtual Reality Accessible for Blind and Low Vision People (2026-03-11)
arXiv:2603.09964v1 Announce Type: cross Abstract: As social virtual reality (VR) grows more popular, addressing accessibility for blind and low visio...
🇺🇸 Improving instruction hierarchy in frontier LLMs (2026-03-10)
IH-Challenge trains models to prioritize trusted instructions, improving instruction hierarchy, safety steerability, and resistance to prompt injectio...
🇺🇸 FinSheet-Bench: From Simple Lookups to Complex Reasoning, Where LLMs Break on Financial Spreadsheets (2026-03-10)
arXiv:2603.07316v1 Announce Type: new Abstract: While Large Language Models (LLMs) can accelerate text-heavy tasks in alternative investment due dili...
🇺🇸 In-Context Reinforcement Learning for Tool Use in Large Language Models (2026-03-10)
arXiv:2603.08068v1 Announce Type: new Abstract: While large language models (LLMs) exhibit strong reasoning abilities, their performance on complex t...
🇺🇸 Consensus is Not Verification: Why Crowd Wisdom Strategies Fail for LLM Truthfulness (2026-03-10)
arXiv:2603.06612v1 Announce Type: cross Abstract: Pass@k and other methods of scaling inference compute can improve language model performance in dom...
🇺🇸 RACER: Risk-Aware Calibrated Efficient Routing for Large Language Models (2026-03-10)
arXiv:2603.06616v1 Announce Type: cross Abstract: Efficiently routing queries to the optimal large language model (LLM) is crucial for optimizing the...
🇺🇸 A Novel Multi-Agent Architecture to Reduce Hallucinations of Large Language Models in Multi-Step Structural Modeling (2026-03-10)
arXiv:2603.07728v1 Announce Type: new Abstract: Large language models (LLMs) such as GPT and Gemini have demonstrated remarkable capabilities in cont...
🇺🇸 Rigidity in LLM Bandits with Implications for Human-AI Dyads (2026-03-10)
arXiv:2603.07717v1 Announce Type: new Abstract: We test whether LLMs show robust decision biases. Treating models as participants in two-arm bandits,...
🇺🇸 Grounding Machine Creativity in Game Design Knowledge Representations: Empirical Probing of LLM-Based Executable Synthesis of Goal Playable Patterns under Structural Constraints (2026-03-10)
arXiv:2603.07101v1 Announce Type: new Abstract: Creatively translating complex gameplay ideas into executable artifacts (e.g., games as Unity project...
🇺🇸 Evaluating Financial Intelligence in Large Language Models: Benchmarking SuperInvesting AI with LLM Engines (2026-03-10)
arXiv:2603.08704v1 Announce Type: new Abstract: Large language models are increasingly used for financial analysis and investment research, yet syste...
🇺🇸 SmartBench: Evaluating LLMs in Smart Homes with Anomalous Device States and Behavioral Contexts (2026-03-10)
arXiv:2603.06636v1 Announce Type: cross Abstract: Due to the strong context-awareness capabilities demonstrated by large language models (LLMs), rece...
🇺🇸 Enhancing Instruction Following of LLMs via Activation Steering with Dynamic Rejection (2026-03-10)
arXiv:2603.06745v1 Announce Type: cross Abstract: Large Language Models (LLMs), despite advances in instruction tuning, often fail to follow complex ...
🇺🇸 Not Too Short, Not Too Long: How LLM Response Length Shapes People's Critical Thinking in Error Detection (2026-03-10)
arXiv:2603.06878v1 Announce Type: new Abstract: Large language models (LLMs) have become common decision-support tools across educational and profess...
🇺🇸 Re²: Unlocking LLM Reasoning via Reinforcement Learning with Re-solving (2026-03-10)
arXiv:2603.07197v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) has shown promise in enhancing the reasoning pe...
🇺🇸 Large Language Model for Discrete Optimization Problems: Evaluation and Step-by-step Reasoning (2026-03-10)
arXiv:2603.07733v1 Announce Type: new Abstract: This work investigated the capabilities of different models, including the Llama-3 series of models a...
🇺🇸 Advancing Automated Algorithm Design via Evolutionary Stagewise Design with LLMs (2026-03-10)
arXiv:2603.07970v1 Announce Type: new Abstract: With the rapid advancement of human science and technology, problems in industrial scenarios are beco...
🇺🇸 HEARTS: Benchmarking LLM Reasoning on Health Time Series (2026-03-10)
arXiv:2603.06638v1 Announce Type: cross Abstract: The rise of large language models (LLMs) has shifted time series analysis from narrow analytics to ...
🇺🇸 Performance Comparison of IBN orchestration using LLM and SLMs (2026-03-10)
arXiv:2603.06647v1 Announce Type: cross Abstract: The evolution of both 5G and 6G networks is driving the advancement of fully autonomous network man...
🇺🇸 Supporting Artifact Evaluation with LLMs: A Study with Published Security Research Papers (2026-03-10)
arXiv:2603.06862v1 Announce Type: cross Abstract: Artifact Evaluation (AE) is essential for ensuring the transparency and reliability of research, cl...
🇺🇸 Knowing without Acting: The Disentangled Geometry of Safety Mechanisms in Large Language Models (2026-03-09)
arXiv:2603.05773v1 Announce Type: cross Abstract: Safety alignment is often conceptualized as a monolithic process wherein harmfulness detection auto...
🇺🇸 Ambiguity Collapse by LLMs: A Taxonomy of Epistemic Risks (2026-03-09)
arXiv:2603.05801v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly used to make sense of ambiguous, open-textured, value...
🇺🇸 Abductive Reasoning with Syllogistic Forms in Large Language Models (2026-03-09)
arXiv:2603.06428v1 Announce Type: cross Abstract: Research in AI using Large-Language Models (LLMs) is rapidly evolving, and the comparison of their ...
🇺🇸 Agentic LLM Planning via Step-Wise PDDL Simulation: An Empirical Characterisation (2026-03-09)
arXiv:2603.06064v1 Announce Type: new Abstract: Task planning, the problem of sequencing actions to reach a goal from an initial state, is a core cap...
🇺🇸 Who We Are, Where We Are: Mental Health at the Intersection of Person, Situation, and Large Language Models (2026-03-09)
arXiv:2603.05953v1 Announce Type: cross Abstract: Mental health is not a fixed trait but a dynamic process shaped by the interplay between individual...
🇺🇸 MoEless: Efficient MoE LLM Serving via Serverless Computing (2026-03-09)
arXiv:2603.06350v1 Announce Type: cross Abstract: Large Language Models (LLMs) have become a cornerstone of AI, driving progress across diverse domai...
🇺🇸 Structured Exploration vs. Generative Flexibility: A Field Study Comparing Bandit and LLM Architectures for Personalised Health Behaviour Interventions (2026-03-09)
arXiv:2603.06330v1 Announce Type: cross Abstract: Behaviour Change Techniques (BCTs) are central to digital health interventions, yet selecting and d...
🇺🇸 CRIMSON: A Clinically-Grounded LLM-Based Metric for Generative Radiology Report Evaluation (2026-03-09)
arXiv:2603.06183v1 Announce Type: cross Abstract: We introduce CRIMSON, a clinically grounded evaluation framework for chest X-ray report generation ...
🇺🇸 PVminerLLM: Structured Extraction of Patient Voice from Patient-Generated Text using Large Language Models (2026-03-09)
arXiv:2603.05776v1 Announce Type: cross Abstract: Motivation: Patient-generated text contains critical information about patients' lived experiences,...
🇺🇸 Can LLM Aid in Solving Constraints with Inductive Definitions? (2026-03-09)
arXiv:2603.03668v1 Announce Type: cross Abstract: Solving constraints involving inductive (aka recursive) definitions is challenging. State-of-the-ar...
🇺🇸 Lost in Stories: Consistency Bugs in Long Story Generation by LLMs (2026-03-09)
arXiv:2603.05890v1 Announce Type: cross Abstract: What happens when a storyteller forgets its own story? Large Language Models (LLMs) can now generat...
🇺🇸 LUMINA: LLM-Guided GPU Architecture Exploration via Bottleneck Analysis (2026-03-09)
arXiv:2603.05904v1 Announce Type: cross Abstract: GPU design space exploration (DSE) for modern AI workloads, such as Large-Language Model (LLM) infe...
🇺🇸 MASFactory: A Graph-centric Framework for Orchestrating LLM-Based Multi-Agent Systems with Vibe Graphing (2026-03-09)
arXiv:2603.06007v1 Announce Type: cross Abstract: Large language model-based (LLM-based) multi-agent systems (MAS) are increasingly used to extend ag...
🇺🇸 COLD-Steer: Steering Large Language Models via In-Context One-step Learning Dynamics (2026-03-09)
arXiv:2603.06495v1 Announce Type: cross Abstract: Activation steering methods enable inference-time control of large language model (LLM) behavior wi...
🇺🇸 VisioMath: Benchmarking Figure-based Mathematical Reasoning in LMMs (2026-03-09)
arXiv:2506.06727v4 Announce Type: replace Abstract: Large Multimodal Models have achieved remarkable progress in integrating vision and language, ena...
🇺🇸 Transforming Agency. On the mode of existence of Large Language Models (2026-03-09)
arXiv:2407.10735v3 Announce Type: replace Abstract: This paper investigates the ontological characterization of Large Language Models (LLMs) like Cha...
🇺🇸 Localizing and Correcting Errors for LLM-based Planners (2026-03-09)
arXiv:2602.00276v2 Announce Type: replace Abstract: Large language models (LLMs) have demonstrated strong reasoning capabilities on math and coding, ...
🇺🇸 Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation (2026-03-09)
arXiv:2502.05151v3 Announce Type: replace-cross Abstract: With the advent of large multimodal language models, science is now at a threshold of an AI...
🇺🇸 Evaluating Austrian A-Level German Essays with Large Language Models for Automated Essay Scoring (2026-03-09)
arXiv:2603.06066v1 Announce Type: cross Abstract: Automated Essay Scoring (AES) has been explored for decades with the goal to support teachers by re...
🇺🇸 Experiences Build Characters: The Linguistic Origins and Functional Impact of LLM Personality (2026-03-09)
arXiv:2603.06088v1 Announce Type: cross Abstract: Human problem-solving is enriched by a diversity of styles and personality traits, yet the developm...
🇺🇸 Algorithmic Collusion by Large Language Models (2026-03-09)
arXiv:2404.00806v5 Announce Type: replace-cross Abstract: We conduct experiments with algorithmic pricing agents based on Large Language Models (LLMs...
🇺🇸 From Tokenizer Bias to Backbone Capability: A Controlled Study of LLMs for Time Series Forecasting (2026-03-09)
arXiv:2504.08818v2 Announce Type: replace-cross Abstract: Using pre-trained large language models (LLMs) as a backbone for time series prediction has...
🇺🇸 When Agents Persuade: Propaganda Generation and Mitigation in LLMs (2026-03-06)
arXiv:2603.04636v1 Announce Type: new Abstract: Despite their wide-ranging benefits, LLM-based agents deployed in open environments can be exploited ...
🇺🇸 Design Behaviour Codes (DBCs): A Taxonomy-Driven Layered Governance Benchmark for Large Language Models (2026-03-06)
arXiv:2603.04837v1 Announce Type: new Abstract: We introduce the Dynamic Behavioral Constraint (DBC) benchmark, the first empirical framework for eva...
🇺🇸 LLM-Grounded Explainability for Port Congestion Prediction via Temporal Graph Attention Networks (2026-03-06)
arXiv:2603.04818v1 Announce Type: new Abstract: Port congestion at major maritime hubs disrupts global supply chains, yet existing prediction systems...
🇺🇸 Alignment Backfire: Language-Dependent Reversal of Safety Interventions Across 16 Languages in LLM Multi-Agent Systems (2026-03-06)
arXiv:2603.04904v1 Announce Type: new Abstract: In perpetrator treatment, a recurring observation is the dissociation between insight and action: off...
🇺🇸 Survive at All Costs: Exploring LLM's Risky Behaviors under Survival Pressure (2026-03-06)
arXiv:2603.05028v1 Announce Type: new Abstract: As Large Language Models (LLMs) evolve from chatbots to agentic assistants, they are increasingly obs...
🇺🇸 X-RAY: Mapping LLM Reasoning Capability via Formalized and Calibrated Probes (2026-03-06)
arXiv:2603.05290v1 Announce Type: new Abstract: Large language models (LLMs) achieve promising performance, yet their ability to reason remains poorl...
🇺🇸 Legal interpretation and AI: from expert systems to argumentation and LLMs (2026-03-06)
arXiv:2603.05392v1 Announce Type: new Abstract: AI and Law research has encountered legal interpretation in different ways, in the context of its evo...
🇺🇸 Detection of Illicit Content on Online Marketplaces using Large Language Models (2026-03-06)
arXiv:2603.04707v1 Announce Type: cross Abstract: Online marketplaces, while revolutionizing global commerce, have inadvertently facilitated the prol...
🇺🇸 BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning (2026-03-06)
arXiv:2603.04918v1 Announce Type: cross Abstract: Proximal constraints are fundamental to the stability of the Large Language Model reinforcement lea...
🇺🇸 What Is Missing: Interpretable Ratings for Large Language Model Outputs (2026-03-06)
arXiv:2603.04429v1 Announce Type: cross Abstract: Current Large Language Model (LLM) preference learning methods such as Proximal Policy Optimization...
🇺🇸 A unified foundational framework for knowledge injection and evaluation of Large Language Models in Combustion Science (2026-03-06)
arXiv:2603.04452v1 Announce Type: cross Abstract: To advance foundation Large Language Models (LLMs) for combustion science, this study presents the ...
🇺🇸 Large Language Models as Bidding Agents in Repeated HetNet Auction (2026-03-06)
arXiv:2603.04455v1 Announce Type: cross Abstract: This paper investigates the integration of large language models (LLMs) as reasoning agents in repe...
🇺🇸 From Spark to Fire: Modeling and Mitigating Error Cascades in LLM-Based Multi-Agent Collaboration (2026-03-06)
arXiv:2603.04474v1 Announce Type: cross Abstract: Large Language Model-based Multi-Agent Systems (LLM-MAS) are increasingly applied to complex collab...
🇺🇸 Hate Speech Detection using Large Language Models with Data Augmentation and Feature Enhancement (2026-03-06)
arXiv:2603.04698v1 Announce Type: cross Abstract: This paper evaluates data augmentation and feature enhancement techniques for hate speech detection...
🇺🇸 When Weak LLMs Speak with Confidence, Preference Alignment Gets Stronger (2026-03-06)
arXiv:2603.04968v1 Announce Type: cross Abstract: Preference alignment is an essential step in adapting large language models (LLMs) to human values,...
🇺🇸 Yann LeCun’s new venture is a contrarian bet against large language models (2026-01-22)
Yann LeCun is a Turing Award recipient and a top AI researcher, but he has long been a contrarian figure in the tech world. He believes that the indus...
🇺🇸 “Dr. Google” had its issues. Can ChatGPT Health do better? (2026-01-22)
For the past two decades, there’s been a clear first step for anyone who starts experiencing new medical symptoms: Look them up online. The practice w...
🇺🇸 Multi-Sourced, Multi-Agent Evidence Retrieval for Fact-Checking (2026-03-03)
arXiv:2603.00267v1 Announce Type: new Abstract: Misinformation spreading over the Internet poses a significant threat to both societies and individua...
🇺🇸 An Agentic LLM Framework for Adverse Media Screening in AML Compliance (2026-03-02)
arXiv:2602.23373v1 Announce Type: new Abstract: Adverse media screening is a critical component of anti-money laundering (AML) and know-your-customer...
🇺🇸 Transformers converge to invariant algorithmic cores (2026-02-27)
arXiv:2602.22600v1 Announce Type: cross Abstract: Large language models exhibit sophisticated capabilities, yet understanding how they work internall...
🇺🇸 Ruyi2 Technical Report (2026-02-27)
arXiv:2602.22543v1 Announce Type: cross Abstract: Large Language Models (LLMs) face significant challenges regarding deployment costs and latency, ne...
🇺🇸 Generative Agents Navigating Digital Libraries (2026-02-27)
arXiv:2602.22529v1 Announce Type: cross Abstract: In the rapidly evolving field of digital libraries, the development of large language models (LLMs)...
🇺🇸 Reinforcement-aware Knowledge Distillation for LLM Reasoning (2026-02-27)
arXiv:2602.22495v1 Announce Type: cross Abstract: Reinforcement learning (RL) post-training has recently driven major gains in long chain-of-thought ...
🇺🇸 Sydney Telling Fables on AI and Humans: A Corpus Tracing Memetic Transfer of Persona between LLMs (2026-02-27)
arXiv:2602.22481v1 Announce Type: cross Abstract: The way LLM-based entities conceive of the relationship between AI and humans is an important topic...
🇺🇸 Automating the Detection of Requirement Dependencies Using Large Language Models (2026-02-27)
arXiv:2602.22456v1 Announce Type: cross Abstract: Requirements are inherently interconnected through various types of dependencies. Identifying these...
🇺🇸 Contextual Memory Virtualisation: DAG-Based State Management and Structurally Lossless Trimming for LLM Agents (2026-02-27)
arXiv:2602.22402v1 Announce Type: cross Abstract: As large language models engage in extended reasoning tasks, they accumulate significant state -- a...
🇺🇸 Scaling In, Not Up? Testing Thick Citation Context Analysis with GPT-5 and Fragile Prompts (2026-02-27)
arXiv:2602.22359v1 Announce Type: cross Abstract: This paper tests whether large language models (LLMs) can support interpretative citation context a...
🇺🇸 Structure and Redundancy in Large Language Models: A Spectral Study via Random Matrix Theory (2026-02-27)
arXiv:2602.22345v1 Announce Type: cross Abstract: This thesis addresses two persistent and closely related challenges in modern deep learning, reliab...
🇺🇸 UpSkill: Mutual Information Skill Learning for Structured Response Diversity in LLMs (2026-02-27)
arXiv:2602.22296v1 Announce Type: cross Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has improved the reasoning abilities of large...
🇺🇸 Manifold of Failure: Behavioral Attraction Basins in Language Models (2026-02-27)
arXiv:2602.22291v1 Announce Type: cross Abstract: While prior work has focused on projecting adversarial examples back onto the manifold of natural d...
🇺🇸 Integrating Machine Learning Ensembles and Large Language Models for Heart Disease Prediction Using Voting Fusion (2026-02-27)
arXiv:2602.22280v1 Announce Type: cross Abstract: Cardiovascular disease is the primary cause of death globally, necessitating early identification, ...
🇺🇸 Analysis of LLMs Against Prompt Injection and Jailbreak Attacks (2026-02-27)
arXiv:2602.22242v1 Announce Type: cross Abstract: Large Language Models (LLMs) are widely deployed in real-world systems. Given their broader applica...
🇺🇸 From Prompts to Performance: Evaluating LLMs for Task-based Parallel Code Generation (2026-02-27)
arXiv:2602.22240v1 Announce Type: cross Abstract: Large Language Models (LLM) show strong abilities in code generation, but their skill in creating e...
🇺🇸 Misinformation Exposure in the Chinese Web: A Cross-System Evaluation of Search Engines, LLMs, and AI Overviews (2026-02-27)
arXiv:2602.22221v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly integrated into search services, providing direct ans...
🇺🇸 Comparative Analysis of Neural Retriever-Reranker Pipelines for Retrieval-Augmented Generation over Knowledge Graphs in E-commerce Applications (2026-02-27)
arXiv:2602.22219v1 Announce Type: cross Abstract: Recent advancements in Large Language Models (LLMs) have transformed Natural Language Processing (N...
🇺🇸 Enriching Taxonomies Using Large Language Models (2026-02-27)
arXiv:2602.22213v1 Announce Type: cross Abstract: Taxonomies play a vital role in structuring and categorizing information across domains. However, m...
🇺🇸 Duel-Evolve: Reward-Free Test-Time Scaling via LLM Self-Preferences (2026-02-27)
arXiv:2602.21585v1 Announce Type: cross Abstract: Many applications seek to optimize LLM outputs at test time by iteratively proposing, scoring, and ...
🇺🇸 Toward Expert Investment Teams: A Multi-Agent LLM System with Fine-Grained Trading Tasks (2026-02-27)
arXiv:2602.23330v1 Announce Type: new Abstract: The advancement of large language models (LLMs) has accelerated the development of autonomous financi...
🇺🇸 LLM Novice Uplift on Dual-Use, In Silico Biology Tasks (2026-02-27)
arXiv:2602.23329v1 Announce Type: new Abstract: Large language models (LLMs) perform increasingly well on biology benchmarks, but it remains unclear ...
🇺🇸 Mitigating Legibility Tax with Decoupled Prover-Verifier Games (2026-02-27)
arXiv:2602.23248v1 Announce Type: new Abstract: As large language models become increasingly capable, it is critical that their outputs can be easily...
🇺🇸 SC-Arena: A Natural Language Benchmark for Single-Cell Reasoning with Knowledge-Augmented Evaluation (2026-02-27)
arXiv:2602.23199v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly applied in scientific research, offering new capabiliti...
🇺🇸 A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring (2026-02-27)
arXiv:2602.23163v1 Announce Type: new Abstract: Large language models are beginning to show steganographic capabilities. Such capabilities could allo...
🇺🇸 Enhancing CVRP Solver through LLM-driven Automatic Heuristic Design (2026-02-27)
arXiv:2602.23092v1 Announce Type: new Abstract: The Capacitated Vehicle Routing Problem (CVRP), a fundamental combinatorial optimization challenge, f...
🇺🇸 Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search (2026-02-27)
arXiv:2602.22983v1 Announce Type: new Abstract: As Large Language Models (LLMs) are increasingly used, their security risks have drawn increasing att...
🇺🇸 SPM-Bench: Benchmarking Large Language Models for Scanning Probe Microscopy (2026-02-27)
arXiv:2602.22971v1 Announce Type: new Abstract: As LLMs achieved breakthroughs in general reasoning, their proficiency in specialized scientific doma...
🇺🇸 General Agent Evaluation (2026-02-27)
arXiv:2602.22953v1 Announce Type: new Abstract: The promise of general-purpose agents - systems that perform tasks in unfamiliar environments without...
🇺🇸 Towards LLM-Empowered Knowledge Tracing via LLM-Student Hierarchical Behavior Alignment in Hyperbolic Space (2026-02-27)
arXiv:2602.22879v1 Announce Type: new Abstract: Knowledge Tracing (KT) diagnoses students' concept mastery through continuous learning state monitori...
🇺🇸 MiroFlow: Towards High-Performance and Robust Open-Source Agent Framework for General Deep Research Tasks (2026-02-27)
arXiv:2602.22808v1 Announce Type: new Abstract: Despite the remarkable progress of large language models (LLMs), the capabilities of standalone LLMs ...
🇺🇸 ClinDet-Bench: Beyond Abstention, Evaluating Judgment Determinability of LLMs in Clinical Decision-Making (2026-02-27)
arXiv:2602.22771v1 Announce Type: new Abstract: Clinical decisions are often required under incomplete information. Clinical experts must identify wh...
🇺🇸 AMA-Bench: Evaluating Long-Horizon Memory for Agentic Applications (2026-02-27)
arXiv:2602.22769v1 Announce Type: new Abstract: Large Language Models (LLMs) are deployed as autonomous agents in increasingly complex applications, ...
🇺🇸 RLHFless: Serverless Computing for Efficient RLHF (2026-02-27)
arXiv:2602.22718v1 Announce Type: new Abstract: Reinforcement Learning from Human Feedback (RLHF) has been widely applied to Large Language Model (LL...
🇺🇸 MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios (2026-02-27)
arXiv:2602.22638v1 Announce Type: new Abstract: Route-planning agents powered by large language models (LLMs) have emerged as a promising paradigm fo...
🇺🇸 SideQuest: Model-Driven KV Cache Management for Long-Horizon Agentic Reasoning (2026-02-27)
arXiv:2602.22603v1 Announce Type: new Abstract: Long-running agentic tasks, such as deep research, require multi-hop reasoning over information distr...
🇺🇸 Strategy Executability in Mathematical Reasoning: Leveraging Human-Model Differences for Effective Guidance (2026-02-27)
arXiv:2602.22583v1 Announce Type: new Abstract: Example-based guidance is widely used to improve mathematical reasoning at inference time, yet its ef...
🇺🇸 CourtGuard: A Model-Agnostic Framework for Zero-Shot Policy Adaptation in LLM Safety (2026-02-27)
arXiv:2602.22557v1 Announce Type: new Abstract: Current safety mechanisms for Large Language Models (LLMs) rely heavily on static, fine-tuned classif...
🇺🇸 Agentic AI for Intent-driven Optimization in Cell-free O-RAN (2026-02-27)
arXiv:2602.22539v1 Announce Type: new Abstract: Agentic artificial intelligence (AI) is emerging as a key enabler for autonomous radio access network...
🇺🇸 Cognitive Models and AI Algorithms Provide Templates for Designing Language Agents (2026-02-27)
arXiv:2602.22523v1 Announce Type: new Abstract: While contemporary large language models (LLMs) are increasingly capable in isolation, there are stil...
🇺🇸 Mapping the Landscape of Artificial Intelligence in Life Cycle Assessment Using Large Language Models (2026-02-27)
arXiv:2602.22500v1 Announce Type: new Abstract: Integration of artificial intelligence (AI) into life cycle assessment (LCA) has accelerated in recen...
🇺🇸 ConstraintBench: Benchmarking LLM Constraint Reasoning on Direct Optimization (2026-02-27)
arXiv:2602.22465v1 Announce Type: new Abstract: Large language models are increasingly applied to operational decision-making where the underlying st...
🇺🇸 A Framework for Assessing AI Agent Decisions and Outcomes in AutoML Pipelines (2026-02-27)
arXiv:2602.22442v1 Announce Type: new Abstract: Agent-based AutoML systems rely on large language models to make complex, multi-stage decisions acros...
🇺🇸 FIRE: A Comprehensive Benchmark for Financial Intelligence and Reasoning Evaluation (2026-02-27)
arXiv:2602.22273v1 Announce Type: new Abstract: We introduce FIRE, a comprehensive benchmark designed to evaluate both the theoretical financial know...
🇺🇸 Graph Your Way to Inspiration: Integrating Co-Author Graphs with Retrieval-Augmented Generation for Large Language Model Based Scientific Idea Generation (2026-02-27)
arXiv:2602.22215v1 Announce Type: new Abstract: Large Language Models (LLMs) demonstrate potential in the field of scientific idea generation. Howeve...
🇺🇸 LLMs Process Lists With General Filter Heads (2026-02-25)
arXiv:2510.26784v2 Announce Type: replace Abstract: We investigate the mechanisms underlying a range of list-processing tasks in LLMs, and we find th...
🇺🇸 Wireless Federated Multi-Task LLM Fine-Tuning via Sparse-and-Orthogonal LoRA (2026-02-25)
arXiv:2602.20492v1 Announce Type: cross Abstract: Decentralized federated learning (DFL) based on low-rank adaptation (LoRA) enables mobile devices w...
🇺🇸 What Makes a Good Query? Measuring the Impact of Human-Confusing Linguistic Features on LLM Performance (2026-02-25)
arXiv:2602.20300v1 Announce Type: cross Abstract: Large Language Model (LLM) hallucinations are usually treated as defects of the model or its decodi...
🇺🇸 Evaluating the Reliability of Digital Forensic Evidence Discovered by Large Language Model: A Case Study (2026-02-25)
arXiv:2602.20202v1 Announce Type: cross Abstract: The growing reliance on AI-identified digital evidence raises significant concerns about its reliab...
🇺🇸 Closing the Expertise Gap in Residential Building Energy Retrofits: A Domain-Specific LLM for Informed Decision-Making (2026-02-25)
arXiv:2602.20181v1 Announce Type: cross Abstract: Residential energy retrofit decision-making is constrained by an expertise gap, as homeowners lack ...
🇺🇸 Tool Building as a Path to "Superintelligence" (2026-02-25)
arXiv:2602.21061v1 Announce Type: new Abstract: The Diligent Learner framework suggests LLMs can achieve superintelligence via test-time search, prov...
🇺🇸 Physics-based phenomenological characterization of cross-modal bias in multimodal models (2026-02-25)
arXiv:2602.20624v1 Announce Type: new Abstract: The term 'algorithmic fairness' is used to evaluate whether AI models operate fairly in both comparat...
🇺🇸 A Problem-Oriented Perspective and Anchor Verification for Code Optimization (2026-02-25)
arXiv:2406.11935v3 Announce Type: replace-cross Abstract: Large Language Models (LLMs) have shown remarkable capabilities in solving various programm...
🇺🇸 From Moderation to Mediation: Can LLMs Serve as Mediators in Online Flame Wars? (2026-02-25)
arXiv:2512.03005v4 Announce Type: replace Abstract: The rapid advancement of large language models (LLMs) has opened new possibilities for AI for goo...
🇺🇸 "Don't Do That!": Guiding Embodied Systems through Large Language Model-based Constraint Generation (2026-02-25)
arXiv:2506.04500v2 Announce Type: replace Abstract: Recent advancements in large language models (LLMs) have spurred interest in robotic navigation t...
🇺🇸 Why Pass@k Optimization Can Degrade Pass@1: Prompt Interference in LLM Post-training (2026-02-25)
arXiv:2602.21189v1 Announce Type: cross Abstract: Pass@k is a widely used performance metric for verifiable large language model tasks, including mat...
🇺🇸 CAMEL: Confidence-Gated Reflection for Reward Modeling (2026-02-25)
arXiv:2602.20670v1 Announce Type: cross Abstract: Reward models play a fundamental role in aligning large language models with human preferences. Exi...
🇺🇸 CodeHacker: Automated Test Case Generation for Detecting Vulnerabilities in Competitive Programming Solutions (2026-02-25)
arXiv:2602.20213v1 Announce Type: cross Abstract: The evaluation of Large Language Models (LLMs) for code generation relies heavily on the quality an...
🇺🇸 MoBiQuant: Mixture-of-Bits Quantization for Token-Adaptive Elastic LLMs (2026-02-25)
arXiv:2602.20191v1 Announce Type: cross Abstract: Changing runtime complexity on cloud and edge devices necessitates elastic large language model (LL...
🇺🇸 Talking to Yourself: Defying Forgetting in Large Language Models (2026-02-25)
arXiv:2602.20162v1 Announce Type: cross Abstract: Catastrophic forgetting remains a major challenge when fine-tuning large language models (LLMs) on ...
🇺🇸 Architecting AgentOS: From Token-Level Context to Emergent System-Level Intelligence (2026-02-25)
arXiv:2602.20934v1 Announce Type: new Abstract: The paradigm of Large Language Models is undergoing a fundamental transition from static inference en...
🇺🇸 Buffer Matters: Unleashing the Power of Off-Policy Reinforcement Learning in Large Language Model Reasoning (2026-02-25)
arXiv:2602.20722v1 Announce Type: new Abstract: Traditional on-policy Reinforcement Learning with Verifiable Rewards (RLVR) frameworks suffer from ex...
🇺🇸 PromptCD: Test-Time Behavior Enhancement via Polarity-Prompt Contrastive Decoding (2026-02-25)
arXiv:2602.20696v1 Announce Type: new Abstract: Reliable AI systems require large language models (LLMs) to exhibit behaviors aligned with human pref...
🇺🇸 From Logs to Language: Learning Optimal Verbalization for LLM-Based Recommendation in Production (2026-02-25)
arXiv:2602.20558v1 Announce Type: new Abstract: Large language models (LLMs) are promising backbones for generative recommender systems, yet a key ch...
🇺🇸 DMCD: Semantic-Statistical Framework for Causal Discovery (2026-02-25)
arXiv:2602.20333v1 Announce Type: new Abstract: We present DMCD (DataMap Causal Discovery), a two-phase causal discovery framework that integrates LL...
🇺🇸 Diffusion Generative Recommendation with Continuous Tokens (2026-02-25)
arXiv:2504.12007v5 Announce Type: replace-cross Abstract: Recent advances in generative artificial intelligence, particularly large language models (...
🇺🇸 DS-STAR: Data Science Agent for Solving Diverse Tasks across Heterogeneous Formats and Open-Ended Queries (2026-02-25)
arXiv:2509.21825v4 Announce Type: replace Abstract: While large language models (LLMs) have shown promise in automating data science, existing agents...
🇺🇸 Programming by Backprop: An Instruction is Worth 100 Examples When Finetuning LLMs (2026-02-25)
arXiv:2506.18777v2 Announce Type: replace Abstract: Large language models (LLMs) are typically trained to acquire behaviours from demonstrations or e...
🇺🇸 Sensory-Motor Control with Large Language Models via Iterative Policy Refinement (2026-02-25)
arXiv:2506.04867v4 Announce Type: replace Abstract: We propose a method that enables large language models (LLMs) to control embodied agents through ...
🇺🇸 A Survey on the Optimization of Large Language Model-based Agents (2026-02-25)
arXiv:2503.12434v2 Announce Type: replace Abstract: With the rapid development of Large Language Models (LLMs), LLM-based agents have been widely ado...
🇺🇸 The Art of Efficient Reasoning: Data, Reward, and Optimization (2026-02-25)
arXiv:2602.20945v1 Announce Type: cross Abstract: Large Language Models (LLMs) consistently benefit from scaled Chain-of-Thought (CoT) reasoning, but...
🇺🇸 Hybrid LLM-Embedded Dialogue Agents for Learner Reflection: Designing Responsive and Theory-Driven Interactions (2026-02-25)
arXiv:2602.20486v1 Announce Type: cross Abstract: Dialogue systems have long supported learner reflections, with theoretically grounded, rule-based d...
🇺🇸 No One Size Fits All: QueryBandits for Hallucination Mitigation (2026-02-25)
arXiv:2602.20332v1 Announce Type: cross Abstract: Advanced reasoning capabilities in Large Language Models (LLMs) have led to more frequent hallucina...
🇺🇸 InterviewSim: A Scalable Framework for Interview-Grounded Personality Simulation (2026-02-25)
arXiv:2602.20294v1 Announce Type: cross Abstract: Simulating real personalities with large language models requires grounding generation in authentic...
🇺🇸 Golden Layers and Where to Find Them: Improved Knowledge Editing for Large Language Models Via Layer Gradient Analysis (2026-02-25)
arXiv:2602.20207v1 Announce Type: cross Abstract: Knowledge editing in Large Language Models (LLMs) aims to update the model's prediction for a speci...
🇺🇸 A Benchmark for Deep Information Synthesis (2026-02-25)
arXiv:2602.21143v1 Announce Type: new Abstract: Large language model (LLM)-based agents are increasingly used to solve complex tasks involving tool u...
🇺🇸 LogicGraph: Benchmarking Multi-Path Logical Reasoning via Neuro-Symbolic Generation and Verification (2026-02-25)
arXiv:2602.21044v1 Announce Type: new Abstract: Evaluations of large language models (LLMs) primarily emphasize convergent logical reasoning, where s...
🇺🇸 HELP: HyperNode Expansion and Logical Path-Guided Evidence Localization for Accurate and Efficient GraphRAG (2026-02-25)
arXiv:2602.20926v1 Announce Type: new Abstract: Large Language Models (LLMs) often struggle with inherent knowledge boundaries and hallucinations, li...
🇺🇸 Qwen-BIM: developing large language model for BIM-based design with domain-specific benchmark and dataset (2026-02-25)
arXiv:2602.20812v1 Announce Type: new Abstract: As the construction industry advances toward digital transformation, BIM (Building Information Modeli...
🇺🇸 Counterfactual Simulation Training for Chain-of-Thought Faithfulness (2026-02-25)
arXiv:2602.20710v1 Announce Type: new Abstract: Inspecting Chain-of-Thought reasoning is among the most common means of understanding why an LLM prod...
🇺🇸 Grounding LLMs in Scientific Discovery via Embodied Actions (2026-02-25)
arXiv:2602.20639v1 Announce Type: new Abstract: Large Language Models (LLMs) have shown significant potential in scientific discovery but struggle to...
🇺🇸 An artificial intelligence framework for end-to-end rare disease phenotyping from clinical notes using large language models (2026-02-25)
arXiv:2602.20324v1 Announce Type: new Abstract: Phenotyping is fundamental to rare disease diagnosis, but manual curation of structured phenotypes fr...
🇺🇸 SparkMe: Adaptive Semi-Structured Interviewing for Qualitative Insight Discovery (2026-02-25)
arXiv:2602.21136v1 Announce Type: cross Abstract: Qualitative insights from user experiences are critical for informing product and policy decisions,...
🇺🇸 El Agente Gráfico: Structured Execution Graphs for Scientific Agents (2026-02-23)
arXiv:2602.17902v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used to automate scientific workflows, yet their integr...
🇺🇸 The Token Games: Evaluating Language Model Reasoning with Puzzle Duels (2026-02-23)
arXiv:2602.17831v1 Announce Type: new Abstract: Evaluating the reasoning capabilities of Large Language Models is increasingly challenging as models ...
🇺🇸 Retrieval Augmented (Knowledge Graph), and Large Language Model-Driven Design Structure Matrix (DSM) Generation of Cyber-Physical Systems (2026-02-20)
arXiv:2602.16715v1 Announce Type: new Abstract: We explore the potential of Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), and G...
🇺🇸 OpenAI’s latest product lets you vibe code science (2026-01-27)
OpenAI just revealed what its new in-house team, OpenAI for Science, has been up to. The firm has released a free LLM-powered tool for scientists call...
🇺🇸 Advancing science and math with GPT-5.2 (2025-12-11)
GPT-5.2 is OpenAI’s strongest model yet for math and science, setting new state-of-the-art results on benchmarks like GPQA Diamond and FrontierMath. T...
🇺🇸 GISA: A Benchmark for General Information-Seeking Assistant (2026-02-16)
arXiv:2602.08543v2 Announce Type: replace-cross Abstract: The advancement of large language models (LLMs) has significantly accelerated the developme...
🇺🇸 Provable Training Data Identification for Large Language Models (2026-02-16)
arXiv:2510.09717v2 Announce Type: replace-cross Abstract: Identifying training data of large-scale models is critical for copyright litigation, priva...
🇺🇸 Principled Synthetic Data Enables the First Scaling Laws for LLMs in Recommendation (2026-02-16)
arXiv:2602.07298v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) represent a promising frontier for recommender systems, yet th...
🇺🇸 Exploring AI-Augmented Sensemaking of Patient-Generated Health Data: A Mixed-Method Study with Healthcare Professionals in Cardiac Risk Reduction (2026-02-16)
arXiv:2602.05687v4 Announce Type: replace-cross Abstract: Individuals are increasingly generating substantial personal health and lifestyle data, e.g...
🇺🇸 Beyond Static Question Banks: Dynamic Knowledge Expansion via LLM-Automated Graph Construction and Adaptive Generation (2026-02-16)
arXiv:2602.00020v2 Announce Type: replace-cross Abstract: Personalized education systems increasingly rely on structured knowledge representations to...
🇺🇸 ATLAS: Adaptive Self-Evolutionary Research Agent with Task-Distributed Multi-LLM Supporters (2026-02-16)
arXiv:2602.02709v2 Announce Type: replace Abstract: Recent multi-LLM agent systems perform well in prompt optimization and automated problem-solving,...
🇺🇸 Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges (2026-02-16)
arXiv:2510.23883v2 Announce Type: replace Abstract: Agentic AI systems powered by large language models (LLMs) and endowed with planning, tool use, m...
🇺🇸 Know More, Know Clearer: A Meta-Cognitive Framework for Knowledge Augmentation in Large Language Models (2026-02-16)
arXiv:2602.12996v1 Announce Type: cross Abstract: Knowledge augmentation has significantly enhanced the performance of Large Language Models (LLMs) i...
🇺🇸 Knowledge-Based Design Requirements for Generative Social Robots in Higher Education (2026-02-16)
arXiv:2602.12873v1 Announce Type: cross Abstract: Generative social robots (GSRs) powered by large language models enable adaptive, conversational tu...
🇺🇸 CacheMind: From Miss Rates to Why -- Natural-Language, Trace-Grounded Reasoning for Cache Replacement (2026-02-16)
arXiv:2602.12422v1 Announce Type: cross Abstract: Cache replacement remains a challenging problem in CPU microarchitecture, often addressed using han...
🇺🇸 Reasoning about Intent for Ambiguous Requests (2026-02-16)
arXiv:2511.10453v2 Announce Type: replace-cross Abstract: Large language models often respond to ambiguous requests by implicitly committing to one i...
🇺🇸 Low-Resource Dialect Adaptation of Large Language Models: A French Dialect Case-Study (2026-02-16)
arXiv:2510.22747v2 Announce Type: replace-cross Abstract: Despite the widespread adoption of large language models (LLMs), their strongest capabiliti...
🇺🇸 Eliminating stability hallucinations in llm-based tts models via attention guidance (2026-02-16)
arXiv:2509.19852v2 Announce Type: replace-cross Abstract: This paper focuses on resolving stability hallucinations (e.g., repetitive or omitted speec...
🇺🇸 ToolACE-MT: Non-Autoregressive Generation for Agentic Multi-Turn Interaction (2026-02-16)
arXiv:2508.12685v3 Announce Type: replace-cross Abstract: Agentic task-solving with Large Language Models (LLMs) requires multi-turn, multi-step inte...
🇺🇸 R-Zero: Self-Evolving Reasoning LLM from Zero Data (2026-02-16)
arXiv:2508.05004v4 Announce Type: replace-cross Abstract: Self-evolving Large Language Models (LLMs) offer a scalable path toward super-intelligence ...
🇺🇸 PlanetServe: A Decentralized, Scalable, and Privacy-Preserving Overlay for Democratizing Large Language Model Serving (2026-02-16)
arXiv:2504.20101v5 Announce Type: replace-cross Abstract: While significant progress has been made in research and development on open-source and cos...
🇺🇸 LTSM-Bundle: A Toolbox and Benchmark on Large Language Models for Time Series Forecasting (2026-02-16)
arXiv:2406.14045v3 Announce Type: replace-cross Abstract: Time Series Forecasting (TSF) has long been a challenge in time series analysis. Inspired b...
🇺🇸 WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning (2026-02-16)
arXiv:2602.04634v2 Announce Type: replace Abstract: Recent advancements in Large Language Models (LLMs) have largely focused on depth scaling, where ...
🇺🇸 TRACE: Temporal Reasoning via Agentic Context Evolution for Streaming Electronic Health Records (EHRs) (2026-02-16)
arXiv:2602.12833v1 Announce Type: cross Abstract: Large Language Models (LLMs) encode extensive medical knowledge but struggle to apply it reliably t...
🇺🇸 Understanding Chain-of-Thought in Large Language Models via Topological Data Analysis (2026-02-16)
arXiv:2512.19135v2 Announce Type: replace Abstract: With the development of large language models (LLMs), particularly with the introduction of the l...
🇺🇸 RLIE: Rule Generation with Logistic Regression, Iterative Refinement, and Evaluation for Large Language Models (2026-02-16)
arXiv:2510.19698v2 Announce Type: replace Abstract: Large Language Models (LLMs) can propose rules in natural language, sidestepping the need for a p...
🇺🇸 Difficulty-Aware Agentic Orchestration for Query-Specific Multi-Agent Workflows (2026-02-16)
arXiv:2509.11079v5 Announce Type: replace Abstract: Large Language Model (LLM)-based agentic systems have shown strong capabilities across various ta...
🇺🇸 SCAN: Semantic Document Layout Analysis for Textual and Visual Retrieval-Augmented Generation (2026-02-16)
arXiv:2505.14381v3 Announce Type: replace Abstract: With the increasing adoption of Large Language Models (LLMs) and Vision-Language Models (VLMs), r...
🇺🇸 Asynchronous Verified Semantic Caching for Tiered LLM Architectures (2026-02-16)
arXiv:2602.13165v1 Announce Type: cross Abstract: Large language models (LLMs) now sit in the critical path of search, assistance, and agentic workfl...
🇺🇸 Buy versus Build an LLM: A Decision Framework for Governments (2026-02-16)
arXiv:2602.13033v1 Announce Type: cross Abstract: Large Language Models (LLMs) represent a new frontier of digital infrastructure that can support a ...
🇺🇸 Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward (2026-02-16)
arXiv:2602.12430v1 Announce Type: cross Abstract: The transition from monolithic language models to modular, skill-equipped agents marks a defining s...
🇺🇸 From Biased Chatbots to Biased Agents: Examining Role Assignment Effects on LLM Agent Robustness (2026-02-16)
arXiv:2602.12285v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly deployed as autonomous agents capable of actions with...
🇺🇸 TriGen: NPU Architecture for End-to-End Acceleration of Large Language Models based on SW-HW Co-Design (2026-02-16)
arXiv:2602.12962v1 Announce Type: cross Abstract: Recent studies have extensively explored NPU architectures for accelerating AI inference in on-devi...
🇺🇸 Left-right asymmetry in predicting brain activity from LLMs' representations emerges with their formal linguistic competence (2026-02-16)
arXiv:2602.12811v1 Announce Type: cross Abstract: When humans and large language models (LLMs) process the same text, activations in the LLMs correla...
🇺🇸 "Not Human, Funnier": How Machine Identity Shapes Humor Perception in Online AI Stand-up Comedy (2026-02-16)
arXiv:2602.12763v1 Announce Type: cross Abstract: Chatbots are increasingly applied to domains previously reserved for human actors. One such domain ...
🇺🇸 VI-CuRL: Stabilizing Verifier-Independent RL Reasoning via Confidence-Guided Variance Reduction (2026-02-16)
arXiv:2602.12579v1 Announce Type: cross Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a dominant paradigm for enhanc...
🇺🇸 SD-MoE: Spectral Decomposition for Effective Expert Specialization (2026-02-16)
arXiv:2602.12556v1 Announce Type: cross Abstract: Mixture-of-Experts (MoE) architectures scale Large Language Models via expert specialization induce...
🇺🇸 RankLLM: Weighted Ranking of LLMs by Quantifying Question Difficulty (2026-02-16)
arXiv:2602.12424v1 Announce Type: cross Abstract: Benchmarks establish a standardized evaluation framework to systematically assess the performance o...
🇺🇸 Soft Contamination Means Benchmarks Test Shallow Generalization (2026-02-16)
arXiv:2602.12413v1 Announce Type: cross Abstract: If LLM training data is polluted with benchmark test data, then benchmark performance gives biased ...
🇺🇸 OptiML: An End-to-End Framework for Program Synthesis and CUDA Kernel Optimization (2026-02-16)
arXiv:2602.12305v1 Announce Type: cross Abstract: Generating high-performance CUDA kernels remains challenging due to the need to navigate a combinat...
🇺🇸 To Mix or To Merge: Toward Multi-Domain Reinforcement Learning for Large Language Models (2026-02-16)
arXiv:2602.12566v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) plays a key role in stimulating the explicit re...
🇺🇸 Intent-Driven Smart Manufacturing Integrating Knowledge Graphs and Large Language Models (2026-02-16)
arXiv:2602.12419v1 Announce Type: new Abstract: The increasing complexity of smart manufacturing environments demands interfaces that can translate h...

🔗 Entity Intersection Graph

People and organizations frequently mentioned alongside Large language model:

Artificial intelligence · 20 shared articles
Reinforcement learning · 11 shared articles
Ethics of artificial intelligence · 11 shared articles
AI safety · 11 shared articles
AI alignment · 7 shared articles
Machine learning · 6 shared articles
AI agent · 5 shared articles
Educational technology · 4 shared articles
Theory of mind · 4 shared articles
Benchmark · 3 shared articles