#Large Language Models
Latest news articles tagged with "Large Language Models". Follow the timeline of events, related topics, and entities.
Articles (30)
-
🇺🇸 Duel-Evolve: Reward-Free Test-Time Scaling via LLM Self-Preferences
[USA]
arXiv:2602.21585v1 Announce Type: cross Abstract: Many applications seek to optimize LLM outputs at test time by iteratively proposing, scoring, and refining candidates over a discrete output space. ...
Related: #Machine Learning Optimization, #Reward-Free AI Systems -
🇺🇸 ICON: Indirect Prompt Injection Defense for Agents based on Inference-Time Correction
[USA]
arXiv:2602.20708v1 Announce Type: new Abstract: Large Language Model (LLM) agents are susceptible to Indirect Prompt Injection (IPI) attacks, where malicious instructions in retrieved content hijack ...
Related: #AI Security, #Prompt Injection Defense, #Cybersecurity Research -
🇺🇸 A Problem-Oriented Perspective and Anchor Verification for Code Optimization
[USA]
arXiv:2406.11935v3 Announce Type: replace-cross Abstract: Large Language Models (LLMs) have shown remarkable capabilities in solving various programming tasks, such as code generation. However, their...
Related: #Code Optimization, #Software Engineering -
🇺🇸 CHESS: Context-aware Hierarchical Efficient Semantic Selection for Long-Context LLM Inference
[USA]
arXiv:2602.20732v1 Announce Type: new Abstract: Long-context LLMs demand accurate inference at low latency, yet decoding becomes primarily constrained by KV cache as context grows. Prior pruning meth...
Related: #AI Optimization, #Computational Efficiency -
🇺🇸 Golden Layers and Where to Find Them: Improved Knowledge Editing for Large Language Models Via Layer Gradient Analysis
[USA]
arXiv:2602.20207v1 Announce Type: cross Abstract: Knowledge editing in Large Language Models (LLMs) aims to update the model's prediction for a specific query to a desired target while preserving its...
Related: #Knowledge Editing, #Machine Learning Research -
🇺🇸 Mobility-Aware Cache Framework for Scalable LLM-Based Human Mobility Simulation
[USA]
arXiv:2602.16727v1 Announce Type: new Abstract: Large-scale human mobility simulation is critical for applications such as urban planning, epidemiology, and transportation analysis. Recent works trea...
Related: #Artificial Intelligence, #Machine Learning, #Human Mobility Simulation, #Scalable Computing -
🇺🇸 KLong: Training LLM Agent for Extremely Long-horizon Tasks
[USA]
arXiv:2602.17547v1 Announce Type: new Abstract: This paper introduces KLong, an open-source LLM agent trained to solve extremely long-horizon tasks. The principle is to first cold-start the model via...
Related: #Artificial Intelligence, #Self‑supervised Learning, #Reinforcement Learning, #Long‑Horizon Task Planning -
🇺🇸 Wink: Recovering from Misbehaviors in Coding Agents
[USA]
arXiv:2602.17037v1 Announce Type: cross Abstract: Autonomous coding agents, powered by large language models (LLMs), are increasingly being adopted in the software industry to automate complex engine...
Related: #Artificial Intelligence, #Software Engineering, #Human‑Computer Interaction, #Autonomous Coding Agents -
🇺🇸 MALLVI: a multi agent framework for integrated generalized robotics manipulation
[USA]
arXiv:2602.16898v1 Announce Type: cross Abstract: Task planning for robotic manipulation with large language models (LLMs) is an emerging area. Prior approaches rely on specialized models, fine tunin...
Related: #Robotics Manipulation, #Multi‑Agent Systems, #Closed‑Loop Control, #Perception and Vision‑Language Integration -
🇺🇸 The Cascade Equivalence Hypothesis: When Do Speech LLMs Behave Like ASR→LLM Pipelines?
[USA]
arXiv:2602.17598v1 Announce Type: cross Abstract: Current speech LLMs largely perform implicit ASR: on tasks solvable from a transcript, they are behaviorally and mechanistically equivalent to simple...
Related: #Speech Recognition, #Model Architecture Comparison, #Audio Processing, #Efficiency and Cost Analysis -
🇺🇸 FAMOSE: A ReAct Approach to Automated Feature Discovery
[USA]
arXiv:2602.17641v1 Announce Type: cross Abstract: Feature engineering remains a critical yet challenging bottleneck in machine learning, particularly for tabular data, as identifying optimal features...
Related: #Machine Learning, #Feature Engineering, #AI Agents, #ReAct Paradigm -
🇺🇸 Goal Inference from Open-Ended Dialog
[USA]
arXiv:2410.13957v2 Announce Type: replace Abstract: Embodied AI Agents are quickly becoming important and common tools in society. These embodied agents should be able to learn about and accomplish a...
Related: #Embodied AI, #Goal inference, #Bayesian inference, #Online learning -
🇺🇸 Capturing Individual Human Preferences with Reward Features
[USA]
arXiv:2503.17338v2 Announce Type: replace Abstract: Reinforcement learning from human feedback usually models preferences using a reward function that does not distinguish between people. We argue th...
Related: #Artificial Intelligence, #Reinforcement Learning from Human Feedback, #Personalization, #Reward Modeling -
🇺🇸 A Scalable Framework for Evaluating Health Language Models
[USA]
arXiv:2503.23339v3 Announce Type: replace Abstract: Large language models (LLMs) have emerged as powerful tools for analyzing complex datasets. Recent studies demonstrate their potential to generate ...
Related: #Health Informatics, #Evaluation Methodology, #Human‑Computer Interaction, #Scalability in AI -
🇺🇸 Autonomous Business System via Neuro-symbolic AI
[USA]
arXiv:2601.15599v2 Announce Type: replace Abstract: Current business environments demand continuous reconfiguration of cross-functional processes, yet enterprise systems remain organized around siloe...
Related: #Neuro‑symbolic AI, #Business Process Automation, #Enterprise Knowledge Graphs, #Predicate Logic Programming -
🇺🇸 OpenSage: Self-programming Agent Generation Engine
[USA]
arXiv:2602.16891v1 Announce Type: new Abstract: Agent development kits (ADKs) provide effective platforms and tooling for constructing agents, and their designs are critical to the constructed agents...
Related: #Artificial Intelligence, #Agent Development, #Automated Tool Generation, #Hierarchical Memory Systems -
🇺🇸 Conv-FinRe: A Conversational and Longitudinal Benchmark for Utility-Grounded Financial Recommendation
[USA]
arXiv:2602.16990v1 Announce Type: new Abstract: Most recommendation benchmarks evaluate how well a model imitates user behavior. In financial advisory, however, observed actions can be noisy or short...
Related: #Artificial Intelligence, #Financial Recommendation, #Conversational AI, #Longitudinal Benchmarking -
🇺🇸 Agentic Wireless Communication for 6G: Intent-Aware and Continuously Evolving Physical-Layer Intelligence
[USA]
arXiv:2602.17096v1 Announce Type: new Abstract: As 6G wireless systems evolve, growing functional complexity and diverse service demands are driving a shift from rule-based control to intent-driven a...
Related: #Artificial Intelligence, #6G Wireless Communications, #Intent‑Aware Networking, #Autonomous Decision Making -
🇺🇸 Decoding the Human Factor: High Fidelity Behavioral Prediction for Strategic Foresight
[USA]
arXiv:2602.17222v1 Announce Type: new Abstract: Predicting human decision-making in high-stakes environments remains a central challenge for artificial intelligence. While large language models (LLMs...
Related: #Artificial Intelligence, #Behavioral Modeling, #Predictive Analytics, #Psychometrics -
🇺🇸 Mechanistic Interpretability of Cognitive Complexity in LLMs via Linear Probing using Bloom's Taxonomy
[USA]
arXiv:2602.17229v1 Announce Type: new Abstract: The black-box nature of Large Language Models necessitates novel evaluation frameworks that transcend surface-level performance metrics. This study inv...
Related: #Mechanistic interpretability, #Bloom’s Taxonomy, #Linear probing, #Cognitive complexity -
🇺🇸 Retrieval Collapses When AI Pollutes the Web
[USA]
arXiv:2602.16136v1 Announce Type: cross Abstract: The rapid proliferation of AI-generated content on the Web presents a structural risk to information retrieval, as search engines and Retrieval-Augme...
Related: #Information Retrieval, #AI‑Generated Content, #Search Engine Reliability, #Ecosystem‑Level Failure Modes -
🇺🇸 HiPER: Hierarchical Reinforcement Learning with Explicit Credit Assignment for Large Language Model Agents
[USA]
arXiv:2602.16165v1 Announce Type: cross Abstract: Training LLMs as interactive agents for multi-turn decision-making remains challenging, particularly in long-horizon tasks with sparse and delayed re...
Related: #Reinforcement Learning, #Hierarchical Control, #Credit Assignment, #Long‑Horizon Decision Making -
🇺🇸 Are LLMs Ready to Replace Bangla Annotators?
[USA]
arXiv:2602.16241v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly used as automated annotators to scale dataset creation, yet their reliability as unbiased annotators--e...
Related: #Automated Annotation, #Bias and Fairness, #Low‑Resource Languages, #Hate‑Speech Detection -
🇺🇸 TimeOmni-1: Incentivizing Complex Reasoning with Time Series in Large Language Models
[USA]
arXiv:2509.24803v2 Announce Type: replace Abstract: Recent advances in multimodal time series learning underscore a paradigm shift from analytics centered on basic patterns toward advanced time serie...
Related: #Multimodal Time‑Series Learning, #Advanced Temporal Reasoning, #Dataset Design for AI, #Analytics Paradigm Shift -
🇺🇸 CAST: Achieving Stable LLM-based Text Analysis for Data Analytics
[USA]
arXiv:2602.15861v1 Announce Type: cross Abstract: Text analysis of tabular data relies on two core operations: summarization for corpus-level theme extraction and tagging for row-level ...
Related: #Data Analytics, #Output Stability, #Algorithmic Prompting, #Tabular Data Analysis -
🇺🇸 Playing With AI: How Do State-Of-The-Art Large Language Models Perform in the 1977 Text-Based Adventure Game Zork?
[USA]
arXiv:2602.15867v1 Announce Type: cross Abstract: In this positioning paper, we evaluate the problem-solving and reasoning capabilities of contemporary Large Language Models (LLMs) through their perf...
Related: #Game‑Based Evaluation, #Natural Language Understanding, #Problem‑Solving & Reasoning -
🇺🇸 Can Generative Artificial Intelligence Survive Data Contamination? Theoretical Guarantees under Contaminated Recursive Training
[USA]
arXiv:2602.16065v1 Announce Type: cross Abstract: Generative Artificial Intelligence (AI), such as large language models (LLMs), has become a transformative force across science, industry, and societ...
Related: #Artificial Intelligence, #Data Quality & Contamination, #Theoretical Machine Learning, #Web Content Authenticity -
🇺🇸 SPARC: Scenario Planning and Reasoning for Automated C Unit Test Generation
[USA]
arXiv:2602.16671v1 Announce Type: cross Abstract: Automated unit test generation for C remains a formidable challenge due to the semantic gap between high-level program intent and the rigid syntactic...
Related: #Automated software testing, #Program synthesis, #C programming language, #Scenario planning -
🇺🇸 Targeting Alignment: Extracting Safety Classifiers of Aligned LLMs
[USA]
arXiv:2501.16534v5 Announce Type: replace-cross Abstract: Alignment in large language models (LLMs) is used to enforce guidelines such as safety. Yet, alignment fails in the face of jailbreak attacks...
Related: #Model Alignment, #Safety Classifiers, #Jailbreak Attacks, #LLM Security -
🇺🇸 m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning with Large Language Models
[USA]
arXiv:2504.00869v2 Announce Type: replace-cross Abstract: Test-time scaling has emerged as a powerful technique for enhancing the reasoning capabilities of large language models. However, its effecti...
Related: #Medical AI, #Test‑time Scaling, #Reasoning, #Clinical Decision Support