Conv-FinRe: A Conversational and Longitudinal Benchmark for Utility-Grounded Financial Recommendation
#Conv‑FinRe #Conversational Benchmark #Utility‑Grounded #Stock Recommendation #Large Language Models #Behavior Imitation #Longitudinal Evaluation #Investor Risk Preferences #Market Volatility #Hugging Face Dataset #GitHub Codebase
📌 Key Takeaways
- Conv‑FinRe is a new benchmark for financial recommendation that goes beyond behavior matching.
- It uses conversational and longitudinal data: onboarding interviews, market context updates, and advisory dialogues.
- Evaluation requires LLMs to rank stocks over a fixed investment horizon, distinguishing descriptive behavior from normative, risk‑based utility.
- Results show a tension between rational decision quality and behavioral alignment; models that rank well on utility often fail to match user choices.
- The dataset is built from real market data and human decision trajectories, and is publicly available on Hugging Face with code on GitHub.
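The idea of a utility-grounded ranking can be sketched with a standard mean-variance utility model, U = μ − (λ/2)σ², where λ encodes the investor's risk aversion. This is a minimal illustration, not the paper's exact scoring rule; the tickers and return series below are hypothetical.

```python
import statistics

def mean_variance_utility(returns, risk_aversion):
    """Mean-variance utility: U = mu - (lambda/2) * sigma^2."""
    mu = statistics.mean(returns)
    var = statistics.pvariance(returns)
    return mu - 0.5 * risk_aversion * var

def rank_by_utility(stock_returns, risk_aversion):
    """Rank tickers by utility score, best first."""
    scores = {t: mean_variance_utility(r, risk_aversion)
              for t, r in stock_returns.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical per-period returns over a fixed investment horizon
history = {
    "AAA": [0.01, 0.02, -0.01, 0.03],   # moderate return, low variance
    "BBB": [0.04, -0.05, 0.06, -0.02],  # similar mean, high variance
    "CCC": [0.005, 0.004, 0.006, 0.005],# low return, near-zero variance
}

print(rank_by_utility(history, risk_aversion=1.0))
print(rank_by_utility(history, risk_aversion=50.0))
```

The same return histories produce different normative rankings at different risk-aversion levels: a risk-tolerant profile keeps the volatile ticker near the top, while a risk-averse one pushes it below the low-variance option, which is why the benchmark conditions the reference on elicited investor risk preferences.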
🏷️ Themes
Artificial Intelligence, Financial Recommendation, Large Language Models, Conversational AI, Longitudinal Benchmarking, Behavioral Finance, Recommendation Systems, Utility‑Based Evaluation
Deep Analysis
Why It Matters
Conv-FinRe introduces a benchmark that evaluates financial recommendation models based on utility and risk preferences rather than just mimicking noisy user behavior, which is crucial for reliable investment advice in volatile markets.
Context & Background
- Existing benchmarks focus on behavior imitation, ignoring long-term financial goals
- Conv-FinRe incorporates real market data and human decision trajectories
- It provides multi-view references distinguishing descriptive behavior from normative utility
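The multi-view idea can be sketched by scoring one model ranking against two reference rankings, a normative utility-based one and a descriptive behavioral one, with a rank correlation such as Kendall's tau. The rankings below are invented for illustration and are not from the dataset.

```python
def kendall_tau(rank_a, rank_b):
    """Kendall rank correlation between two rankings of the same items."""
    pos_b = {item: i for i, item in enumerate(rank_b)}
    concordant = discordant = 0
    n = len(rank_a)
    for i in range(n):
        for j in range(i + 1, n):
            # The pair (rank_a[i], rank_a[j]) is ordered i before j in rank_a;
            # check whether rank_b agrees on that ordering.
            if pos_b[rank_a[i]] < pos_b[rank_a[j]]:
                concordant += 1
            else:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

utility_ref   = ["AAA", "CCC", "BBB", "DDD"]  # normative: risk-adjusted utility
behavior_ref  = ["BBB", "AAA", "DDD", "CCC"]  # descriptive: user's actual choices
model_ranking = ["AAA", "BBB", "CCC", "DDD"]  # hypothetical LLM output

print(kendall_tau(model_ranking, utility_ref))
print(kendall_tau(model_ranking, behavior_ref))
```

Reporting both correlations separately is what lets the benchmark diagnose the tension in the results: a model can agree strongly with the utility reference while agreeing only weakly with the behavioral one, or vice versa.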
What Happens Next
The dataset and code will be available on Hugging Face and GitHub, encouraging researchers to develop models that balance rational decision quality with behavioral alignment. Future work may extend the benchmark to other asset classes and integrate regulatory compliance checks.
Frequently Asked Questions
Q: How does Conv-FinRe differ from existing behavior-imitation benchmarks?
A: It evaluates models on utility-based rankings over a fixed horizon and separates descriptive behavior from normative utility, allowing diagnosis of rationality versus noise.
Q: Where are the dataset and code available?
A: The dataset is publicly released on Hugging Face and the codebase is on GitHub, both linked in the paper.
Q: Does the benchmark cover asset classes beyond stocks?
A: The current version focuses on stock recommendations, but the framework can be adapted to other assets in future releases.