Conv-FinRe: A Conversational and Longitudinal Benchmark for Utility-Grounded Financial Recommendation
#Conv‑FinRe #Conversational Benchmark #Utility‑Grounded #Stock Recommendation #Large Language Models #Behavior Imitation #Longitudinal Evaluation #Investor Risk Preferences #Market Volatility #Hugging Face Dataset #GitHub Codebase
📌 Key Takeaways
- Conv‑FinRe is a new benchmark for financial recommendation that goes beyond behavior matching.
- It uses conversational and longitudinal data: onboarding interviews, market context updates, and advisory dialogues.
- Evaluation requires LLMs to rank stocks over a fixed investment horizon, distinguishing descriptive behavior from normative, risk‑based utility.
- Results show a tension between rational decision quality and behavioral alignment; models that rank well on utility often fail to match user choices.
- The dataset is built from real market data and human decision trajectories, and is publicly available on Hugging Face with code on GitHub.
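The idea of a utility-grounded ranking can be sketched with a standard mean-variance utility model, U = μ − (λ/2)σ², where λ encodes the investor's risk aversion. This is a minimal illustration, not the paper's exact scoring rule; the tickers and return series below are hypothetical.

```python
import statistics

def mean_variance_utility(returns, risk_aversion):
    """Mean-variance utility: U = mu - (lambda/2) * sigma^2."""
    mu = statistics.mean(returns)
    var = statistics.pvariance(returns)
    return mu - 0.5 * risk_aversion * var

def rank_by_utility(stock_returns, risk_aversion):
    """Rank tickers by utility score, best first."""
    scores = {t: mean_variance_utility(r, risk_aversion)
              for t, r in stock_returns.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical per-period returns over a fixed investment horizon
history = {
    "AAA": [0.01, 0.02, -0.01, 0.03],   # moderate return, low variance
    "BBB": [0.04, -0.05, 0.06, -0.02],  # similar mean, high variance
    "CCC": [0.005, 0.004, 0.006, 0.005],# low return, near-zero variance
}

print(rank_by_utility(history, risk_aversion=1.0))
print(rank_by_utility(history, risk_aversion=50.0))
```

The same return histories produce different normative rankings at different risk-aversion levels: a risk-tolerant profile keeps the volatile ticker near the top, while a risk-averse one pushes it below the low-variance option, which is why the benchmark conditions the reference on elicited investor risk preferences.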
🏷️ Themes
Artificial Intelligence, Financial Recommendation, Large Language Models, Conversational AI, Longitudinal Benchmarking, Behavioral Finance, Recommendation Systems, Utility‑Based Evaluation
Deep Analysis
Why It Matters
Conv-FinRe introduces a benchmark that evaluates financial recommendation models based on utility and risk preferences rather than just mimicking noisy user behavior, which is crucial for reliable investment advice in volatile markets.
Context & Background
- Existing benchmarks focus on behavior imitation, ignoring long-term financial goals
- Conv-FinRe incorporates real market data and human decision trajectories
- It provides multi-view references distinguishing descriptive behavior from normative utility
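The multi-view idea can be sketched by scoring one model ranking against two reference rankings, a normative utility-based one and a descriptive behavioral one, with a rank correlation such as Kendall's tau. The rankings below are invented for illustration and are not from the dataset.

```python
def kendall_tau(rank_a, rank_b):
    """Kendall rank correlation between two rankings of the same items."""
    pos_b = {item: i for i, item in enumerate(rank_b)}
    concordant = discordant = 0
    n = len(rank_a)
    for i in range(n):
        for j in range(i + 1, n):
            # The pair (rank_a[i], rank_a[j]) is ordered i before j in rank_a;
            # check whether rank_b agrees on that ordering.
            if pos_b[rank_a[i]] < pos_b[rank_a[j]]:
                concordant += 1
            else:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

utility_ref   = ["AAA", "CCC", "BBB", "DDD"]  # normative: risk-adjusted utility
behavior_ref  = ["BBB", "AAA", "DDD", "CCC"]  # descriptive: user's actual choices
model_ranking = ["AAA", "BBB", "CCC", "DDD"]  # hypothetical LLM output

print(kendall_tau(model_ranking, utility_ref))
print(kendall_tau(model_ranking, behavior_ref))
```

Reporting both correlations separately is what lets the benchmark diagnose the tension in the results: a model can agree strongly with the utility reference while agreeing only weakly with the behavioral one, or vice versa.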
What Happens Next
The dataset and code will be available on Hugging Face and GitHub, encouraging researchers to develop models that balance rational decision quality with behavioral alignment. Future work may extend the benchmark to other asset classes and integrate regulatory compliance checks.
Frequently Asked Questions
Q: How does Conv-FinRe differ from existing behavior-imitation benchmarks?
A: It evaluates models on utility-based rankings over a fixed horizon and separates descriptive behavior from normative utility, allowing diagnosis of rationality versus noise.
Q: Where are the dataset and code available?
A: The dataset is publicly released on Hugging Face and the codebase is on GitHub, both linked in the paper.
Q: Does the benchmark cover asset classes beyond stocks?
A: The current version focuses on stock recommendations, but the framework can be adapted to other assets in future releases.