BravenNow
Enjoying Non-linearity in Multinomial Logistic Bandits: A Minimax-Optimal Algorithm
| USA | technology | ✓ Verified - arxiv.org


#Multinomial logistic bandits #Minimax-optimal #Non-linearity #Regret bounds #Reinforcement learning #Recommender systems #Machine learning theory

📌 Key Takeaways

  • Researchers developed a minimax-optimal algorithm for multinomial logistic bandits
  • The algorithm extends binary setting research to applications with multiple choices
  • It achieves an improved regret bound of order Õ(Rd√(KT/κ*))
  • A matching Ω(dR√(KT/κ*)) lower bound establishes minimax optimality

📖 Full Retelling

Pierre Boudart, Pierre Gaillard, and Alessandro Rudi, of PSL, DI-ENS, and Inria, developed a minimax-optimal algorithm for multinomial logistic bandits, with the latest revision posted to arXiv on February 24, 2026. Their work extends previous binary-setting research to applications with more than two choices while improving regret bounds. The paper addresses the multinomial logistic bandit problem, in which a learner selects actions to maximize expected rewards based on probabilistic feedback over multiple possible outcomes. Previous work in the binary setting had identified a problem-dependent constant κ* ≥ 1 that captures the non-linearity of the logistic model and allows for improved regret guarantees. The researchers extended this analysis to the multinomial setting with a finite action space, making it applicable to complex systems, such as reinforcement learning and recommender systems, that involve more than two choices. Their approach extends the definition of κ* to the multinomial setting and proposes an efficient algorithm that exploits the problem's non-linearity, achieving a problem-dependent regret bound of order Õ(Rd√(KT/κ*)), where R is the norm of the vector of rewards and K is the number of outcomes.
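As an illustration of the feedback model the paper studies (not the authors' algorithm), here is a minimal sketch of a multinomial logistic model, assuming a reference outcome whose logit is fixed to zero; all names and the simulated parameters are hypothetical:

```python
import numpy as np

def softmax_probs(x, theta):
    """Multinomial logistic model: probabilities of the K+1 outcomes
    for an action with feature vector x (dim d) and parameter theta (K x d).
    Outcome 0 is a reference class whose logit is fixed to 0."""
    logits = np.concatenate(([0.0], theta @ x))
    logits -= logits.max()  # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def expected_reward(x, theta, rewards):
    """Expected reward of action x: outcome probabilities dotted with
    the reward vector (rewards[0] is the reference outcome's reward)."""
    return softmax_probs(x, theta) @ rewards

rng = np.random.default_rng(0)
d, K = 3, 2                           # feature dimension, non-reference outcomes
theta = rng.normal(size=(K, d))       # unknown parameter (simulated here)
actions = rng.normal(size=(5, d))     # finite action set
rewards = np.array([0.0, 1.0, 2.0])   # reward per outcome; R is its norm

# With full knowledge of theta, the learner would pick the action
# maximizing expected reward; a bandit algorithm must estimate theta
# from sampled outcomes instead.
best = max(range(len(actions)),
           key=lambda i: expected_reward(actions[i], theta, rewards))
```

The constant κ* quantifies how flat the softmax probabilities can get over the action set: the flatter the model (the larger κ*), the less informative each observation, yet the bound above improves as κ* grows, since near-deterministic feedback also concentrates rewards.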

🏷️ Themes

Machine Learning, Optimization Algorithms, Decision Theory


Original Source
-- Machine Learning arXiv:2507.05306 [Submitted on 7 Jul 2025 (v1), last revised 24 Feb 2026 (this version, v3)] Title: Enjoying Non-linearity in Multinomial Logistic Bandits: A Minimax-Optimal Algorithm Authors: Pierre Boudart, Pierre Gaillard, Alessandro Rudi (PSL, DI-ENS, Inria) Abstract: We consider the multinomial logistic bandit problem in which a learner interacts with an environment by selecting actions to maximize expected rewards based on probabilistic feedback from multiple possible outcomes. In the binary setting, recent work has focused on understanding the impact of the non-linearity of the logistic model (Faury et al., 2020; Abeille et al., 2021). They introduced a problem-dependent constant $\kappa_* \geq 1$ that may be exponentially large in some problem parameters and which is captured by the derivative of the sigmoid function. It encapsulates the non-linearity and improves existing regret guarantees over $T$ rounds from $d\sqrt{T}$ to $d\sqrt{T/\kappa_*}$, where $d$ is the dimension of the parameter space. We extend their analysis to the multinomial logistic bandit framework with a finite action space, making it suitable for complex applications with more than two choices, such as reinforcement learning or recommender systems. To achieve this, we extend the definition of $\kappa_*$ to the multinomial setting and propose an efficient algorithm that leverages the problem's non-linearity. Our method yields a problem-dependent regret bound of order $\widetilde{\mathcal{O}}(R d \sqrt{KT/\kappa_*})$, where $R$ denotes the norm of the vector of rewards and $K$ is the number of outcomes. This improves upon the best existing guarantees of order $\widetilde{\mathcal{O}}(R d K \sqrt{T})$.
Moreover, we provide a matching $\Omega(dR\sqrt{KT/\kappa_*})$ lower-bou...

Source

arxiv.org
