BravenNow
A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring
| USA | technology | ✓ Verified - arxiv.org

#Steganography #Large Language Models #AI Safety #Decision Theory #Information Asymmetry #Model Monitoring #AI Alignment #Usman Anwar

📌 Key Takeaways

  • Researchers developed a decision-theoretic framework to detect steganography in large language models
  • Classical steganography detection methods need a known reference distribution of non-steganographic signals, which is not feasible to obtain for LLM reasoning, so they do not apply
  • The approach introduces generalized ℋ-information and the 'steganographic gap' as new metrics
  • The formalism can detect, quantify, and mitigate steganographic reasoning in AI systems

📖 Full Retelling

Usman Anwar and eight co-authors have proposed a decision-theoretic formalization of steganography aimed at monitoring large language models, in a paper submitted to arXiv on February 26, 2026. The work responds to evidence that LLMs are beginning to show steganographic capabilities, which could let misaligned models evade oversight, while principled methods to detect and quantify such behavior are still lacking.

Classical definitions of steganography, and the detection methods built on them, require a known reference distribution of non-steganographic signals. For steganographic reasoning in LLMs, no such distribution can feasibly be known, so those methods do not apply.

The authors instead start from the observation that steganography creates an asymmetry in usable information between agents who can decode the hidden content of a signal and agents who cannot, and that this otherwise latent asymmetry can be inferred from the agents' observable actions. To formalize this, they introduce generalized ℋ-information, a utilitarian framework for measuring how much usable information an input contains, and use it to define the 'steganographic gap': a measure that compares the downstream utility of a signal to agents that can and cannot decode it.

The paper reports an empirical validation showing that the formalism can be used to detect, quantify, and mitigate steganographic reasoning in LLMs.
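To make the utility comparison described above concrete, here is a minimal toy sketch in Python. It is not the paper's method or experimental setup: the XOR encoding, the policy families, and the names `best_utility` and `steganographic_gap` are all assumptions chosen for illustration. A one-bit secret is sent in a signal, a receiver holds a shared key, and a sentinel (the monitor) does not. The gap is taken here to be the difference between the best average task utility each side can reach.

```python
from itertools import product
from statistics import mean


def best_utility(policies, episodes):
    """Best average utility (1 if the secret is guessed, else 0) over a policy family."""
    return max(
        mean(1.0 if policy(signal, key) == secret else 0.0
             for secret, key, signal in episodes)
        for policy in policies
    )


def steganographic_gap(encode):
    """Toy gap: receiver's best utility minus sentinel's best utility (illustrative only)."""
    # Enumerate every (secret, key) pair exactly, so no random sampling is needed.
    episodes = [(s, k, encode(s, k)) for s, k in product((0, 1), repeat=2)]
    # Hypothetical sentinel family: policies that look at the signal z but ignore the key.
    sentinel = [
        lambda z, k: 0,
        lambda z, k: 1,
        lambda z, k: z,
        lambda z, k: 1 - z,
    ]
    # Hypothetical receiver family: everything the sentinel can do, plus XOR decoding.
    receiver = sentinel + [lambda z, k: z ^ k]
    return best_utility(receiver, episodes) - best_utility(sentinel, episodes)


hidden_gap = steganographic_gap(lambda s, k: s ^ k)  # secret masked with the key
plain_gap = steganographic_gap(lambda s, k: s)       # secret sent in the clear
print(hidden_gap, plain_gap)
```

In this toy setting the masked signal gives a gap of 0.5, because the receiver always recovers the secret while the sentinel can do no better than chance. The plain signal gives a gap of 0.0, because both sides can use it equally well.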

🏷️ Themes

AI Safety, Steganography, Information Theory, Machine Learning

📚 Related People & Topics

Steganography

Hiding messages in other messages

Steganography (STEG-ə-NOG-rə-fee) is the practice of representing information within another message or physical object, in such a manner that the presence of the concealed information would not be evident to an unsuspecting person's examination. In computing/electronic contexts, a computer file, ...

Decision theory

Branch of applied probability theory

Decision theory or the theory of rational choice is a branch of probability, economics, and analytic philosophy that uses expected utility and probability to model how individuals would behave rationally under uncertainty. It differs from the cognitive and behavioral sciences in that it is mainly pr...

Large language model

Type of machine learning model

A large language model (LLM) is a language model trained with self-supervised machine learning on a vast amount of text, designed for natural language processing tasks, especially language generation. The largest and most capable LLMs are generative pre-trained transformers (GPTs) that provide the c...

Information asymmetry

Concept in contract theory and economics

In contract theory, mechanism design, and economics, an information asymmetry is a situation where one party has more or better information than the other. Information asymmetry creates an imbalance of power in transactions, which can sometimes cause the transactions to be inefficient, causing marke...

Original Source
arXiv:2602.23163 (Computer Science > Artificial Intelligence)
Submitted on 26 Feb 2026
Title: A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring
Authors: Usman Anwar, Julianna Piskorz, David D. Baek, David Africa, Jim Weatherall, Max Tegmark, Christian Schroeder de Witt, Mihaela van der Schaar, David Krueger
Abstract: Large language models are beginning to show steganographic capabilities. Such capabilities could allow misaligned models to evade oversight mechanisms. Yet principled methods to detect and quantify such behaviours are lacking. Classical definitions of steganography, and detection methods based on them, require a known reference distribution of non-steganographic signals. For the case of steganographic reasoning in LLMs, knowing such a reference distribution is not feasible; this renders these approaches inapplicable. We propose an alternative, decision-theoretic view of steganography. Our central insight is that steganography creates an asymmetry in usable information between agents who can and cannot decode the hidden content (present within a steganographic signal), and this otherwise latent asymmetry can be inferred from the agents' observable actions. To formalise this perspective, we introduce generalised ℋ-information: a utilitarian framework for measuring the amount of usable information within some input. We use this to define the steganographic gap, a measure that quantifies steganography by comparing the downstream utility of the steganographic signal to agents that can and cannot decode the hidden content. We empirically validate our formalism, and show that it can be used to detect, quantify, and mitigate steganographic reasoning in LLMs.
Comments: First two authors contributed equally
Subjects: A...
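
The abstract describes the steganographic gap only in words. One way to write that comparison symbolically is sketched below. The notation is an illustrative assumption and is not taken from the paper: $Z$ is the signal, $Y$ the task outcome, $u$ a utility function, and $\mathcal{H}_R$ and $\mathcal{H}_S$ are hypothetical decision-rule families for a receiver that can decode the hidden content and a sentinel that cannot.

```latex
% Illustrative notation only; the paper's own definitions may differ.
\[
  U_{\mathcal{H}}(Z) \;=\; \sup_{h \in \mathcal{H}} \mathbb{E}\big[\, u\big(h(Z),\, Y\big) \big],
  \qquad
  \Delta_{\mathrm{steg}}(Z) \;=\; U_{\mathcal{H}_R}(Z) \;-\; U_{\mathcal{H}_S}(Z).
\]
```

Under this reading, a gap near zero means the signal is about equally useful to both agents, while a clearly positive gap suggests the signal carries content that only the decoding agent can exploit.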

Source

arxiv.org
