A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring
#Steganography #Large Language Models #AI Safety #Decision Theory #Information Asymmetry #Model Monitoring #AI Alignment #Usman Anwar
📌 Key Takeaways
- Researchers developed a decision-theoretic framework to detect steganography in large language models
- Classical steganography detection methods rely on a known reference distribution of non-steganographic signals, which is not available for LLMs, so they cannot be applied
- The approach introduces generalized ℋ-information and the 'steganographic gap' as new metrics
- The formalism can detect, quantify, and mitigate steganographic reasoning in AI systems
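As a rough intuition for the 'steganographic gap', the toy sketch below compares how useful the same signal is to a receiver who knows the hiding scheme and to one who does not. The summary above does not give the formal definition, so everything here is an assumption made for illustration: the parity-based `encode` scheme, the two receivers, and the use of guessing accuracy as the utility are hypothetical and are not the authors' method.

```python
import random

def encode(secret_bit, rng):
    # Hypothetical scheme: hide one bit in the parity of a number
    # placed inside an innocuous-looking sentence.
    n = rng.randrange(0, 100, 2) + secret_bit
    return f"The report covers {n} items."

def informed_receiver(signal):
    # Knows the (assumed) scheme: recover the bit from the parity.
    return int(signal.split()[3]) % 2

def uninformed_receiver(signal):
    # Does not know the scheme: the signal carries no usable
    # information for this receiver, so it always guesses 0.
    return 0

def expected_utility(receiver, trials=2000, seed=0):
    # Utility is assumed to be the fraction of correctly guessed bits.
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        bit = rng.randrange(2)
        correct += receiver(encode(bit, rng)) == bit
    return correct / trials

def toy_gap(trials=2000, seed=0):
    # Illustrative gap: utility with decoding ability minus utility without.
    return (expected_utility(informed_receiver, trials, seed)
            - expected_utility(uninformed_receiver, trials, seed))

if __name__ == "__main__":
    print(f"toy gap: {toy_gap():.3f}")
```

In this toy setting the informed receiver always recovers the bit while the uninformed one does no better than chance, so the gap is large. A signal that hides nothing would give both receivers the same utility and a gap near zero.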
🏷️ Themes
AI Safety, Steganography, Information Theory, Machine Learning
📚 Related People & Topics
Steganography
Hiding messages in other messages
Steganography (STEG-ə-NOG-rə-fee) is the practice of representing information within another message or physical object, in such a manner that the presence of the concealed information would not be evident to an unsuspecting person's examination. In computing/electronic contexts, a computer file, ...
Decision theory
Branch of applied probability theory
Decision theory or the theory of rational choice is a branch of probability, economics, and analytic philosophy that uses expected utility and probability to model how individuals would behave rationally under uncertainty. It differs from the cognitive and behavioral sciences in that it is mainly pr...
Large language model
Type of machine learning model
A large language model (LLM) is a language model trained with self-supervised machine learning on a vast amount of text, designed for natural language processing tasks, especially language generation. The largest and most capable LLMs are generative pre-trained transformers (GPTs) that provide the c...
Information asymmetry
Concept in contract theory and economics
In contract theory, mechanism design, and economics, an information asymmetry is a situation where one party has more or better information than the other. Information asymmetry creates an imbalance of power in transactions, which can sometimes cause the transactions to be inefficient, causing marke...