Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search
#Large Language Models #Jailbreak attacks #Classical Chinese #AI security #Bio-inspired search #Adversarial prompts #CC-BOS framework #Black-box attacks
📌 Key Takeaways
- Researchers developed the CC-BOS framework, which uses classical Chinese to jailbreak LLMs
- Classical Chinese's conciseness and obscurity allow prompts to bypass safety constraints
- The bio-inspired search approach optimizes adversarial prompts across eight dimensions
- CC-BOS consistently outperformed existing jailbreak attack methods in experiments
🏷️ Themes
AI Security, Linguistic Vulnerabilities, Technical Innovation
📚 Related People & Topics
Classical Chinese
Literary form of written Chinese
Classical Chinese is the style of Chinese language in which the classics of Chinese literature were written, from c. the 5th century BCE. For millennia thereafter, the syntax of written Chinese used in these works was imitated and iterated upon by scholars in a form now called Literary Chinese, whi...
Large language model
Type of machine learning model
A large language model (LLM) is a language model trained with self-supervised machine learning on a vast amount of text, designed for natural language processing tasks, especially language generation. The largest and most capable LLMs are generative pre-trained transformers (GPTs) that provide the c...