BravenNow
Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search
| USA | technology | ✓ Verified - arxiv.org


#Large Language Models #Jailbreak attacks #Classical Chinese #AI security #Bio-inspired search #Adversarial prompts #CC-BOS framework #Black-box attacks

📌 Key Takeaways

  • Researchers developed CC-BOS framework using classical Chinese to jailbreak LLMs
  • Classical Chinese's conciseness and obscurity allow it to bypass safety constraints
  • The bio-inspired search approach optimizes adversarial prompts across eight dimensions
  • CC-BOS consistently outperformed existing jailbreak attack methods in experiments

📖 Full Retelling

A team of researchers led by Xun Huang, with eight co-authors, published a paper on arXiv on February 26, 2026, showing how classical Chinese can be used to bypass safety constraints in Large Language Models through jailbreak attacks. The researchers developed a framework called CC-BOS that uses bio-inspired search to automatically generate classical Chinese adversarial prompts, exposing significant vulnerabilities in AI systems that process natural language.

The study addresses growing concerns about the security risks of increasingly prevalent Large Language Models, which have shown high susceptibility to jailbreak attacks, with effectiveness varying across language contexts. The researchers found that classical Chinese, with its inherent conciseness and obscurity, can partially circumvent existing safety measures designed to prevent harmful outputs, highlighting a linguistic blind spot in current AI safeguards.

The CC-BOS framework encodes prompts into eight policy dimensions, covering role, behavior, mechanism, metaphor, expression, knowledge, trigger pattern, and context, and iteratively refines them through smell search, visual search, and Cauchy mutation, processes inspired by fruit fly optimization. This design enables efficient exploration of the search space, enhancing the effectiveness of black-box jailbreak attacks. The researchers also built a classical Chinese to English translation module to improve evaluation accuracy.
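The search loop described above follows the general fruit fly optimization pattern: candidates encoded as vectors over the eight policy dimensions, scattered around the current best (smell search), occasionally perturbed with a heavy-tailed jump (Cauchy mutation), and evaluated so the swarm converges on the best candidate (visual search). The following is a minimal sketch of that pattern on a toy numeric fitness function; the function, parameter names, and defaults are illustrative assumptions, not the authors' implementation, and a real attack would score model responses rather than a mathematical peak.

```python
import math
import random

random.seed(0)  # for reproducibility of this sketch

# The paper's eight policy dimensions: role, behavior, mechanism,
# metaphor, expression, knowledge, trigger pattern, context.
DIMENSIONS = 8

def cauchy(scale=1.0):
    # Sample a Cauchy-distributed value via inverse transform;
    # heavy tails allow occasional large jumps out of local optima.
    return scale * math.tan(math.pi * (random.random() - 0.5))

def fruit_fly_optimize(fitness, iters=300, swarm=20, step=0.3, mutate_p=0.2):
    best = [random.uniform(0, 1) for _ in range(DIMENSIONS)]
    best_score = fitness(best)
    for _ in range(iters):
        # Smell search: scatter candidate "flies" around the current best.
        candidates = [
            [x + random.uniform(-step, step) for x in best]
            for _ in range(swarm)
        ]
        # Cauchy mutation: occasionally perturb one dimension heavily.
        for c in candidates:
            if random.random() < mutate_p:
                d = random.randrange(DIMENSIONS)
                c[d] += cauchy(step)
        # Visual search: move to the best-scoring candidate found.
        for c in candidates:
            s = fitness(c)
            if s > best_score:
                best, best_score = c, s
    return best, best_score

# Toy stand-in fitness with a peak at 0.5 in every dimension
# (hypothetical; CC-BOS would instead score the target model's output).
toy = lambda v: -sum((x - 0.5) ** 2 for x in v)
vec, score = fruit_fly_optimize(toy)
```

The Cauchy mutation is the part that distinguishes this variant from plain local search: its heavy-tailed jumps let the optimizer escape regions where small smell-search perturbations stop improving the score.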

🏷️ Themes

AI Security, Linguistic Vulnerabilities, Technical Innovation

📚 Related People & Topics

Classical Chinese

Literary form of written Chinese

Classical Chinese is the style of Chinese language in which the classics of Chinese literature were written, from c. the 5th century BCE. For millennia thereafter, the syntax of written Chinese used in these works was imitated and iterated upon by scholars in a form now called Literary Chinese, whi...

View Profile → Wikipedia ↗

Large language model

Type of machine learning model

A large language model (LLM) is a language model trained with self-supervised machine learning on a vast amount of text, designed for natural language processing tasks, especially language generation. The largest and most capable LLMs are generative pre-trained transformers (GPTs) that provide the c...

View Profile → Wikipedia ↗


Original Source
Computer Science > Artificial Intelligence
arXiv:2602.22983 [Submitted on 26 Feb 2026]
Title: Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search
Authors: Xun Huang, Simeng Qin, Xiaoshuang Jia, Ranjie Duan, Huanqian Yan, Zhitao Zeng, Fei Yang, Yang Liu, Xiaojun Jia
Abstract: As Large Language Models are increasingly used, their security risks have drawn growing attention. Existing research reveals that LLMs are highly susceptible to jailbreak attacks, with effectiveness varying across language contexts. This paper investigates the role of classical Chinese in jailbreak attacks. Owing to its conciseness and obscurity, classical Chinese can partially bypass existing safety constraints, exposing notable vulnerabilities in LLMs. Based on this observation, this paper proposes a framework, CC-BOS, for the automatic generation of classical Chinese adversarial prompts based on multi-dimensional fruit fly optimization, facilitating efficient and automated jailbreak attacks in black-box settings. Prompts are encoded into eight policy dimensions, covering role, behavior, mechanism, metaphor, expression, knowledge, trigger pattern, and context, and iteratively refined via smell search, visual search, and Cauchy mutation. This design enables efficient exploration of the search space, thereby enhancing the effectiveness of black-box jailbreak attacks. To enhance readability and evaluation accuracy, we further design a classical Chinese to English translation module. Extensive experiments demonstrate the effectiveness of the proposed CC-BOS, which consistently outperforms state-of-the-art jailbreak attack methods.
Subjects: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Cite as: arXiv:2602.22983 [cs.AI] (or arXiv:2602.22983v1 [cs.AI] for...

Source

arxiv.org
