Many-Shot Jailbreaking
Focus: Methods or Design
Source: Anthropic
Readability: Expert
Type: Website Article
Open Source: Yes
Keywords: N/A
Learn Tags: AI and Machine Learning, Ethics, Design/Methods
Summary: Anthropic researchers have identified a new jailbreaking technique called "many-shot jailbreaking," which can be used to evade LLM safety guardrails. The attack exploits the large context windows of modern models: by filling a single prompt with a large number of scripted question-and-answer exchanges, an attacker can induce an LLM to produce potentially harmful responses that go against its training, such as explaining how to build a bomb.
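To make the prompt structure described in the summary concrete, below is a minimal sketch of how a many-shot prompt is assembled: many scripted user/assistant exchanges followed by the attacker's final question. The function name, dialogue pairs, and repetition count are hypothetical placeholders chosen for illustration; the scripted exchanges here are deliberately innocuous and stand in only for the structure the article describes, not its content.

```python
# Hypothetical sketch of the many-shot prompt structure (illustration only).
# Placeholder dialogues are benign; the point is the format, not the content.

def build_many_shot_prompt(faux_dialogues, final_question):
    """Concatenate many scripted user/assistant exchanges, then append
    the final question whose answer the attacker wants."""
    shots = [f"User: {q}\nAssistant: {a}" for q, a in faux_dialogues]
    shots.append(f"User: {final_question}\nAssistant:")
    return "\n\n".join(shots)

# Benign placeholder exchanges; repetition stands in for the hundreds of
# shots that a long context window makes possible.
example_dialogues = [
    ("What is the capital of France?", "Paris."),
    ("How many legs does a spider have?", "Eight."),
] * 128

prompt = build_many_shot_prompt(example_dialogues, "Final question goes here")
print(prompt[:200])
```

The key point the article makes is that the effectiveness of this format scales with the number of in-context exchanges, which is why longer context windows enlarge the attack surface.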