Many-Shot Jailbreaking
Focus: Methods or Design
Source: Anthropic
Readability: Expert
Type: Website Article
Open Source: Yes
Keywords: N/A
Learn Tags: AI and Machine Learning, Ethics, Design/Methods
Summary: Anthropic researchers have identified a new jailbreaking technique called "many-shot jailbreaking," which can be used to evade LLM safety guardrails. The attack exploits the large context windows of modern models: by filling a single prompt with a large number of scripted question-and-answer exchanges, an attacker can induce an LLM to produce potentially harmful responses that go against its training, such as explaining how to build a bomb.
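To make the prompt structure described in the summary concrete, below is a minimal sketch of how a many-shot prompt is assembled: many scripted user/assistant exchanges followed by the attacker's final question. The function name, dialogue pairs, and repetition count are hypothetical placeholders chosen for illustration; the scripted exchanges here are deliberately innocuous and stand in only for the structure the article describes, not its content.

```python
# Hypothetical sketch of the many-shot prompt structure (illustration only).
# Placeholder dialogues are benign; the point is the format, not the content.

def build_many_shot_prompt(faux_dialogues, final_question):
    """Concatenate many scripted user/assistant exchanges, then append
    the final question whose answer the attacker wants."""
    shots = [f"User: {q}\nAssistant: {a}" for q, a in faux_dialogues]
    shots.append(f"User: {final_question}\nAssistant:")
    return "\n\n".join(shots)

# Benign placeholder exchanges; repetition stands in for the hundreds of
# shots that a long context window makes possible.
example_dialogues = [
    ("What is the capital of France?", "Paris."),
    ("How many legs does a spider have?", "Eight."),
] * 128

prompt = build_many_shot_prompt(example_dialogues, "Final question goes here")
print(prompt[:200])
```

The key point the article makes is that the effectiveness of this format scales with the number of in-context exchanges, which is why longer context windows enlarge the attack surface.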