⛔️ Jailbreaking vs Prompt Injection Attack

Abstract

This section compares "Jailbreaking" and "Prompt Injection" attacks.

The terms "jailbreaking" and "prompt injection" refer to different techniques associated with interacting with LLMs like GPT-4, Gemini etc.

📗 Comparison

Intent: Jailbreaking aims to bypass restrictions entirely while prompt injection aims to manipulate model's response to a specific prompt rather than seeking to disable safeguards wholesale.
Technique: Jailbreaking often requires a deeper understanding of the model's restrictions and safety mechanisms to effectively bypass them. Prompt injection, on the other hand, relies more on crafting the prompt itself in a way that deceives the model to generate a certain response.
Scope: Jailbreaking has a broader aim of unlocking restricted capabilities of the model, while prompt injection is more about manipulating the model's output on a case-by-case basis.

To summarize, prompt injection aims to manipulate the model's output for a specific format while jailbreaking aims to bypass the safety restrictions enforced.