Back to Glossary

What is Jailbreak?

Ethics & Safety Glossary term: Jailbreak
Short Answer

Attempts to bypass AI safety measures and content filters.

Jailbreaking refers to attempts to manipulate AI systems to bypass their built-in safety measures, content filters, and ethical guidelines. This can lead to harmful outputs and misuse of AI systems.

Common jailbreak techniques include:

  • Role manipulation: Attempting to change the AI's intended role
  • Context poisoning: Providing misleading context or instructions
  • Instruction override: Trying to bypass system instructions
  • Format manipulation: Using unusual formatting to confuse filters
  • Indirect requests: Making harmful requests in indirect ways

Best Practices

  • Implement robust system prompts
  • Use multiple layers of safety measures
  • Monitor for suspicious input patterns
  • Regular security testing and updates
  • Implement human oversight for sensitive tasks

🎯 Use Cases

  • AI safety research
  • Security testing
  • Vulnerability assessment
  • Defense development
  • Ethical AI development