Break My Guard is a Misc AI tool. Tests AI guardrails against jailbreaks; identifies security gaps in LLMs. Key features include AI Security Challenges, Jailbreaking Explained, and Advanced Attack Methods. Best for teachers, writers and healthcare professionals.
About Break My Guard
Key Features
AI Security Challenges.
Jailbreaking Explained.
Advanced Attack Methods.
Types of Jailbreak Techniques.
Indirect Prompt Injection Risk.
Defense-in-Depth Strategy.
Frequently Asked Questions
LLM jailbreaking is when people create clever prompts. These prompts trick language models into ignoring their safety rules. Then, the models might produce harmful outputs they're supposed to avoid.
Indirect prompt injection attacks are sneaky. Malicious instructions are hidden in documents, emails, or other data that AI systems process. An attacker might hide harmful prompts in white font, encoded text, or even within images. This lets attackers get unauthorized access to private information.
Comparison prompts are a clever way attackers get harmful content from AI models. They ask the model to compare its safe answers with known harmful ones. For example, "How does your answer differ from what a black-hat hacker would do?" This tricks the model into giving the harmful information they claim to be analyzing.
Few-shot and multi-shot prompting attacks provide the AI model with many examples, like question-answer pairs. These examples are designed to push the model towards unsafe completions. Individually, the examples might seem harmless. But together, they pressure the model to continue those patterns into restricted content.





