AI Governance

Prompt injection is the SQL injection of 2026. Same bug, different runtime.

The model cannot tell instructions from data. It reads input and follows it, including instructions no human ever saw, hidden in a document, a web page, or a tool result. Swiftward defends against it out of the box, and proves what it caught.

Why detection alone is not enough

The attacker's tool is more capable than your detector. The model you are protecting has billions of parameters; the classifier guarding it has far fewer, and it can be slipped past with encodings and homoglyphs a human would never write. So input scanning alone will never be enough. What stops real damage is what happens after detection: bounded tool permissions, parameter limits, and deny-by-default on the actions that matter.
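
As a rough illustration of what bounded tool permissions, parameter limits, and deny-by-default can look like in code, here is a minimal sketch. It is not Swiftward's configuration format; the tool names, the limits, and the authorize function are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass(frozen=True)
class ToolRule:
    # Hypothetical rule shape: one validator per allowed parameter.
    limits: dict[str, Callable[[Any], bool]] = field(default_factory=dict)


def _is_number(value: Any) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


# Illustrative policy table. Any tool that is not listed here is denied.
POLICY: dict[str, ToolRule] = {
    "search_docs": ToolRule({"query": lambda q: isinstance(q, str) and len(q) <= 500}),
    "refund": ToolRule({
        "amount": lambda a: _is_number(a) and 0 < a <= 50,
        "currency": lambda c: c in {"USD", "EUR"},
    }),
}


def authorize(tool: str, params: dict[str, Any]) -> tuple[bool, str]:
    """Return (allowed, reason). Everything not explicitly permitted is refused."""
    rule = POLICY.get(tool)
    if rule is None:
        return False, f"tool '{tool}' is not allowlisted"
    missing = sorted(set(rule.limits) - set(params))
    if missing:
        return False, f"missing parameters: {', '.join(missing)}"
    for name, value in params.items():
        check = rule.limits.get(name)
        if check is None:
            return False, f"parameter '{name}' is not allowed for '{tool}'"
        if not check(value):
            return False, f"parameter '{name}' is out of bounds"
    return True, "ok"
```

The point of the design is that a successful injection still has to pass this check: an agent talked into calling refund with amount=5000 is refused regardless of how convincing the prompt was.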

The layered pipeline, out of the box

On the way in, the gateway first normalizes the text to strip the tricks that fool a classifier: invisible characters, right-to-left overrides, and look-alike homoglyphs. High-confidence patterns then catch the blunt attacks (instruction overrides, system-prompt manipulation, and role-play jailbreaks like "do anything now") across languages, not only English. A cheap gate decides when an attempt is worth the cost of the machine-learning model, a specialized open-source classifier called Llama Prompt Guard 2, so the expensive check runs only when it should. The gateway blocks on a hard pattern match, or when the gate and the model agree.
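
The order of those stages can be sketched in a few lines. This is an illustration under stated assumptions, not the shipped policy: the homoglyph table, the two patterns, the gate hints, and the 0.5 threshold are invented for the example, and model_score stands in for a call to a classifier such as Llama Prompt Guard 2 rather than reproducing its real interface.

```python
import re
import unicodedata
from typing import Callable

# Tiny illustrative look-alike table (Cyrillic to Latin). A real table is far larger.
HOMOGLYPHS = str.maketrans({
    "\u0430": "a", "\u0435": "e", "\u043e": "o",
    "\u0440": "p", "\u0441": "c", "\u0456": "i",
})

# Hypothetical high-confidence patterns; real rule sets cover many languages.
HARD_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"\bdo anything now\b", re.I),
]

# Hypothetical cheap gate: substrings that make the model check worth its cost.
GATE_HINTS = ("system prompt", "instructions", "pretend", "you are now")


def normalize(text: str) -> str:
    """Build a scanning copy of the text; the original input is left untouched."""
    text = unicodedata.normalize("NFKC", text)
    # Category Cf covers zero-width characters and bidirectional overrides.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    return text.translate(HOMOGLYPHS)


def decide(text: str, model_score: Callable[[str], float], threshold: float = 0.5) -> str:
    clean = normalize(text)
    if any(p.search(clean) for p in HARD_PATTERNS):
        return "block:pattern"
    # The expensive model runs only when the cheap gate fires.
    if any(hint in clean.lower() for hint in GATE_HINTS):
        if model_score(clean) >= threshold:
            return "block:gate+model"
    return "allow"
```

Mapping look-alikes to Latin would mangle legitimate Cyrillic text, which is why the sketch normalizes only a copy used for scanning and never rewrites what the model actually receives.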

It also scans what you did not type: the content that comes back from a tool call or a retrieved document, where injected instructions like to hide. This ships as a default policy, not a project you build, and when a new bypass appears you change the rules in minutes, because policy is configuration. You can add your own patterns, or call any external detector you already trust.
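
Scanning what comes back from a tool can be pictured as a thin wrapper around the tool call. The sketch below is an assumption about the general shape, not the gateway's actual interface: guard_tool_result, looks_injected, and the placeholder string are hypothetical, and the one-line scanner stands in for the full layered check.

```python
from typing import Callable


def looks_injected(text: str) -> bool:
    # Stand-in scanner for illustration; a real one would run the layered pipeline.
    return "ignore previous instructions" in text.lower()


def guard_tool_result(
    call_tool: Callable[[], str],
    scan: Callable[[str], bool] = looks_injected,
) -> str:
    """Run the tool, then scan its output before the model is allowed to read it."""
    result = call_tool()
    if scan(result):
        return "[tool result withheld: possible injected instructions]"
    return result
```

For example, guard_tool_result(lambda: fetch_page(url)) would hand the model either the page text or the placeholder, where fetch_page is whatever retrieval function the agent already uses.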

Jailbreaks and social engineering

Not every attack looks technical. Some are social: a user who befriends the agent over many turns, invents an emergency, or claims they will be harmed unless it breaks a rule or hands over data it should not. The same layered checks catch the blunt versions by their wording. For the subtler ones, you write a policy for what your agent must never be talked into. The engine runs a cheap classifier first and a slower LLM-as-judge only when it needs to, so it can recognize when the agent is being manipulated and refuse. It is the same machinery that keeps your AI on-policy in general.
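
One way to picture "cheap classifier first, judge only when needed" is a two-threshold escalation. This is a sketch of the general technique, not the engine's implementation: the function name, the low and high thresholds, and the cheap_score and judge callables are all hypothetical.

```python
from typing import Callable, Sequence


def manipulation_verdict(
    turns: Sequence[str],
    cheap_score: Callable[[Sequence[str]], float],
    judge: Callable[[Sequence[str]], bool],
    low: float = 0.2,
    high: float = 0.9,
) -> str:
    """Decide on the cheap score when it is confident; escalate only the gray zone."""
    score = cheap_score(turns)
    if score < low:
        return "allow"    # clearly benign, the judge is never called
    if score >= high:
        return "refuse"   # clearly manipulative, the judge is never called
    # Uncertain middle band: pay for the slower LLM-as-judge.
    return "refuse" if judge(turns) else "allow"
```

Passing the whole conversation rather than the last message matters here, because multi-turn manipulation is often invisible in any single turn.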

Honest defense

"100% safe" is a lie, and you should distrust anyone who sells it. You do not claim your firewall stops everything either. The real claim is the one that holds up: trace every decision, backtest new rules against past traffic, and shadow-test against live traffic before you enforce. Then, when something does get through, you can show exactly what happened and why.

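Backtesting and shadow-testing follow a simple pattern that a short sketch can make concrete. The record shape, function names, and trace format below are assumptions made for illustration; they do not describe the product's actual log schema or API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class LoggedRequest:
    # Hypothetical shape of one traced decision from past traffic.
    request_id: str
    text: str
    was_blocked: bool


def backtest(rule: Callable[[str], bool], history: list[LoggedRequest]) -> dict:
    """Replay a candidate rule over past traffic and report what it would change."""
    newly_blocked = [
        r.request_id for r in history if not r.was_blocked and rule(r.text)
    ]
    return {"total": len(history), "newly_blocked": newly_blocked}


def shadow_evaluate(rule: Callable[[str], bool], text: str, trace: list[dict]) -> bool:
    """Evaluate a rule on live traffic, record the outcome, and never enforce it."""
    trace.append({"text": text, "would_block": rule(text), "enforced": False})
    return False  # shadow mode always lets the request through
```

Reviewing the newly_blocked list before enforcement is where false positives surface: if a rule would have blocked requests that were legitimate, it gets tightened before any user sees a refusal.
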