BrowseSafe: new defense for browsers with AI | Keryc
Perplexity introduces BrowseSafe, an open detector and benchmark to stop browser assistants from following malicious instructions hidden inside web pages. What does this mean for you as a user or developer? Basically, fewer surprises when an agent reads everything on a page and someone tries to manipulate it from the content itself.
What BrowseSafe is and why it matters
BrowseSafe is a detection model fine-tuned for a single, concrete question: given a page's HTML, does it contain malicious instructions aimed at the agent? In practice this means that before the assistant reads or acts on content, BrowseSafe scans the page and flags what looks potentially dangerous.
Why not use a generic large model for this? Because those are often slow and expensive to run on every page in real time. BrowseSafe is designed to scan whole pages without slowing down the browser, and it ships with BrowseSafe-Bench, a public test suite of more than 14,700 real examples to evaluate and improve defenses.
How prompt injections work in the browser
The idea is simple and dangerous: attackers hide instructions where humans don't look, but agents do. HTML comments, data attributes, invisible form fields or even visible sections like footers can contain commands meant to steer the assistant off course.
Those malicious instructions can be blunt or highly camouflaged: indirect hints, hypothetical prompts, or written in other languages. The risk grows because agents tend to process all the HTML, not just what’s visible on screen.
Attackers take advantage of agents 'reading' more than we see. That's why you should scan the HTML with contextual awareness.
BrowseSafe-Bench: tests with real-world pages
BrowseSafe-Bench is the public testbed: 14,719 examples that mimic production pages with noisy content and a variety of malicious cases. The benchmark covers 11 attack types, 9 injection strategies and 3 language styles. In other words, it’s not a clean lab: it’s the chaos you actually find on the web.
Results show clear patterns: direct attacks (asking to exfiltrate data or reveal the system prompt) are easier to catch. Versions in other languages and indirect instructions are much harder because they avoid obvious keywords. It also matters where the instruction sits: rewrites embedded in visible paragraphs are tougher than things hidden in comments.
Defense in depth: there’s no single silver bullet
In Perplexity's threat model, the assistant runs in a trusted environment but everything coming from the web is untrusted. BrowseSafe is one layer of that defense: raw outputs are scanned before the agent reads them, permissions are restricted by default and explicit confirmation is required for sensitive actions.
The idea is simple: combine multiple barriers so the assistant's power doesn’t come at the user's expense. Tools like Perplexity's help keep assistant-enabled browsers useful and safer at the same time.
What can a developer do today?
BrowseSafe and BrowseSafe-Bench are open source. That means any developer building autonomous agents can start hardening their system without starting from scratch. The detection model can run locally and is optimized to flag malicious instructions before they reach the agent's core logic.
Also, BrowseSafe-Bench works as a stress test: use its 14,000+ scenarios to see how your system reacts to messy HTML and common traps. Perplexity also shares chunking and parallel scanning techniques to process large pages efficiently.
Looking ahead
The shift from search to agent-enabled browsers changes the rules: now it matters not just what’s on a page, but who uses it and how an assistant interprets it. BrowseSafe is a practical step to reduce prompt injection risk and keep the web from becoming a hunting ground for attackers.
It’s not a total solution, but it’s a real, usable tool: client-run models, open benchmarks and defense-in-depth practices. If you work with browser agents, now is a good time to integrate HTML scanning and test your system against real scenarios.