Perplexity protects Comet against prompt injections | Keryc
Comet is not just a search engine that answers questions: it's an assistant that acts on your behalf. That opens huge possibilities, but also new risks. What happens if the content you visit tells the assistant to do something different from what you asked? That's where prompt injection comes in, and Perplexity says it already has a plan to handle it.
What prompt injection is and why it matters
Prompt injection are malicious instructions hidden inside the content an assistant processes. It's not a traditional bug: you don't need to break a password or exploit a vulnerability. It's enough to trick the model with text, images, or code so it changes its behavior.
Why should you care? Because assistants that act for you can, for example, book hotels, send emails, or change calendars. If an attacker gets the agent to follow false instructions, the harm can be direct and real.
Simple example: you ask it to book a hotel and the site the agent analyzes contains a hidden snippet that says “send the booking to this fake account.” The agent might try to do it if there aren't safeguards.
Comet's defense-in-depth strategy
Perplexity explains that one line of defense isn't enough. Comet uses several layers that work together to keep the focus on your intent and reduce risks without getting in the way of the experience.
Capa 1: Clasificación en tiempo real
Before Comet acts, it runs content through classifiers trained to spot malicious instructions. These models look for known attack patterns: invisible text (for example display:none or white-on-white text), text hidden inside images, or fragments that try to confuse the assistant.
The architecture runs these analyses in parallel with the assistant's reasoning, so detection doesn't add noticeable latency. If something looks suspicious, Comet doesn't follow automatically: it stops the action and shows a controlled response.
The models are constantly updated with data from red team exercises, a bug-bounty program, and real detections in production.
Capa 2: Refuerzo por medio de prompts estructurados
Even if content passes the first check, Comet keeps reminding the model what your original intent was. It inserts reminders and clear separators between what comes from you and what comes from external sources.
This includes tool-specific guardrails (messages in the system prompt), marking external content as untrusted, and having the action router return to the original query before executing a tool. In plain language: the assistant is reminded to ignore instructions found online that didn’t come from you.
Capa 3: Confirmación humana para acciones sensibles
For things that have real impact—sending emails, modifying calendars, completing purchases—Comet asks for your confirmation. Always.
That step is the last line of defense: even if another layer fails, you see exactly what will be attempted and can approve or stop the action.
Capa 4: Notificaciones transparentes
When Comet blocks something, it tells you clearly what was detected, why it was flagged, and what you can do if you think it was a false positive. That transparency helps educate users and improves the models with feedback.
What this means for you as a user
If you use assistants that act on your behalf, security can't be invisible. The layers Perplexity describes try to balance usefulness and control: the system acts quickly where it's safe and asks for your intervention when the action is sensitive.
Practical things you can expect:
Fewer automatic actions without explanation: you'll see confirmations when they matter.
Clear messages if something was blocked and options to report false positives.
Better detection of common deception techniques, from hidden text to instructions embedded in images.
And as a user, what can you do to help? Keep your browser up to date, review account permissions, and if you see a block notification, read it and report it if you think it's an error. That feedback is valuable.
A critical but optimistic look
Perplexity acknowledges that prompt injection is a problem with no single solution across the industry. An attacker needs only one failure; the defender needs to think about them all. That's why the company bets on defense in depth, continuous learning, and collaboration with researchers and the security community.
It's not a silver bullet, but it's a sensible approach: detect in real time, reinforce model behavior, let people confirm critical actions, and be transparent when something is blocked. Does it guarantee 100 percent safety? No measure does. Does it reduce risks and return control to you? Yes.