AI finds critical bugs in Firefox in partnership with Mozilla | Keryc
Anthropic and Mozilla worked together to test how well an AI can find serious bugs in a modern browser. The experiment used Claude Opus 4.6 and resulted in reports that helped fix issues in Firefox 148.0 for hundreds of millions of users.
What happened
In a short experiment, Claude Opus 4.6 identified 22 vulnerabilities in Firefox; Mozilla determined that 14 of them were high severity. To put that in perspective: those 14 represent nearly a fifth of all the high-severity bugs fixed in Firefox in 2025.
First, Claude reproduced many historical CVEs in older versions of Firefox’s code. Then it was asked to look for new bugs in the current version: it started with the JavaScript engine and later expanded to other areas. After barely twenty minutes of exploration, it reported a use-after-free, a critical memory vulnerability in which code keeps using a region of memory after it has been freed.
The team validated the finding in virtual machines, prepared a report with a possible fix (the patch was proposed by Claude and reviewed by humans), and submitted it to Bugzilla, Mozilla’s bug tracker. In total, they scanned about 6,000 C++ files and sent 112 unique reports; most were fixed in Firefox 148.0, and the rest will be addressed in future releases.
How Claude and Mozilla collaborated
The workflow was practical: Anthropic generated many reports and Mozilla helped decide what was worth sending officially. Instead of validating every single case, Mozilla encouraged submitting findings in bulk, which sped up triage.
Mozilla was also transparent about triage and testing, which made it possible to tune the workflow to reduce false positives. As a result, Mozilla researchers even started experimenting internally with Claude.
The collaboration shows an operational model: AI that finds bugs and maintainers who prioritize, verify, and patch.
Can AI exploit them?
Anthropic didn’t just ask Claude to find bugs; they gave it access to the findings and asked it to try to turn them into real exploits. The minimal goal for demonstrating an exploit: read and write a local file on the target system.
Spending roughly $4,000 on tests, Claude managed to turn a vulnerability into a working exploit in only two cases. That leaves two clear conclusions: Claude is better at finding vulnerabilities than at exploiting them, and developing exploits costs far more than identifying flaws.
Also, the exploits that worked were rudimentary and only operated in a test environment where real protections like the sandbox had been disabled. Firefox has a defense-in-depth strategy that reduces risk in real scenarios, although it isn’t foolproof.
Practical advice for maintainers and security teams
If you maintain software, this is an invitation to improve processes. A few useful practices that came out of this experience:
Use task verifiers that automatically confirm whether a fix removes the vulnerability and preserves functionality.
Test proposed patches with automated tests to catch regressions.
Include clear evidence when filing reports: minimal test cases, detailed proofs of concept, and candidate patches.
Back the process with a Coordinated Vulnerability Disclosure (CVD) protocol to work with external researchers without putting users at risk.
These measures make it easier for maintainers to trust reports generated by AI tools and speed up fixing real issues.
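The verification practices above can be sketched as a simple two-gate check: a candidate patch is accepted only if the proof of concept no longer reproduces the bug and the existing test suite still passes. This is a minimal illustrative sketch, not Mozilla’s or Anthropic’s actual tooling; the two callables are placeholders standing in for sandboxed runs of a real PoC and test suite.

```python
# Minimal sketch of a "task verifier" for AI-proposed security patches.
# A patch must clear two gates: the PoC must stop reproducing the bug,
# and the test suite must still pass (no regressions).

def verify_fix(poc_triggers_bug, tests_pass):
    """Return a verdict for a candidate patch.

    poc_triggers_bug: callable -> bool, True if the proof of concept
        still reproduces the vulnerability against the patched build.
    tests_pass: callable -> bool, True if the test suite passes
        against the patched build.
    """
    if poc_triggers_bug():
        return "rejected: vulnerability still reproducible"
    if not tests_pass():
        return "rejected: patch causes regressions"
    return "accepted: bug fixed, no regressions detected"
```

In a real pipeline, each callable would build the patched tree and run the PoC or test suite in an isolated VM or container, so a malicious or broken proof of concept can’t affect the triage machine.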
What now?
The picture is clear: frontier models are already high-level vulnerability researchers. Anthropic has also used Opus 4.6 on other major projects, such as the Linux kernel, and is announcing tools to bring these capabilities to maintainers and customers.
Today, defenders have the advantage: finding and patching is still easier than building fully functional exploits. But that gap can close quickly. What does this mean for you as a developer or product owner? It’s time to strengthen security practices, automate checks, and collaborate with researchers.
If you want to get involved, Anthropic runs initiatives and open calls to expand the discovery and fixing of vulnerabilities in open source code.
The news is both an opportunity and a warning: AI speeds up detection within a window that still favors defenders, but it demands that we modernize processes and act before offensive capabilities catch up.