OpenAI's New Bounty Targets Prompt Injection and Agent Abuse

OpenAI launched a public Safety Bug Bounty on March 25, 2026, expanding paid vulnerability reporting beyond conventional security bugs into AI-specific abuse and safety failures. The new program, available through OpenAI’s Safety Bug Bounty, matters if you build agents, tool-using assistants, or MCP-connected systems, because it formalizes which failures count as operational safety issues and which do not.

Scope

The program separates safety vulnerabilities from standard security flaws. Conventional access-control issues, unauthorized data exposure, and feature access bugs still belong in OpenAI’s existing security bounty workflow. The new track targets abuse paths where model behavior, agent execution, and platform controls combine into something materially exploitable.

OpenAI named three in-scope categories:

Category	Included examples	Key threshold
Agentic Risks including MCP	Third-party prompt injection, data exfiltration against a victim’s agent, harmful agent actions in Browser, ChatGPT Agent, and similar products	Reproducible at least 50% of the time
OpenAI Proprietary Information	Generations exposing proprietary reasoning-related information, other proprietary information leaks	Must expose protected internal information
Account and Platform Integrity	Bypassing anti-automation controls, manipulating trust signals, evading restrictions, suspensions, or bans	Must affect platform integrity controls

The 50% reproducibility requirement is the most concrete operational bar in the launch. If you report prompt injection or exfiltration issues, you need a failure that happens often enough to be actionable, not a one-off transcript.

Boundary Between Jailbreaks and Exploits

OpenAI explicitly excludes generic jailbreaks unless they create a concrete abuse path with discrete remediation. A content-policy bypass that only produces rude text, or returns information already available through search, does not qualify.

This boundary is important for anyone working on agent evaluation, tool permissions, or prompt-defense stacks. The program rewards exploitability, not novelty alone. A successful report needs to show practical harm, such as exfiltration, disallowed actions at scale, or integrity bypasses.

That framing lines up with current agent security work across the industry. As assistants move from chat into browsing, tool use, and persistent workflows, the risk surface shifts toward cross-context instruction injection, over-broad tool authority, and automation abuse. Recent OpenAI work on prompt-injection defenses for ChatGPT Agent sits directly in that same problem space.

Agentic Systems Are the Center of Gravity

The most revealing part of the scope is the emphasis on agentic risk. OpenAI calls out Browser, ChatGPT Agent, similar agentic products, and MCP by name. This is a strong signal about where abuse review is moving.

If your product uses tool calling, browser automation, long-lived sessions, or external connectors, the relevant failure mode is no longer just “can the model be tricked.” It is “can an attacker cause the system to take actions or reveal data with enough consistency to matter.” That is the same design pressure behind tighter context engineering, narrower tool scopes, and stronger separation between retrieved content and executable instructions.

MCP is especially relevant because it increases the number of third-party surfaces an agent can trust by default. OpenAI requires testers to respect third-party terms when probing MCP-related risk, which suggests the company expects abuse research to involve realistic cross-system interactions rather than isolated prompt experiments. If you need a refresher on that integration layer, see MCP basics.

Relationship to Existing Disclosure Channels

The new bounty complements, rather than replaces, OpenAI’s existing vulnerability intake. OpenAI’s coordinated vulnerability disclosure policy still governs good-faith reporting norms, and its CVE assignment policy already distinguished classic security vulnerabilities from model safety issues such as jailbreaks and hallucinations.

The practical change is that OpenAI now has a public, paid path for AI abuse reports that fall outside confidentiality, integrity, and availability definitions. Reports can also be rerouted between Safety and Security Bug Bounty teams, which reduces the ambiguity researchers often face when a finding spans both model behavior and platform controls.

Comparison With OpenAI’s Earlier Bounty Programs

OpenAI previously ran narrower bio-focused bounty efforts around ChatGPT agent and GPT-5, with challenge-style rewards of $25,000 for the first universal jailbreak clearing all 10 questions and $10,000 for the first team solving them with multiple prompts. The new public program is broader in subject matter and more operational in its scope.

Program	Focus	Public?	Example reward detail
Safety Bug Bounty	AI abuse and safety risks across OpenAI products	Yes	Payout tiers not disclosed in the launch post
Agent bio bug bounty	Bio challenge for ChatGPT agent	Limited campaign	$25,000 / $10,000
GPT-5 bio bug bounty	Bio challenge for GPT-5	Limited campaign	$25,000 / $10,000

OpenAI did not publish payout tiers for the new program in the launch announcement. For developers, the more important detail is the scope definition, because it shows which categories of agent failure are now treated as first-class bounty targets.

If you ship AI agents, audit your system for third-party prompt injection, tool-triggered data exfiltration, and account integrity bypasses before you spend more time on generic jailbreak hardening. Those are the failures that now carry formal bounty incentives, and they are the ones most likely to turn into production incidents.

OpenAI's New Bounty Targets Prompt Injection and Agent Abuse

Scope

Boundary Between Jailbreaks and Exploits

Agentic Systems Are the Center of Gravity

Relationship to Existing Disclosure Channels

Comparison With OpenAI’s Earlier Bounty Programs

Keep Reading

How Function Calling Works in LLMs

OpenAI Releases IH-Challenge Dataset and Reports Stronger Prompt-Injection Robustness in GPT-5 Mini-R

BioShocking Exploit Steals SSH Keys From 6 Agentic Browsers

Protestware in jqwik 1.10.0 Sabotages Vibe Coding Agents

Multi-Turn Attacks Erode Safety Guardrails in 15 AI Models