OpenAI Ships Teen Safety Policies for gpt-oss-safeguard
OpenAI’s Teen Safety Policy Pack gives developers prompt-based policies and validation data to build safer teen AI moderation workflows.
OpenAI released a Teen Safety Policy Pack on March 24, alongside guidance for using it with gpt-oss-safeguard. For developers shipping AI products that may be used by teens, the release matters because it turns youth-safety policy into something operational: prompt-based classifiers, paired validation datasets, and a workflow you can test before deployment.
The pack is part of OpenAI’s teen safety release for developers, not a new consumer ChatGPT feature. It is designed for moderation pipelines, including real-time filtering and offline review of user-generated content.
Policy pack contents
The initial release covers six policy areas:
| Policy area | Repo slug | Validation dataset |
|---|---|---|
| Graphic violent content | graphic-violent-content | graphic-violent-content.csv |
| Graphic sexual content | graphic-sexual-content | graphic-sexual-content.csv |
| Harmful body ideals and behaviors | harmful-body-ideals | harmful-body-ideals.csv |
| Dangerous activities and challenges | dangerous-content | dangerous-content.csv |
| Romantic or violent roleplay | dangerous-roleplay | dangerous-roleplay.csv |
| Age-restricted goods and services | age-restricted-goods-and-services | age-restricted-goods-and-services.csv |
The policies are published in the teen-safety-policy-pack repository under Apache-2.0 terms, with separate usage-policy constraints. OpenAI says the prompts are built for gpt-oss-safeguard and can also be used with other reasoning models.
The technical move
The key release artifact is the pairing of policy prompts with validation CSVs. That is a different product shape from a fixed moderation endpoint.
If you build trust and safety systems, you can treat each policy as a versioned classifier specification. You select a prompt from example_policies/, run it with content through gpt-oss-safeguard, map the output into filtering or human review, then regression-test changes against the matching dataset before shipping.
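That loop can be sketched in a few lines. Everything here beyond the general shape is an assumption: the `PolicySpec` type, the file layout, and the final-line verdict parsing are illustrative, not part of the pack, and `complete` stands in for whatever chat-completion callable you run gpt-oss-safeguard behind (vLLM, Ollama, or a hosted endpoint).

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class PolicySpec:
    """Hypothetical versioned classifier spec: one policy prompt plus its dataset."""
    name: str
    prompt_path: str   # e.g. "example_policies/<slug>.md" (assumed layout)
    dataset_path: str  # e.g. "<slug>.csv"
    revision: str      # pinned git revision of the upstream repo

def build_messages(policy_text: str, content: str) -> list[dict]:
    # The policy goes in as the system prompt and the content to
    # classify as the user turn.
    return [
        {"role": "system", "content": policy_text},
        {"role": "user", "content": content},
    ]

def classify(policy_text: str, content: str,
             complete: Callable[[list[dict]], str]) -> str:
    """Run one piece of content through the model; `complete` is any
    chat-completion callable returning the model's text reply."""
    reply = complete(build_messages(policy_text, content))
    # Assumption: the model's final line carries the verdict label;
    # adjust parsing to the output format you actually observe.
    return reply.strip().splitlines()[-1].strip().lower()
```

Keeping `complete` injectable means the same `classify` function works against a local server in production and a canned stub in tests.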
This pushes safety work closer to standard prompt engineering and evals practice. The pack effectively gives you a reusable moderation layer that can be tuned for your product context, similar to how teams already think about system prompts and evaluating AI output, but aimed at youth safety rather than general model behavior.
gpt-oss-safeguard fit
The pack is tied to two open-weight safety models:
| Model | Base model |
|---|---|
| gpt-oss-safeguard-120b | openai/gpt-oss |
| gpt-oss-safeguard-20b | openai/gpt-oss-20b |
OpenAI points developers to both variants. On Hugging Face, openai/gpt-oss-safeguard-20b is listed as based on openai/gpt-oss-20b, with 42,069 downloads in the last month and 85 Spaces using it. Those numbers matter because they indicate the policy pack is landing on top of an already-used open safety model, not an isolated repo with no downstream path.
For teams running open models in controlled environments, this also fits a broader shift toward self-managed safety infrastructure. If you already run local or private inference, the operational pattern is closer to the workflows used in running LLMs locally and building evaluation gates around them than to calling a black-box moderation API.
Release cadence and repo activity
The launch was accompanied by visible repo updates on March 23 and March 24. OpenAI synced all six policy documents with their source documents, then updated the datasets, including replacing the graphic sexual and graphic violent content datasets.
For developers, this is a useful signal. The policy pack should be treated as a living artifact, not a one-time download. If you fork it, pin specific prompt and dataset revisions. If you customize prompts, keep a regression set and rerun it whenever upstream policy text or labels change. This is the same discipline that applies to testing AI agents, except the object under test is a moderation policy.
Scope in production
OpenAI positions the pack as a starting point. Developers are expected to adapt prompts to their audience, product design, and risk tolerance, then combine them with monitoring, transparency, user controls, and age-appropriate responses.
That guidance matters because prompt-based classifiers are highly context-sensitive. A social app, a tutoring product, a roleplay app, and a game chat system may each need different thresholds and escalation paths for the same category. The value of this release is not that it settles teen safety policy for every product. The value is that it gives you a concrete baseline you can tune and validate.
If your product has teen exposure, take the six policy areas, map each one to a moderation action, and run the provided CSVs before changing a single prompt in production.
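As a starting point, that mapping can be as simple as a lookup table keyed by the repo slugs from the table above. The specific action choices here are hypothetical product decisions, not guidance from the pack:

```python
# Hypothetical policy-slug -> moderation-action mapping; the right
# actions and escalation paths depend on your product and risk tolerance.
ACTIONS = {
    "graphic-violent-content": "block",
    "graphic-sexual-content": "block",
    "harmful-body-ideals": "route_to_review",
    "dangerous-content": "block",
    "dangerous-roleplay": "route_to_review",
    "age-restricted-goods-and-services": "block",
}

def action_for(policy_slug: str, verdict: str) -> str:
    """Map a classifier verdict for one policy area to a product action."""
    if verdict != "violation":
        return "allow"
    # Unknown slugs fall back to human review rather than silent allow.
    return ACTIONS.get(policy_slug, "route_to_review")
```

Starting from an explicit table like this makes the per-policy decisions reviewable, and it gives you a single place to tighten or relax actions as your validation runs come back.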