OpenAI Ships Teen Safety Policies for gpt-oss-safeguard
OpenAI’s Teen Safety Policy Pack gives developers prompt-based policies and validation data to build safer teen AI moderation workflows.
OpenAI released a Teen Safety Policy Pack on March 24, alongside guidance for using it with gpt-oss-safeguard. For developers shipping AI products that may be used by teens, the release matters because it turns youth-safety policy into something operational: prompt-based classifiers, paired validation datasets, and a workflow you can test before deployment.
The pack is part of OpenAI’s teen safety release for developers, not a new consumer ChatGPT feature. It is designed for moderation pipelines, including real-time filtering and offline review of user-generated content.
Policy pack contents
The initial release covers six policy areas:
| Policy area | Repo slug | Validation dataset |
|---|---|---|
| Graphic violent content | graphic-violent-content | graphic-violent-content.csv |
| Graphic sexual content | graphic-sexual-content | graphic-sexual-content.csv |
| Harmful body ideals and behaviors | harmful-body-ideals | harmful-body-ideals.csv |
| Dangerous activities and challenges | dangerous-content | dangerous-content.csv |
| Romantic or violent roleplay | dangerous-roleplay | dangerous-roleplay.csv |
| Age-restricted goods and services | age-restricted-goods-and-services | age-restricted-goods-and-services.csv |
The policies are published in the teen-safety-policy-pack repository under Apache-2.0 terms, with separate usage-policy constraints. OpenAI says the prompts are built for gpt-oss-safeguard and can also be used with other reasoning models.
The technical move
The key release artifact is the pairing of policy prompts with validation CSVs. That is a different product shape from a fixed moderation endpoint.
If you build trust and safety systems, you can treat each policy as a versioned classifier specification. You select a prompt from example_policies/, run it with content through gpt-oss-safeguard, map the output into filtering or human review, then regression-test changes against the matching dataset before shipping.
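That loop can be sketched in a few lines. Everything here beyond the general shape is an assumption: the `PolicySpec` type, the file layout, and the final-line verdict parsing are illustrative, not part of the pack, and `complete` stands in for whatever chat-completion callable you run gpt-oss-safeguard behind (vLLM, Ollama, or a hosted endpoint).

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class PolicySpec:
    """Hypothetical versioned classifier spec: one policy prompt plus its dataset."""
    name: str
    prompt_path: str   # e.g. "example_policies/<slug>.md" (assumed layout)
    dataset_path: str  # e.g. "<slug>.csv"
    revision: str      # pinned git revision of the upstream repo

def build_messages(policy_text: str, content: str) -> list[dict]:
    # The policy goes in as the system prompt and the content to
    # classify as the user turn.
    return [
        {"role": "system", "content": policy_text},
        {"role": "user", "content": content},
    ]

def classify(policy_text: str, content: str,
             complete: Callable[[list[dict]], str]) -> str:
    """Run one piece of content through the model; `complete` is any
    chat-completion callable returning the model's text reply."""
    reply = complete(build_messages(policy_text, content))
    # Assumption: the model's final line carries the verdict label;
    # adjust parsing to the output format you actually observe.
    return reply.strip().splitlines()[-1].strip().lower()
```

Keeping `complete` injectable means the same `classify` function works against a local server in production and a canned stub in tests.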
This pushes safety work closer to standard prompt engineering and evals practice. The pack effectively gives you a reusable moderation layer that can be tuned for your product context, similar to how teams already think about system prompts and evaluating AI output, but aimed at youth safety rather than general model behavior.
gpt-oss-safeguard fit
The pack is tied to two open-weight safety models:
| Model | Base model |
|---|---|
| gpt-oss-safeguard-120b | openai/gpt-oss |
| gpt-oss-safeguard-20b | openai/gpt-oss-20b |
OpenAI points developers to both variants. On Hugging Face, openai/gpt-oss-safeguard-20b is listed as based on openai/gpt-oss-20b, with 42,069 downloads in the last month and 85 Spaces using it. Those numbers matter because they indicate the policy pack is landing on top of an already-used open safety model, not an isolated repo with no downstream path.
For teams running open models in controlled environments, this also fits a broader shift toward self-managed safety infrastructure. If you already run local or private inference, the operational pattern is closer to the workflows used in running LLMs locally and building evaluation gates around them than to calling a black-box moderation API.
Release cadence and repo activity
The launch was accompanied by visible repo updates on March 23 and March 24. OpenAI synced all six policy documents with their source documents, then updated the datasets, including replacing the graphic sexual and graphic violent content datasets.
For developers, this is a useful signal. The policy pack should be treated as a living artifact, not a one-time download. If you fork it, pin specific prompt and dataset revisions. If you customize prompts, keep a regression set and rerun it whenever upstream policy text or labels change. This is the same discipline that applies to testing AI agents, except the object under test is a moderation policy.
Scope in production
OpenAI positions the pack as a starting point. Developers are expected to adapt prompts to their audience, product design, and risk tolerance, then combine them with monitoring, transparency, user controls, and age-appropriate responses.
That guidance matters because prompt-based classifiers are highly context-sensitive. A social app, a tutoring product, a roleplay app, and a game chat system may each need different thresholds and escalation paths for the same category. The value of this release is not that it settles teen safety policy for every product. The value is that it gives you a concrete baseline you can tune and validate.
If your product has teen exposure, take the six policy areas, map each one to a moderation action, and run the provided CSVs before changing a single prompt in production.
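As a starting point, that mapping can be as simple as a lookup table keyed by the repo slugs from the table above. The specific action choices here are hypothetical product decisions, not guidance from the pack:

```python
# Hypothetical policy-slug -> moderation-action mapping; the right
# actions and escalation paths depend on your product and risk tolerance.
ACTIONS = {
    "graphic-violent-content": "block",
    "graphic-sexual-content": "block",
    "harmful-body-ideals": "route_to_review",
    "dangerous-content": "block",
    "dangerous-roleplay": "route_to_review",
    "age-restricted-goods-and-services": "block",
}

def action_for(policy_slug: str, verdict: str) -> str:
    """Map a classifier verdict for one policy area to a product action."""
    if verdict != "violation":
        return "allow"
    # Unknown slugs fall back to human review rather than silent allow.
    return ACTIONS.get(policy_slug, "route_to_review")
```

Starting from an explicit table like this makes the per-policy decisions reviewable, and it gives you a single place to tighten or relax actions as your validation runs come back.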