NVIDIA Releases Nemotron 3 Content Safety 4B
NVIDIA released Nemotron 3 Content Safety 4B, a new open moderation model for classifying text, images, or mixed text-image inputs as safe or unsafe across 12 languages. The Nemotron 3 Content Safety release matters if you build multimodal agents, because moderation now has to cover screenshots, PDFs, memes, mobile photos, and image-embedded text, not just plain chat prompts.
The model is published as nvidia/Nemotron-3-Content-Safety on Hugging Face, with a March 16 release date in the model card and a March 20 public write-up. NVIDIA positions it as a guard model for LLM and VLM pipelines, especially agent workflows that accept user uploads and produce tool-augmented responses.
Model Architecture
Nemotron 3 Content Safety is built from Gemma-3-4B-it, fine-tuned with LoRA and merged back into the base model. It is a 4B-parameter, decoder-only Transformer with a SigLIP vision encoder and a maximum context window of 128K tokens.
For multimodal moderation, those details matter. A 4B model is small enough to fit into tighter deployment budgets, while 128K context gives you room to classify long conversations, attached OCR text, and policy-heavy system context in one pass. If you already think in terms of context engineering, this is the moderation-layer version of the same problem.
The vision stack takes square images resized to 896 x 896. Input modalities are text and image.
Output Format
Moderation output supports two modes. The default path is low-latency safe/unsafe classification. An optional richer mode adds violated safety categories.
The structured text output includes:
User Safety
Response Safety

Safety Categories
NVIDIA uses a 23-category taxonomy aligned with its Aegis content safety schema, including violence, sexual content, harassment, threat, PII/privacy, fraud/deception, malware, political/misinformation/conspiracy, unauthorized advice, and illegal activity.
This is a useful split for agent builders. You can run a fast binary gate in the hot path, then trigger category output only when you need policy routing, audit logging, or downstream enforcement. If your stack already depends on structured output and post-processing rules, category-rich moderation is much easier to operationalize than free-form refusal text.
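A minimal sketch of that binary-gate-plus-category-routing split. The field names ("User Safety", "Response Safety", "Safety Categories") and the JSON shape are assumptions based on the model card's description, not the verified output template:

```python
# Sketch of routing on a guard model's structured output. Field names and
# the JSON format are assumed from the model-card description; check the
# actual prompt template before relying on them.
import json

def route_moderation(raw: str) -> dict:
    """Parse a guard response and decide what the pipeline does next."""
    result = json.loads(raw)
    return {
        # Fast binary gate: block if either side of the turn is unsafe.
        "block": result.get("User Safety") == "unsafe"
                 or result.get("Response Safety") == "unsafe",
        # Optional category list, only present in the richer output mode.
        "categories": [c.strip() for c in
                       result.get("Safety Categories", "").split(",")
                       if c.strip()],
    }

verdict = route_moderation(
    '{"User Safety": "unsafe", "Response Safety": "safe", '
    '"Safety Categories": "Violence, Threat"}'
)
print(verdict)  # -> {'block': True, 'categories': ['Violence', 'Threat']}
```

The category list can then feed policy routing or audit logs without touching the hot-path block decision.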
Multilingual and Multimodal Coverage
Language support includes English, Arabic, German, Spanish, French, Hindi, Japanese, Thai, Dutch, Italian, Korean, and Chinese. NVIDIA also reports zero-shot generalization to additional languages including Portuguese, Swedish, Russian, Czech, Polish, and Bengali.
The more important shift is modality. This model is designed for content that arrives as text plus screenshots, scanned documents, diagrams, memes, and photos. For agent products, that closes a gap that text-only guard models leave open. A system that can safely moderate chat input but cannot inspect a screenshot upload is incomplete.
That also aligns with the direction of agent UX. As more products move from pure chat into computer use and multimodal workflows, guardrails have to sit beside the main model in every tool loop. The same pressure shows up in work on evaluating agents and AI agents versus chatbots.
Training Data and Tuning
Training data combines multilingual safety data from Nemotron-Safety-Guard-Dataset-v3, human-annotated multimodal English safety data translated into multiple languages, safe multimodal data from Nemotron-VLM-Dataset-v2, and synthetic data.
The model’s training set is about 86K samples, drawing from the much larger Nemotron-Safety-Guard-Dataset-v3 (~515K rows total) and other sources. Synthetic data accounts for roughly 10% of the training blend.
NVIDIA translated English-only text data into the 12 supported languages. Around 25% of training samples had their category labels removed and a /no_categories toggle added, so the model learns when not to emit category labels.
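The category-dropping step can be sketched as a simple augmentation pass. This is an illustrative reconstruction, not NVIDIA's actual data pipeline; the sample fields and tag placement are assumptions:

```python
# Hypothetical sketch of the category-dropping augmentation described above:
# for ~25% of samples, strip the category labels and append a
# "/no_categories" control tag. Field names are illustrative only.
import random

def augment(sample, drop_prob=0.25, rng=random.Random(0)):
    sample = dict(sample)  # avoid mutating the original record
    if rng.random() < drop_prob:
        sample.pop("categories", None)
        sample["prompt"] = sample["prompt"] + " /no_categories"
    return sample

data = [{"prompt": f"example {i}", "categories": ["Violence"]}
        for i in range(1000)]
augmented = [augment(s) for s in data]
dropped = sum("categories" not in s for s in augmented)
print(dropped / len(augmented))  # roughly 0.25
```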
Fine-tuning used LoRA with a grid search over learning rates 1e-5, 1e-4, 5e-5, 5e-6, 1e-7 and LoRA ranks 16, 32. The final run used 5 epochs, 0.0001 learning rate, rank 16, and alpha 32.
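The grid described above is small enough to enumerate directly. A sketch of what that search space looks like, with the reported winning configuration:

```python
# Reconstruction of the hyperparameter grid reported for the LoRA search:
# 5 learning rates x 2 ranks = 10 candidate configurations.
from itertools import product

learning_rates = [1e-5, 1e-4, 5e-5, 5e-6, 1e-7]
lora_ranks = [16, 32]

grid = list(product(learning_rates, lora_ranks))
print(len(grid))  # 10 configurations

# The reported final run:
best = {"epochs": 5, "lr": 1e-4, "lora_rank": 16, "lora_alpha": 32}
```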
Benchmark Positioning
Performance claims are framed around multimodal harmful-content classification. NVIDIA reports 84% average accuracy across PolyGuard, RTP-LX, VLGuard, MM-SafetyBench, and FigStep.
NVIDIA also claims roughly half the latency of larger multimodal safety models across mean, median, and P99 measurements, plus deployment feasibility on GPUs with 8GB+ of VRAM.
NVIDIA does not publish raw latency tables or a separate technical report alongside this release, so the practical takeaway is straightforward: the headline is credible enough to warrant evaluation, but you should benchmark it against your own image sizes, prompt templates, and policy thresholds before replacing an existing moderation tier. This is the same discipline you would apply to any LLM observability setup.
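A minimal harness for that "benchmark it yourself" advice, measuring the same mean/median/P99 statistics NVIDIA cites. The `classify` function is a stub; swap in your real Transformers or vLLM call with your own prompts and images:

```python
# Minimal latency harness: mean, median, and P99 over repeated moderation
# calls. `classify` is a placeholder for the actual guard-model call.
import statistics
import time

def classify(text: str) -> str:
    time.sleep(0.001)  # stand-in for real model inference
    return "safe"

def latency_profile(samples, n=50):
    timings = []
    for i in range(n):
        start = time.perf_counter()
        classify(samples[i % len(samples)])
        timings.append(time.perf_counter() - start)
    timings.sort()
    return {
        "mean": statistics.mean(timings),
        "median": statistics.median(timings),
        "p99": timings[min(int(0.99 * n), n - 1)],
    }

profile = latency_profile(["is this message safe?"])
print({k: round(v, 4) for k, v in profile.items()})
```

Run it against your actual image sizes and prompt templates; P99 on large screenshots is usually the number that decides whether the model fits your hot path.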
Deployment Stack
Inference support includes Transformers and vLLM. The model card lists Transformers 4.57.1, vLLM >= 0.11.0, PyTorch 2.8.0, and Linux. NVIDIA lists compatibility with NVIDIA RTX PRO 6000 BSE, H100, and A100.
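Since vLLM exposes an OpenAI-compatible server, a guard call is typically just a chat-completions payload with text and image parts. A sketch of building that request; the model ID, system prompt, and exact content schema are assumptions to verify against the model card's template:

```python
# Sketch of an OpenAI-compatible chat payload for a multimodal guard call,
# as served by vLLM. The system prompt and model ID are assumptions; use
# the prompt template from the actual model card.
def build_guard_request(user_text, image_url=None,
                        model="nvidia/Nemotron-3-Content-Safety"):
    content = [{"type": "text", "text": user_text}]
    if image_url:
        content.append({"type": "image_url",
                        "image_url": {"url": image_url}})
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Classify the input as safe or unsafe."},
            {"role": "user", "content": content},
        ],
        "max_tokens": 64,    # binary verdicts are short
        "temperature": 0.0,  # deterministic classification
    }

req = build_guard_request("check this screenshot",
                          "https://example.com/shot.png")
print(len(req["messages"][1]["content"]))  # 2: text part + image part
```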
NVIDIA also says the model will be available as an NVIDIA NIM microservice in April 2026, giving developers a pre-packaged, GPU-optimized inference service for production moderation.
If you are deploying multimodal agents today, the immediate move is to test Nemotron 3 Content Safety as a front-door and tool-output classifier, then decide whether binary-only mode is enough for the hot path or whether category output gives you better routing, logging, and enforcement.
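The two-gate pattern above can be sketched in a few lines: a fast binary check on user input in the hot path, and a category-rich check on tool output for routing and audit logs. `moderate` here is a trivial stub standing in for the guard-model call:

```python
# Sketch of the front-door + tool-output moderation pattern. `moderate` is
# a stub; a real deployment would call the guard model in both places.
def moderate(text, with_categories=False):
    unsafe = "attack" in text.lower()  # toy heuristic, illustration only
    verdict = {"safe": not unsafe}
    if with_categories and unsafe:
        verdict["categories"] = ["Violence"]
    return verdict

def agent_turn(user_input, run_tool):
    gate_in = moderate(user_input)  # binary mode: low-latency hot path
    if not gate_in["safe"]:
        return {"status": "blocked_input"}
    tool_output = run_tool(user_input)
    # Category mode on the way out, for routing and audit logging.
    gate_out = moderate(tool_output, with_categories=True)
    if not gate_out["safe"]:
        return {"status": "blocked_output",
                "categories": gate_out["categories"]}
    return {"status": "ok", "output": tool_output}

print(agent_turn("summarize this page", lambda q: "a harmless summary"))
# -> {'status': 'ok', 'output': 'a harmless summary'}
```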