IBM Granite Releases Mellea 0.4.0 Libraries
IBM Granite announced Mellea 0.4.0 and three LoRA-based libraries for RAG, validation, and safety on granite-4.0-micro.
IBM Granite shipped Mellea 0.4.0 and three new Granite Libraries for ibm-granite/granite-4.0-micro on March 20. For developers building structured AI systems, the release matters because it moves common pipeline tasks such as RAG validation, attribution, uncertainty scoring, and safety checks out of prompt templates and into callable LoRA adapters integrated directly into Mellea workflows.
The immediate change is architectural. IBM is packaging narrow LLM operations as reusable components instead of relying on a single general model plus more prompting. If you build RAG systems, agent pipelines, or safety gates, this gives you a more explicit way to compose retrieval, validation, and repair steps around a 3B base model with 128K context.
Release Scope
The March 20 release combines one library update and three adapter collections:
| Component | Scope | Key additions |
|---|---|---|
| Mellea 0.4.0 | Python library for generative programs | Granite Library integration, rejection-sampling repair flows, event-driven observability hooks |
| granitelib-rag-r1.0 | Agentic RAG adapters | 6 adapters for retrieval and answer validation |
| granitelib-core-r1.0 | Verification and explainability adapters | 3 adapters for attribution, requirement checks, and uncertainty |
| granitelib-guardian-r1.0 | Safety and factuality adapters | 4 capabilities for guardrails, factuality, and policy checks |
All three Granite Libraries target ibm-granite/granite-4.0-micro, IBM’s 3B parameter decoder-only dense transformer with a 128K sequence length.
Adapter Design
Each library breaks a larger application concern into smaller typed operations.
granitelib-rag-r1.0 includes six adapters: Query Rewrite, Query Clarification, Context Relevance, Answerability Determination, Hallucination Detection, and Citation Generation. IBM positions these across pre-retrieval, pre-generation, and post-generation stages. This lines up with the broader shift from monolithic RAG to multi-stage retrieval pipelines, similar to the design pressure behind agentic retrieval systems.
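The staging described above can be sketched as a plain-Python pipeline. The stub functions below are hypothetical stand-ins for the granitelib-rag-r1.0 adapters (the real ones are LoRA calls on granite-4.0-micro invoked through Mellea); the point is only where each check sits relative to retrieval and generation.

```python
def query_rewrite(query: str) -> str:
    # Pre-retrieval: normalize the query before it hits the retriever.
    return query.strip().lower()

def context_relevance(query: str, passage: str) -> float:
    # Pre-generation: score each retrieved passage (toy lexical overlap).
    terms = set(query.split())
    return len(terms & set(passage.lower().split())) / max(len(terms), 1)

def answerability(query: str, passages: list[str]) -> bool:
    # Pre-generation: decide whether the kept context can answer at all.
    return len(passages) > 0

def hallucination_check(answer: str, passages: list[str]) -> bool:
    # Post-generation: flag answer tokens unsupported by the context.
    supported = set(" ".join(passages).lower().split())
    return all(tok in supported for tok in answer.lower().split())

def answer_with_validation(query, retriever, generator, threshold=0.3):
    # Compose the stages: rewrite -> retrieve -> filter -> gate -> check.
    q = query_rewrite(query)
    passages = [p for p in retriever(q) if context_relevance(q, p) >= threshold]
    if not answerability(q, passages):
        return {"answer": None, "reason": "unanswerable"}
    answer = generator(q, passages)
    if not hallucination_check(answer, passages):
        return {"answer": None, "reason": "hallucination"}
    return {"answer": answer, "reason": "ok"}
```

Because each stage is a separate callable, a failing check short-circuits with a typed reason instead of producing an unvalidated answer.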
granitelib-core-r1.0 is the verification layer. It includes Context Attribution, Requirement Check, and Uncertainty. The most important detail is that uncertainty returns a calibrated certainty percentage, where answers assigned X percent are intended to be correct about X percent of the time. If you already use LLM evaluation, this gives you a runtime signal that can feed routing, retry, or human-review thresholds instead of serving only as an offline metric.
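A calibrated certainty score makes threshold-based routing straightforward. The thresholds below are illustrative, not values IBM publishes; the sketch only shows how a runtime score can pick between serving, human review, and retry.

```python
def route_by_uncertainty(certainty: float,
                         serve_threshold: float = 0.75,
                         review_threshold: float = 0.40) -> str:
    # certainty is the calibrated score from an Uncertainty-style adapter:
    # an answer scored 0.80 should be correct about 80% of the time.
    if certainty >= serve_threshold:
        return "serve"         # confident enough to return directly
    if certainty >= review_threshold:
        return "human_review"  # plausible, but worth a second look
    return "retry"             # low confidence: re-retrieve or regenerate
```

Because the score is calibrated, the thresholds map directly onto acceptable error rates rather than opaque model logits.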
granitelib-guardian-r1.0 covers Guardian Core, Factuality Detection, Factuality Correction, and Policy Guardrails. Guardian Core evaluates prompts and responses for risks including safety issues, jailbreaking, profanity, violence, sexual content, social bias, unethical behavior, tool-call hallucinations, and RAG-related risks. Outputs are structured JSON, and Policy Guardrails can return an additional “Ambiguous” state.
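Structured JSON verdicts with a three-way outcome suggest a gate like the one below. The field names (`label`, the `"Compliant"` value) are assumptions for illustration; the source only states that outputs are structured JSON and that Policy Guardrails can return an "Ambiguous" state.

```python
import json

def gate_response(verdict_json: str) -> str:
    # Parse a hypothetical Policy Guardrails verdict and map the
    # three-way outcome onto a pipeline action.
    verdict = json.loads(verdict_json)
    label = verdict.get("label")
    if label == "Compliant":
        return "pass"
    if label == "Ambiguous":
        return "escalate"  # route to a human or a stricter check
    return "block"
```

The useful property is that "Ambiguous" becomes an explicit branch instead of being collapsed into allow or deny.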
Mellea 0.4.0 Changes the Execution Model
The library release would be less useful without Mellea 0.4.0. The new version adds native support for these adapters as first-class intrinsics inside structured workflows.
Several additions stand out:
- Guardianlib intrinsics
- `find_context_attributions()`, `requirement_check`, and `uncertainty` as core intrinsics
- hook system and plugin support
- OTLP logging export
- OpenTelemetry metrics support
- configurable OTLP and Prometheus exporters
- token usage metrics
This pushes Mellea closer to an application runtime than a thin orchestration wrapper. Type hints become schemas, requirements can be checked before output leaves a session, and failed checks can trigger repair attempts through rejection sampling. If your team is already working on structured outputs or LLM observability, this release connects both concerns in one stack.
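The repair pattern the paragraph describes can be sketched without the library. This is a plain-Python rejection-sampling loop, not Mellea's actual API: generate a candidate, run it through named requirement checks, and retry until the checks pass or attempts run out.

```python
def rejection_sample(generate, checks, max_attempts=3):
    """Generate, validate against requirement checks, retry on failure.

    generate: callable taking the attempt number, returning a candidate.
    checks: dict mapping a check name to a predicate on the candidate.
    Returns (candidate, attempts_used, failed_check_names).
    """
    last = None
    for attempt in range(1, max_attempts + 1):
        candidate = generate(attempt)
        failed = [name for name, check in checks.items() if not check(candidate)]
        if not failed:
            return candidate, attempt, []
        last = (candidate, failed)
    # All attempts failed: surface the last candidate and what it failed.
    return last[0], max_attempts, last[1]
```

Keeping the failed check names in the return value is what makes the loop observable: telemetry can record which requirement forced each retry.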
Practical Tradeoffs
The release is narrowly scoped, which is also the point. These libraries are built only for granite-4.0-micro, not as general adapters across arbitrary base models. If you are standardizing on Granite, the integration is cleaner. If your stack spans multiple vendors, you are choosing a more model-specific workflow abstraction.
The adapter counts are also a useful signal about intent:
| Library | Adapter / capability count |
|---|---|
| granitelib-rag-r1.0 | 6 |
| granitelib-core-r1.0 | 3 |
| granitelib-guardian-r1.0 | 4 |
IBM is defining a catalog of narrowly bounded operations rather than a broad agent framework. This complements orchestration layers more than it replaces them. If you compare agent stacks regularly, this fits closer to specialized skills than to end-to-end frameworks, much like the distinction between agent skills and orchestration rules.
Deployment Implications
The base model choice matters. granite-4.0-micro is a 3B model with 128K context, which keeps the footprint smaller than frontier-scale alternatives while still supporting long-context enterprise workflows. The RAG library page lists 14.4M parameters, reflecting the lightweight adapter approach.
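The small adapter footprint follows from LoRA arithmetic. The dimensions and rank below are illustrative, not claims about granite-4.0-micro's internals; they only show why adapter counts in the millions, not billions, are the expected order of magnitude.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    # A rank-r LoRA adapter replaces a full d_in x d_out weight update
    # with two low-rank factors: A (d_in x r) and B (r x d_out).
    return rank * (d_in + d_out)

# Illustrative numbers only: a 2048-wide projection at rank 32 costs
# 32 * (2048 + 2048) = 131,072 parameters per adapted matrix, so even
# ~100 adapted matrices land in the low tens of millions -- the same
# order of magnitude as the 14.4M figure, and far below the 3B base.
```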
For production teams, the real value is operational control. You can separate retrieval rewriting from answerability checks, separate generation from factuality correction, and separate response production from policy gating. Each step can be monitored with telemetry, scored independently, and retried selectively. This is a better fit for systems where context engineering and compliance matter more than squeezing every task through one prompt.
If you run Granite models in production, the next step is straightforward: map your current prompt chain to explicit validation points, then replace the highest-risk stages, usually answerability, hallucination checks, and policy review, with adapter-backed intrinsics you can observe and enforce.