
IBM Granite Releases Mellea 0.4.0 and Three Granite Libraries

IBM Granite announced Mellea 0.4.0 and three LoRA-based libraries for RAG, validation, and safety on granite-4.0-micro.

IBM Granite shipped Mellea 0.4.0 and three new Granite Libraries for ibm-granite/granite-4.0-micro on March 20. For developers building structured AI systems, the release matters because it moves common pipeline tasks such as RAG validation, attribution, uncertainty scoring, and safety checks out of prompt templates and into callable LoRA adapters integrated directly into Mellea workflows.

The immediate change is architectural. IBM is packaging narrow LLM operations as reusable components instead of relying on a single general model plus more prompting. If you build RAG systems, agent pipelines, or safety gates, this gives you a more explicit way to compose retrieval, validation, and repair steps around a 3B base model with 128K context.

Release Scope

The March 20 release combines one library update and three adapter collections:

| Component | Scope | Key additions |
| --- | --- | --- |
| Mellea 0.4.0 | Python library for generative programs | Granite Library integration, rejection-sampling repair flows, event-driven observability hooks |
| granitelib-rag-r1.0 | Agentic RAG adapters | 6 adapters for retrieval and answer validation |
| granitelib-core-r1.0 | Verification and explainability adapters | 3 adapters for attribution, requirement checks, and uncertainty |
| granitelib-guardian-r1.0 | Safety and factuality adapters | 4 capabilities for guardrails, factuality, and policy checks |

All three Granite Libraries target ibm-granite/granite-4.0-micro, IBM’s 3B parameter decoder-only dense transformer with a 128K sequence length.
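
Because the libraries ship as LoRA adapters rather than standalone checkpoints, loading one amounts to layering a small weight delta onto the frozen base. A minimal sketch with Hugging Face transformers and peft, where the adapter repo id is a placeholder rather than a published path:

```python
# Sketch: layering a Granite Library LoRA adapter on granite-4.0-micro with
# transformers + peft. The adapter repo id below is a placeholder, not a
# published path; check the granitelib release for the real ids.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "ibm-granite/granite-4.0-micro"
ADAPTER = "ibm-granite/granitelib-rag-query-rewrite"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)

# PeftModel applies the small LoRA weight deltas on top of the frozen base,
# so several task adapters can share a single resident 3B model.
model = PeftModel.from_pretrained(base_model, ADAPTER)
```

Keeping the base frozen is what makes a catalog of narrow adapters cheap to host side by side.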

Adapter Design

Each library breaks a larger application concern into smaller typed operations.

granitelib-rag-r1.0 includes six adapters: Query Rewrite, Query Clarification, Context Relevance, Answerability Determination, Hallucination Detection, and Citation Generation. IBM positions these across pre-retrieval, pre-generation, and post-generation stages. This lines up with the broader shift from monolithic RAG to multi-stage retrieval pipelines, similar to the design pressure behind agentic retrieval systems.
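
That stage split implies a pipeline shape roughly like the sketch below. None of this is Mellea's actual API: the adapters mapping and the search and generate callables are hypothetical stand-ins for the six adapters and the base-model call.

```python
# Sketch of the pre-retrieval / pre-generation / post-generation split.
# All callables are hypothetical stand-ins, not Mellea or granitelib APIs.
from typing import Callable, Mapping

def answer_with_validation(
    question: str,
    search: Callable[[str], list[str]],
    generate: Callable[[str, list[str]], str],
    adapters: Mapping[str, Callable],
) -> dict:
    # Pre-retrieval: rewrite the query before it hits the index.
    query = adapters["query_rewrite"](question)
    docs = search(query)

    # Pre-generation: keep only relevant context, then bail out early if
    # the remaining passages cannot support an answer.
    docs = [d for d in docs if adapters["context_relevance"](query, d)]
    if not adapters["answerability"](query, docs):
        return {"answer": None, "reason": "not answerable from context"}

    answer = generate(query, docs)

    # Post-generation: flag unsupported claims, then attach citations.
    if adapters["hallucination_detection"](answer, docs):
        return {"answer": None, "reason": "failed hallucination check"}
    return {"answer": answer, "citations": adapters["citation_generation"](answer, docs)}
```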

granitelib-core-r1.0 is the verification layer. It includes Context Attribution, Requirement Check, and Uncertainty. The most important detail is that uncertainty returns a calibrated certainty percentage, where answers assigned X percent are intended to be correct about X percent of the time. If you already use LLM evaluation, this gives you a runtime signal that can feed routing, retry, or human-review thresholds instead of serving only as an offline metric.
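
Calibration is what lets the score gate routing directly: if answers scored 0.80 are right about 80 percent of the time, a threshold maps straight onto an acceptable error rate. A minimal sketch with illustrative cutoffs, not IBM recommendations:

```python
# Sketch: routing on a calibrated certainty score. Thresholds are
# illustrative; tune them to the error rate each downstream path tolerates.
def route(answer: str, certainty: float) -> tuple[str, str]:
    if certainty >= 0.90:
        return ("serve", answer)       # accept roughly 10% residual error or less
    if certainty >= 0.60:
        return ("retry", answer)       # re-run with more context or a repair pass
    return ("human_review", answer)    # too uncertain to automate
```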

granitelib-guardian-r1.0 covers Guardian Core, Factuality Detection, Factuality Correction, and Policy Guardrails. Guardian Core evaluates prompts and responses for risks including safety issues, jailbreaking, profanity, violence, sexual content, social bias, unethical behavior, tool-call hallucinations, and RAG-related risks. Outputs are structured JSON, and Policy Guardrails can return an additional “Ambiguous” state.
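
Structured verdicts keep the gate logic trivial. The sketch below assumes a {"label": ...} JSON shape, including the extra "Ambiguous" state; the actual Guardian field names and values may differ.

```python
# Sketch: gating on a structured guardrail verdict. The JSON field names
# and label values here are assumptions about the Guardian output format.
import json

def policy_gate(raw_verdict: str) -> str:
    verdict = json.loads(raw_verdict)
    label = verdict.get("label")
    if label == "Yes":         # risk detected: block the response
        return "block"
    if label == "Ambiguous":   # Policy Guardrails only: send to review
        return "review"
    return "allow"
```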

Mellea 0.4.0 Changes the Execution Model

The library release would be less useful without Mellea 0.4.0. The new version adds native support for these adapters as first-class intrinsics inside structured workflows.

Several additions stand out:

  • Guardianlib intrinsics
  • find_context_attributions()
  • requirement_check and uncertainty as core intrinsics
  • hook system and plugin support
  • OTLP logging export
  • OpenTelemetry metrics support
  • configurable OTLP and Prometheus exporters
  • token usage metrics
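
The telemetry items above sit on standard OpenTelemetry plumbing, so the exporter side is ordinary OTel setup. A sketch with the OpenTelemetry Python SDK; the meter and counter names are illustrative, not Mellea's own:

```python
# Sketch: OTLP metrics export with the OpenTelemetry Python SDK. The meter
# and counter names are illustrative stand-ins, not Mellea's instrumentation.
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter

reader = PeriodicExportingMetricReader(
    OTLPMetricExporter(endpoint="http://localhost:4317")  # OTLP/gRPC collector
)
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

meter = metrics.get_meter("mellea.demo")
token_counter = meter.create_counter("llm.tokens", unit="token")
token_counter.add(128, {"stage": "generation"})  # record per-call token usage
```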

This pushes Mellea closer to an application runtime than a thin orchestration wrapper. Type hints become schemas, requirements can be checked before output leaves a session, and failed checks can trigger repair attempts through rejection sampling. If your team is already working on structured outputs or LLM observability, this release connects both concerns in one stack.
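
A rejection-sampling repair flow in this style reduces to a small loop: sample, validate, resample with the failure fed back. A sketch where generate and passes are hypothetical stand-ins for the session's generation call and a requirement-check validator:

```python
# Sketch: rejection-sampling repair. Sample a candidate, validate it, and
# resample with failure feedback until a candidate passes or attempts run out.
from typing import Callable, Optional

def sample_until_valid(
    prompt: str,
    generate: Callable[[str], str],   # stand-in for the session generation call
    passes: Callable[[str], bool],    # stand-in for a requirement_check validator
    max_attempts: int = 3,
) -> Optional[str]:
    for _ in range(max_attempts):
        candidate = generate(prompt)
        if passes(candidate):
            return candidate
        # Feed the failed candidate back so the next sample can repair it.
        prompt = f"{prompt}\n\nPrevious attempt failed validation:\n{candidate}"
    return None  # exhausted attempts; caller decides the fallback
```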

Practical Tradeoffs

The release is narrowly scoped, which is also the point. These libraries are built only for granite-4.0-micro, not as general adapters across arbitrary base models. If you are standardizing on Granite, the integration is cleaner. If your stack spans multiple vendors, you are choosing a more model-specific workflow abstraction.

The adapter counts are also a useful signal about intent:

| Library | Adapter / capability count |
| --- | --- |
| granitelib-rag-r1.0 | 6 |
| granitelib-core-r1.0 | 3 |
| granitelib-guardian-r1.0 | 4 |

IBM is defining a catalog of narrowly bounded operations rather than a broad agent framework. This complements orchestration layers more than it replaces them. If you compare agent stacks regularly, this fits closer to specialized skills than to end-to-end frameworks, much like the distinction between agent skills and orchestration rules.

Deployment Implications

The base model choice matters. granite-4.0-micro is a 3B model with 128K context, which keeps the footprint smaller than frontier-scale alternatives while still supporting long-context enterprise workflows. The RAG library page lists 14.4M parameters, reflecting the lightweight adapter approach.

For production teams, the real value is operational control. You can separate retrieval rewriting from answerability checks, separate generation from factuality correction, and separate response production from policy gating. Each step can be monitored with telemetry, scored independently, and retried selectively. This is a better fit for systems where context engineering and compliance matter more than squeezing every task through one prompt.

If you run Granite models in production, the next step is straightforward: map your current prompt chain to explicit validation points, then replace the highest-risk stages (usually answerability, hallucination checks, and policy review) with adapter-backed intrinsics you can observe and enforce.
