32K Context Comes to IBM's Open Multilingual Embedding R2 Models
IBM released Granite Embedding Multilingual R2, upgrading its Apache 2.0 encoder models with a 32,768-token context window and ModernBERT architecture.
On May 14, 2026, IBM Research launched the Granite Embedding Multilingual R2 family of open-source encoder models. Licensed under Apache 2.0, this release expands the context window to 32,768 tokens—a 64x increase over the 512-token limit of the R1 generation. The models are designed for enterprise-scale dense retrieval across more than 200 languages and include enhanced support for programming languages like Python, Java, Go, and SQL.
Architecture and Inference Footprint
The R2 generation shifts to the ModernBERT architecture. This foundation combines alternating global and local attention, Rotary Position Embeddings (RoPE), and FlashAttention 2 to process long sequences efficiently. By handling larger document chunks natively, the architecture simplifies the preprocessing pipelines required for complex retrieval-augmented generation systems.
Despite the expanded 32K context window, the models maintain processing speeds suitable for high-volume indexing. The compact 97M variant processes approximately 2,900 documents per second on a single NVIDIA H100 GPU. Weights for both models ship alongside ONNX and OpenVINO exports, supporting flexible inference deployments across GPU clusters, CPUs, and edge hardware.
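For teams that want to exercise those exports directly, the sketch below shows how the compact model could be loaded through Sentence Transformers with the ONNX backend. The model name comes from the release table, but the `ibm-granite/` Hugging Face prefix and the exact repository layout are assumptions for illustration, not details confirmed by the release.

```python
from sentence_transformers import SentenceTransformer

# Minimal sketch: the "ibm-granite/" org prefix and the presence of ONNX files
# in the repository are assumptions; the model name comes from the release table.
model = SentenceTransformer(
    "ibm-granite/granite-embedding-97m-multilingual-r2",
    backend="onnx",  # or "openvino" for CPU/edge targets; the default backend is PyTorch
)

# Each input can run up to 32,768 tokens, so long reports or whole source files fit in one pass.
embeddings = model.encode(
    ["Ein langer technischer Bericht über die Ingestion-Pipeline ...",
     "def load_config(path: str) -> dict: ..."],
    normalize_embeddings=True,
)
print(embeddings.shape)  # (2, 384) for the 97M variant
```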
Model Variants and Benchmark Results
The release splits into two primary bi-encoder models designed for different infrastructure constraints.
| Model | Parameters | Base Dimension | MTEB-v2 Retrieval Score |
|---|---|---|---|
| granite-embedding-311m-multilingual-r2 | 311M | 768 | 65.2 |
| granite-embedding-97m-multilingual-r2 | 97M | 384 | 60.3 |
The full-size 311M model utilizes Matryoshka Representation Learning (MRL). This allows developers to truncate the default 768-dimension embeddings down to 512, 384, 256, or 128 dimensions with minimal accuracy loss, giving engineering teams a dial to control vector database storage costs. The 65.2 score on the MTEB-v2 Retrieval benchmark places it in the top 3 of open multilingual models under 500M parameters.
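In practice, the truncation is a one-line change at load time. The sketch below assumes the model is consumable through Sentence Transformers under the same hypothetical `ibm-granite/` repository id and uses the library's `truncate_dim` option to keep only the first 256 dimensions.

```python
from sentence_transformers import SentenceTransformer

# Illustrative sketch only: the "ibm-granite/" prefix is an assumption.
# truncate_dim keeps the first 256 of the model's 768 output dimensions (MRL).
model = SentenceTransformer(
    "ibm-granite/granite-embedding-311m-multilingual-r2",
    truncate_dim=256,
)

sentences = ["contrat de maintenance logicielle", "software maintenance agreement"]
emb = model.encode(sentences)
print(emb.shape)                   # (2, 256): one third of the full-width storage cost
print(model.similarity(emb, emb))  # cross-lingual cosine similarity on the truncated vectors
```

The same trained model serves every width, so teams can re-index at a smaller dimension later without re-embedding from a different checkpoint.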
The 97M compact variant achieves its smaller footprint through aggressive layer pruning, cutting the original 22 layers down to 12. IBM also compressed the vocabulary from 262K to 180K tokens before applying knowledge distillation. Its 60.3 retrieval score establishes a 9-point lead over competing open multilingual models in the sub-100M parameter class.
If your ingestion pipeline currently shreds large technical documents or mixed-language codebases into aggressive 512-token chunks, the 32K context window allows you to embed entire source files and architectural specs intact. Test the 97M model first for standard multilingual workflows; its knowledge-distilled architecture offers the optimal performance-to-cost ratio unless your specific dataset requires the higher-dimensional precision of the 311M variant.
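A minimal ingestion sketch along those lines, again assuming the hypothetical `ibm-granite/` repository id and placeholder `src/` and `docs/` paths, embeds whole files and ranks them against a natural-language query:

```python
from pathlib import Path
from sentence_transformers import SentenceTransformer

# Sketch under the same assumptions as above: start with the 97M model and only
# move to the 311M variant if retrieval quality on your data demands it.
model = SentenceTransformer("ibm-granite/granite-embedding-97m-multilingual-r2")

# Embed entire source files and specs instead of 512-token shards (paths are placeholders).
files = list(Path("src").rglob("*.py")) + list(Path("docs").rglob("*.md"))
corpus = [p.read_text(encoding="utf-8", errors="ignore") for p in files]
corpus_emb = model.encode(corpus, normalize_embeddings=True, show_progress_bar=True)

query_emb = model.encode("Where is the SQL connection pool configured?",
                         normalize_embeddings=True)
scores = model.similarity(query_emb, corpus_emb)[0]  # cosine similarity per file
for score, path in sorted(zip(scores.tolist(), files), reverse=True)[:5]:
    print(f"{score:.3f}  {path}")
```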