
8K Context Reranking Hits Hugging Face With Ettin Cross-Encoders

Hugging Face released six open-source cross-encoders under the Ettin Reranker family with an 8,192-token context window for long-form document retrieval.

Hugging Face introduced the Ettin Reranker Family, a suite of six open-source cross-encoder models that support an 8,192-token context window. Developed by Tom Aarsen and built on Ettin ModernBERT encoders from Johns Hopkins University, the models allow developers to score entire long-form documents rather than fragmented text chunks. For production RAG applications, this means the reranking stage can judge relevance against a full document of up to 8,192 tokens instead of an isolated chunk.

Most existing rerankers truncate input at 512 tokens. Forcing long texts into such narrow windows often removes the semantic context needed for accurate ranking. The Ettin family raises that ceiling to 8,192 tokens.
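To make the truncation problem concrete, here is a minimal sketch using a toy whitespace tokenizer. The `truncate_pair` helper and the sample document are hypothetical, and real tokenizers also spend a few positions on special tokens, which this sketch ignores.

```python
def truncate_pair(query_tokens, doc_tokens, max_length):
    """Keep the query whole and cut the document to fit the window.

    Toy model of cross-encoder input truncation; special tokens are ignored.
    """
    budget = max(max_length - len(query_tokens), 0)
    return doc_tokens[:budget]


# Hypothetical document: 3,000 filler tokens, then the passage that answers the query.
query = "what license covers the models".split()
document = ["filler"] * 3000 + "the models use the Apache 2.0 license".split()

for window in (512, 8192):
    kept = truncate_pair(query, document, window)
    # Report the window size, how many document tokens survive,
    # and whether the answer passage is still visible to the reranker.
    print(window, len(kept), "Apache" in kept)
# prints:
# 512 507 False
# 8192 3007 True
```

With the 512-token window the answer never reaches the model, so no amount of scoring quality can recover it. With the 8,192-token window the whole document fits.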

Scaling Efficiency Across Parameters

The Ettin release includes six model sizes, each suited to different latency and hardware constraints. All models are available under the Apache 2.0 license.

| Model Name | Parameters | Performance Highlight |
| --- | --- | --- |
| ettin-reranker-17m-v1 | 17M | Achieves 7,517 pairs/sec throughput. |
| ettin-reranker-32m-v1 | 32M | Optimized for low-memory local RAG. |
| ettin-reranker-68m-v1 | 68M | Matches Qwen3-Reranker-0.6B capacity. |
| ettin-reranker-150m-v1 | 150M | Outperforms Qwen3-Reranker-0.6B on MTEB. |
| ettin-reranker-400m-v1 | 400M | Competes with standard 1.5B parameter models. |
| ettin-reranker-1b-v1 | 1B | Mirrors mxbai-rerank-large-v2 performance. |

The 17M model, the smallest of the six, reaches 7,517 pairs per second, nearly double the 3,817 pairs per second of ms-marco-MiniLM-L6-v2. It also provides a 1.7x to 8.3x speedup over standard fp32+SDPA configurations for medium-length sequences.
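As a rough illustration of what those throughput figures mean per query, the sketch below converts pairs per second into reranking latency. The candidate count of 100 is an assumption, and the calculation is idealized: it ignores batching overhead and treats the published throughput as constant, which in practice depends on hardware and sequence length.

```python
ETTIN_17M_PAIRS_PER_SEC = 7517  # figure quoted in the article
MINILM_L6_PAIRS_PER_SEC = 3817  # figure quoted in the article

# Relative throughput of the 17M model over ms-marco-MiniLM-L6-v2.
speedup = ETTIN_17M_PAIRS_PER_SEC / MINILM_L6_PAIRS_PER_SEC


def rerank_latency_ms(num_candidates, pairs_per_sec):
    """Idealized time in milliseconds to score num_candidates pairs."""
    return 1000.0 * num_candidates / pairs_per_sec


candidates = 100  # assumed number of retrieved documents per query
print(f"speedup: {speedup:.2f}x")
for name, rate in [("ettin-17m", ETTIN_17M_PAIRS_PER_SEC),
                   ("MiniLM-L6", MINILM_L6_PAIRS_PER_SEC)]:
    print(f"{name}: {rerank_latency_ms(candidates, rate):.1f} ms per query")
```

The ratio works out to roughly 1.97x, which is where "nearly double" comes from.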

Architecture and Distillation Techniques

The underlying Ettin ModernBERT architecture uses unpadded attention, Rotary Position Embeddings (RoPE), and GeGLU activations. During development, ablation studies showed that CLS pooling consistently outperformed mean pooling for these encoders.
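The difference between the two pooling strategies is easy to show on plain lists. This is a minimal sketch with made-up two-dimensional token vectors, not the models' actual implementation: CLS pooling takes the first token's vector, while mean pooling averages every non-padding token.

```python
def cls_pool(token_embeddings):
    """Return the embedding of the first ([CLS]) token."""
    return list(token_embeddings[0])


def mean_pool(token_embeddings, attention_mask):
    """Average the embeddings of tokens whose mask value is 1."""
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                totals[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [total / count for total in totals]


# Made-up hidden states for a four-position sequence.
hidden = [
    [1.0, 0.0],  # [CLS]
    [0.0, 2.0],
    [2.0, 4.0],
    [9.0, 9.0],  # padding position, masked out below
]
mask = [1, 1, 1, 0]

print(cls_pool(hidden))         # [1.0, 0.0]
print(mean_pool(hidden, mask))  # [1.0, 2.0]
```

In a cross-encoder, whichever pooled vector is chosen feeds the final layer that produces the relevance score.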

Hugging Face trained the models using pointwise Mean Squared Error (MSE) distillation. The student models learned to reproduce the relevance scores of mixedbread-ai/mxbai-rerank-large-v2, a 1.54B-parameter teacher model. Training data came from the cross-encoder/ettin-reranker-v1-data dataset, which blends subsets of lightonai/embeddings-pre-training and a reranked slice of lightonai/embeddings-fine-tuning.
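Pointwise MSE distillation reduces to simple arithmetic: each query-document pair contributes the squared gap between the student's score and the teacher's score, independently of every other pair. The sketch below shows only that loss calculation with made-up scores. It is not the training code used for the release.

```python
def mse_distillation_loss(student_scores, teacher_scores):
    """Mean squared difference between student and teacher scores, one per pair."""
    if len(student_scores) != len(teacher_scores):
        raise ValueError("student and teacher must score the same pairs")
    if not student_scores:
        raise ValueError("at least one scored pair is required")
    squared_errors = [(s - t) ** 2 for s, t in zip(student_scores, teacher_scores)]
    return sum(squared_errors) / len(squared_errors)


# Made-up scores for three (query, document) pairs.
teacher = [4.0, -1.0, 0.5]
student = [3.0, -1.0, 1.5]

# Squared errors are 1.0, 0.0 and 1.0, so the loss is 2/3.
print(mse_distillation_loss(student, teacher))
```

Because the target is the teacher's score rather than a human label, any query-document pair the teacher can score becomes usable training data.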

This distillation approach proved highly effective on the MTEB(eng, v2) Retrieval benchmark. Evaluated across ten diverse tasks, the ettin-reranker-1b-v1 model scored 0.6114 NDCG@10, trailing its 1.54B-parameter teacher by only 0.0001. It also came within 0.008 of the teacher on the NanoBEIR benchmark.
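For readers unfamiliar with the metric, NDCG@10 rewards placing relevant documents near the top of the first ten results. The sketch below uses the linear-gain form of DCG and assumes the input list covers every judged document for the query, in ranked order. Benchmark harnesses may differ in details such as the gain function.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top k results (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant documents for this query
    return dcg_at_k(ranked_relevances, k) / ideal


# Relevance labels in the order a reranker returned the documents.
print(ndcg_at_k([1, 1, 0]))  # relevant documents first: 1.0
print(ndcg_at_k([0, 1, 1]))  # a non-relevant document on top scores lower
```

A benchmark score such as 0.6114 is this per-query value averaged over queries and tasks.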

Tooling and Ecosystem Integration

New rerankers are only useful if they fit the tooling that retrieval pipelines already use alongside embedding models. Hugging Face paired this release with a new train-sentence-transformers Agent Skill built for Sentence Transformers v5.5.0.

This integration exposes a direct fine-tuning path for teams using autonomous coding agents like Claude Code or Cursor. Developers can now instruct their agent workspaces to automatically configure and execute fine-tuning runs on their own custom data.

If you maintain a document retrieval pipeline, benchmark the 17M or 32M models against your current cross-encoder. The speed advantage alone reduces latency at the final RAG stage, while the 8K context window removes the need for complex chunking logic before reranking for documents that fit within 8,192 tokens.
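One way to run such a comparison is a small harness that accepts any scoring function, so the same code can time your current cross-encoder and a candidate replacement. The sketch below is an assumption-laden starting point: `overlap_scorer` is a hypothetical stand-in so the example runs without downloading a model, and `rerank` and `measure_pairs_per_sec` are illustrative helpers, not library functions.

```python
import time


def overlap_scorer(pairs):
    """Stand-in scorer: fraction of query words that appear in the document."""
    scores = []
    for query, doc in pairs:
        query_words = set(query.lower().split())
        doc_words = set(doc.lower().split())
        scores.append(len(query_words & doc_words) / max(len(query_words), 1))
    return scores


def rerank(query, documents, score_fn):
    """Return (document_index, score) tuples sorted from most to least relevant."""
    scores = score_fn([(query, doc) for doc in documents])
    order = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]


def measure_pairs_per_sec(score_fn, pairs, repeats=3):
    """Time score_fn over the same pairs several times and keep the best run."""
    best = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        score_fn(pairs)
        elapsed = time.perf_counter() - start
        if elapsed > 0:
            best = max(best, len(pairs) / elapsed)
    return best


query = "ettin long context rerankers"
documents = [
    "Ettin rerankers support long context",
    "Bananas are yellow",
    "Long context reranking with Ettin models",
]
print(rerank(query, documents, overlap_scorer))

pairs = [(query, doc) for doc in documents] * 1000
print(f"{measure_pairs_per_sec(overlap_scorer, pairs):.0f} pairs/sec")
```

To compare real models, pass the `predict` method of a Sentence Transformers `CrossEncoder` as `score_fn`, since it takes a list of query-document pairs and returns one score per pair. Use pairs drawn from your own corpus at realistic lengths, because throughput depends heavily on sequence length and hardware.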
