8K Context Reranking Hits Hugging Face With Ettin Cross-Encoders

Hugging Face introduced the Ettin Reranker Family, a suite of six open-source cross-encoder models that support an 8,192-token context window. Developed by Tom Aarsen and built on Ettin ModernBERT encoders from Johns Hopkins University, the models allow developers to evaluate entire long-form documents rather than fragmented text chunks. For production RAG applications, that means the reranking stage can score a query against a document of up to 8,192 tokens in a single pass.

Most existing rerankers truncate context at 512 tokens. Forcing long texts into such a narrow window often cuts away the semantic context needed for accurate ranking. The Ettin family raises that limit to 8,192 tokens.
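To make the effect of truncation concrete, here is a minimal back-of-the-envelope sketch. The document and query sizes are hypothetical, special tokens are ignored, and the helper `visible_fraction` is illustrative rather than part of any library.

```python
def visible_fraction(doc_tokens: int, query_tokens: int, max_length: int) -> float:
    """Fraction of a document the reranker can still see after truncation.

    Assumes the query and document share one input window of max_length
    tokens and ignores special tokens for simplicity.
    """
    if doc_tokens <= 0:
        return 1.0
    budget = max(max_length - query_tokens, 0)  # tokens left for the document
    return min(budget, doc_tokens) / doc_tokens


# Hypothetical sizes: a 3,000-token document and a 12-token query.
doc_tokens, query_tokens = 3000, 12
for limit in (512, 8192):
    share = visible_fraction(doc_tokens, query_tokens, limit)
    print(f"max_length={limit}: {share:.1%} of the document is scored")
```

Under these assumed sizes, a 512-token window scores about one sixth of the document, while an 8,192-token window scores all of it.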

Scaling Efficiency Across Parameters

The Ettin release includes six model sizes, each suited to different latency and hardware constraints. All models are available under the Apache 2.0 license.

| Model Name | Parameters | Performance Highlight |
| --- | --- | --- |
| `ettin-reranker-17m-v1` | 17M | Achieves 7,517 pairs/sec throughput. |
| `ettin-reranker-32m-v1` | 32M | Optimized for low-memory local RAG. |
| `ettin-reranker-68m-v1` | 68M | Matches Qwen3-Reranker-0.6B capacity. |
| `ettin-reranker-150m-v1` | 150M | Outperforms Qwen3-Reranker-0.6B on MTEB. |
| `ettin-reranker-400m-v1` | 400M | Competes with standard 1.5B parameter models. |
| `ettin-reranker-1b-v1` | 1B | Mirrors `mxbai-rerank-large-v2` performance. |

The 17M model, the smallest in the family, reaches 7,517 pairs per second, nearly double the 3,817 pairs per second of ms-marco-MiniLM-L6-v2. It also provides a 1.7x to 8.3x speedup over standard fp32+SDPA configurations for medium-length sequences.
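As a rough illustration of what those throughput figures mean per query, the sketch below converts pairs per second into reranking time for a candidate list. The candidate count of 100 is an assumption, and real latency will vary with hardware, batch size, and sequence length.

```python
def rerank_milliseconds(num_candidates: int, pairs_per_second: float) -> float:
    """Estimated time to score num_candidates (query, document) pairs."""
    return 1000.0 * num_candidates / pairs_per_second


# Throughput figures quoted in the article.
ETTIN_17M_PAIRS_PER_SEC = 7517.0
MINILM_L6_PAIRS_PER_SEC = 3817.0

# Assumption: the retriever hands 100 candidates to the reranker.
candidates = 100

speedup = ETTIN_17M_PAIRS_PER_SEC / MINILM_L6_PAIRS_PER_SEC
print(f"speedup: {speedup:.2f}x")
print(f"ettin-reranker-17m-v1: {rerank_milliseconds(candidates, ETTIN_17M_PAIRS_PER_SEC):.1f} ms")
print(f"ms-marco-MiniLM-L6-v2: {rerank_milliseconds(candidates, MINILM_L6_PAIRS_PER_SEC):.1f} ms")
```

With these inputs the ratio works out to about 1.97x, or roughly 13 ms versus 26 ms per 100 candidates.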

Architecture and Distillation Techniques

The underlying Ettin ModernBERT architecture uses unpadded attention, Rotary Positional Embeddings (RoPE), and GeGLU activations. In ablation studies during development, CLS pooling consistently outperformed mean pooling for these encoders.
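For readers unfamiliar with the two pooling strategies compared in that ablation, here is a minimal sketch in plain Python. The token embeddings are made-up numbers, and the functions are illustrative, not the implementation used in the Ettin models.

```python
from typing import List

Vector = List[float]


def cls_pool(token_embeddings: List[Vector]) -> Vector:
    """CLS pooling: use the embedding of the first ([CLS]) token as-is."""
    return list(token_embeddings[0])


def mean_pool(token_embeddings: List[Vector], attention_mask: List[int]) -> Vector:
    """Mean pooling: average the embeddings of all non-padding tokens."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                total[i] += value
            count += 1
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [value / count for value in total]


# Toy sequence: [CLS], two real tokens, one padding token (mask = 0).
embeddings = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]

print(cls_pool(embeddings))         # [1.0, 0.0]
print(mean_pool(embeddings, mask))  # [3.0, 2.0]
```

In a cross-encoder, the pooled vector is what the classification head turns into a single relevance score, so the pooling choice determines which token states feed that score.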

Hugging Face trained the models using pointwise Mean Squared Error (MSE) distillation. The smaller student models were trained to reproduce the relevance scores of mixedbread-ai/mxbai-rerank-large-v2, a 1.54B-parameter teacher model. Training data came from the cross-encoder/ettin-reranker-v1-data dataset, which blends subsets of lightonai/embeddings-pre-training and a reranked slice of lightonai/embeddings-fine-tuning.
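The idea behind pointwise MSE distillation can be sketched with a toy example. Everything here is an assumption made for illustration: the student is a two-weight linear model rather than a cross-encoder, and the features and teacher scores are invented. It shows only the objective, not the actual training recipe.

```python
from typing import List, Sequence


def mse_loss(student_scores: Sequence[float], teacher_scores: Sequence[float]) -> float:
    """Pointwise MSE: each pair's student score is pulled toward the teacher's."""
    if len(student_scores) != len(teacher_scores):
        raise ValueError("score lists must have the same length")
    squared = [(s - t) ** 2 for s, t in zip(student_scores, teacher_scores)]
    return sum(squared) / len(squared)


def predict(weights: Sequence[float], features: Sequence[Sequence[float]]) -> List[float]:
    """Toy linear student: score = w . x for each (query, document) pair."""
    return [sum(w * x for w, x in zip(weights, row)) for row in features]


def distill_step(weights, features, teacher_scores, lr: float) -> List[float]:
    """One gradient-descent step on the MSE between student and teacher scores."""
    n = len(features)
    grads = [0.0] * len(weights)
    for pred, target, row in zip(predict(weights, features), teacher_scores, features):
        error = pred - target
        for j, x in enumerate(row):
            grads[j] += 2.0 * error * x / n
    return [w - lr * g for w, g in zip(weights, grads)]


# Invented data: one bias feature plus one feature per (query, document) pair.
features = [[1.0, 0.2], [1.0, 0.9], [1.0, 0.5]]
teacher_scores = [0.1, 0.8, 0.4]  # scores a teacher reranker might assign

weights = [0.0, 0.0]
initial_loss = mse_loss(predict(weights, features), teacher_scores)
for _ in range(200):
    weights = distill_step(weights, features, teacher_scores, lr=0.5)
final_loss = mse_loss(predict(weights, features), teacher_scores)

print(f"loss before: {initial_loss:.4f}, after: {final_loss:.6f}")
```

No relevance labels appear anywhere in the loop: the teacher's scores are the only supervision, which is what makes the approach pointwise distillation rather than standard supervised ranking.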

This distillation approach proved effective on the MTEB(eng, v2) Retrieval benchmark. Evaluated across ten diverse tasks, the ettin-reranker-1b-v1 model scored 0.6114 NDCG@10, only 0.0001 below its 1.54B-parameter teacher. It also stayed within 0.008 of the teacher on the NanoBEIR benchmark.
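For context on the metric, the following is a simplified sketch of NDCG@10 for a single query. It assumes linear gain and computes the ideal ordering only from the documents in the ranked list, so it will not exactly match a benchmark harness that knows every relevant document in the corpus.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k results (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances: Sequence[float], k: int = 10) -> float:
    """DCG of the given ranking divided by the DCG of the best possible ordering."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


# Relevance labels of the documents in the order a reranker returned them.
perfect_ranking = [1, 1, 0, 0]
flawed_ranking = [0, 1, 0, 1]

print(ndcg_at_k(perfect_ranking))
print(ndcg_at_k(flawed_ranking))
```

A benchmark score such as 0.6114 is this per-query value averaged over queries and tasks, so a gap of 0.0001 indicates the student and teacher orderings are close to equivalent on that benchmark.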

Tooling and Ecosystem Integration

A new reranker is only useful if it fits the tooling that teams already run alongside their embedding models. Hugging Face paired this release with a new train-sentence-transformers Agent Skill built for Sentence Transformers v5.5.0.

This integration gives teams that use autonomous coding agents like Claude Code or Cursor a direct path to fine-tuning. Developers can instruct the agent to configure and run fine-tuning jobs on their own data.

If you maintain a document retrieval pipeline, benchmark the 17M or 32M models against your current cross-encoder. The higher throughput reduces latency at the reranking stage, while the 8K context window can remove the need for complex chunking logic before reranking, at least for documents that fit within 8,192 tokens.
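One way to run such a comparison is a small throughput harness like the sketch below. It assumes the models load through the Sentence Transformers `CrossEncoder` class; the repository ids in the comments are assumptions to verify on the Hub, and `overlap_scorer` is a hypothetical stand-in so the harness runs without downloading anything.

```python
import time
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]
ScoreFn = Callable[[Sequence[Pair]], Sequence[float]]


def measure_throughput(score_fn: ScoreFn, pairs: Sequence[Pair], batch_size: int = 32) -> float:
    """Return scored (query, document) pairs per second for score_fn."""
    start = time.perf_counter()
    scored = 0
    for i in range(0, len(pairs), batch_size):
        scored += len(score_fn(pairs[i:i + batch_size]))
    elapsed = time.perf_counter() - start
    return scored / elapsed if elapsed > 0 else float("inf")


def load_cross_encoder(model_name: str) -> ScoreFn:
    """Wrap a Sentence Transformers CrossEncoder as a ScoreFn (lazy import)."""
    from sentence_transformers import CrossEncoder

    model = CrossEncoder(model_name)
    return lambda batch: model.predict(list(batch))


def overlap_scorer(batch: Sequence[Pair]) -> List[float]:
    """Hypothetical stand-in scorer: share of query terms found in the document."""
    scores = []
    for query, document in batch:
        query_terms = set(query.lower().split())
        doc_terms = set(document.lower().split())
        scores.append(len(query_terms & doc_terms) / max(len(query_terms), 1))
    return scores


# Replace with (query, document) pairs sampled from your own pipeline.
pairs = [("long context reranking", "reranking with long context windows")] * 256
print(f"stand-in scorer: {measure_throughput(overlap_scorer, pairs):.0f} pairs/sec")

# With sentence-transformers installed, swap in real models (ids are assumptions):
# current = load_cross_encoder("cross-encoder/ms-marco-MiniLM-L6-v2")
# candidate = load_cross_encoder("cross-encoder/ettin-reranker-17m-v1")
# print(measure_throughput(current, pairs), measure_throughput(candidate, pairs))
```

Throughput alone is only half of the comparison. Running the same pairs through a ranking-quality check on labeled data shows whether the faster model also preserves result quality for your documents.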
