Bounding Boxes Arrive in Mistral OCR 4 for Agentic Retrieval

Mistral AI released OCR 4 on June 23, 2026, shifting its document intelligence engine from simple text extraction to structured document understanding. The mistral-ocr-4-0 model introduces native paragraph-level bounding boxes and block classification, targeting the ingestion requirements of enterprise agentic search pipelines.

Structured Document Mapping

The primary technical shift in OCR 4 is the transition from a flat text stream to a structured document map. The model classifies content into 13 structural labels, including text, title, list, table, image, equation, and code. This block classification allows downstream models to process tabular data or code snippets with the correct formatting context.
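One way a downstream pipeline might use these labels is to route each block through a label-specific formatter before handing the text to a language model. The sketch below is a minimal illustration: the label names come from the list above, but the block layout (a dict with `label` and `text` keys) and every function name are assumptions, not the documented OCR 4 response schema.

```python
# Hypothetical block shape: {"label": ..., "text": ...}. The label names are
# taken from the article; the dict layout is an assumption for illustration.

def render_title(block: dict) -> str:
    # Upper-case titles so they stand out in a plain-text prompt.
    return block["text"].upper()

def render_code(block: dict) -> str:
    # Indent every line so whitespace-sensitive code stays visually distinct.
    return "\n".join("    " + line for line in block["text"].splitlines())

def render_table(block: dict) -> str:
    # Prefix a marker so the model knows the following rows are tabular.
    return "[table]\n" + block["text"]

def render_plain(block: dict) -> str:
    return block["text"]

HANDLERS = {"title": render_title, "code": render_code, "table": render_table}

def render_block(block: dict) -> str:
    # Labels without a dedicated handler fall back to plain text.
    return HANDLERS.get(block["label"], render_plain)(block)

def build_context(blocks: list[dict]) -> str:
    # Join the formatted blocks in reading order, separated by blank lines.
    return "\n\n".join(render_block(b) for b in blocks)
```

A dispatch table keeps the routing explicit, so supporting another label means adding one entry rather than extending a chain of conditionals.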

Native paragraph-level bounding box extraction allows systems to localize text on the original document. This enables front-end applications to render in-context highlighting and exact visual citations when retrieving information. Mistral also added inline confidence scores at both the page and word levels. This granular scoring allows automated pipelines to flag low-confidence extractions for human verification.
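The human-verification step described above could be sketched as a simple threshold filter. The `Page` and `Word` classes, the 0-to-1 confidence scale, and both threshold values are assumptions chosen for illustration. The article does not specify how OCR 4 encodes its scores.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    confidence: float  # assumed to lie in [0, 1]

@dataclass
class Page:
    index: int
    confidence: float  # assumed page-level score in [0, 1]
    words: list[Word]

def flag_for_review(
    pages: list[Page],
    page_threshold: float = 0.90,  # illustrative cutoff, tune per corpus
    word_threshold: float = 0.80,  # illustrative cutoff, tune per corpus
) -> dict[int, list[str]]:
    """Map page index -> low-confidence words for pages that need a human look."""
    flagged: dict[int, list[str]] = {}
    for page in pages:
        low_words = [w.text for w in page.words if w.confidence < word_threshold]
        # Flag a page if its overall score is low or any single word is doubtful.
        if page.confidence < page_threshold or low_words:
            flagged[page.index] = low_words
    return flagged

pages = [
    Page(0, 0.98, [Word("Invoice", 0.99)]),
    Page(1, 0.85, [Word("t0tal", 0.42), Word("due", 0.95)]),
]
review_queue = flag_for_review(pages)  # {1: ["t0tal"]}
```

Checking both levels catches two different failure modes: a degraded scan lowers the page score, while a single misread figure on an otherwise clean page only shows up at the word level.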

The model accepts images as well as standard document formats such as PDF, DOC, PPT, and OpenDocument. Multilingual coverage spans 170 languages across 10 language groups, and Mistral notes particular improvements for low-resource languages.

Throughput and Benchmark Performance

Mistral reports that independent annotators preferred OCR 4's output over that of competing systems, with an average win rate of 72%. The model processes up to 2,000 pages per minute when deployed on a single GPU.
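For capacity planning, the 2,000 pages-per-minute figure can be turned into a rough lower bound on processing time. The sketch below assumes the peak rate is sustained and that throughput scales linearly with GPU count. Neither assumption is claimed by Mistral, so treat the result as a best case.

```python
PEAK_PAGES_PER_MINUTE = 2_000  # single-GPU peak quoted in the article

def min_wall_clock_hours(pages: int, gpus: int = 1) -> float:
    """Best-case hours to process `pages`, assuming linear scaling across GPUs."""
    if pages < 0 or gpus < 1:
        raise ValueError("pages must be >= 0 and gpus must be >= 1")
    minutes = pages / (PEAK_PAGES_PER_MINUTE * gpus)
    return minutes / 60

# One million pages on one GPU: 500 minutes, a little over 8.3 hours.
hours_single = min_wall_clock_hours(1_000_000)
# The same backlog across four GPUs, under the linear-scaling assumption.
hours_quad = min_wall_clock_hours(1_000_000, gpus=4)
```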

Benchmark	Score
OmniDocBench	93.07
OlmOCRBench	85.20

Deployment Channels and Integration

OCR 4 is available immediately through the Mistral API and Studio. Cloud deployments launched concurrently on Microsoft Foundry and Amazon SageMaker, with Snowflake support pending. Enterprise users requiring data residency can deploy the model as a self-hosted single container.

The model serves as the default ingestion component for the newly announced Mistral Search Toolkit, an open-source framework for composable search. Third-party platforms are already adopting the model: the open-source platform Sparrow integrated OCR 4 as a cloud backend on launch day to convert documents into structured JSON. Mistral also announced that Mistral Medium 3.5, arriving June 24, is specifically tuned to reason over the structured data extracted by OCR 4. Developers who already deploy Mistral Small 4 for multimodal reasoning can adapt a similar architecture, pairing OCR extraction with a capable language model.

API Pricing Tiers

Pricing scales with the level of processing required. The Batch API discount halves the raw extraction cost for asynchronous workloads, from $4.00 to $2.00 per 1,000 pages.

Service Level	Cost per 1,000 Pages
Raw Extraction (Batch)	$2.00
Raw Extraction (Standard)	$4.00
Annotated Document AI	$5.00
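The table above translates into a simple cost estimate. This sketch assumes billing is pro-rated linearly per page with no minimums or volume discounts, which the article does not confirm. The tier keys are hypothetical names chosen for the example.

```python
from decimal import Decimal

# Prices per 1,000 pages, copied from the table above. Keys are illustrative.
PRICE_PER_1000_PAGES = {
    "raw_batch": Decimal("2.00"),
    "raw_standard": Decimal("4.00"),
    "annotated": Decimal("5.00"),
}

def estimate_cost(pages: int, tier: str) -> Decimal:
    """Estimated USD cost, assuming linear per-page billing (an assumption)."""
    rate = PRICE_PER_1000_PAGES[tier]
    return (Decimal(pages) / 1000 * rate).quantize(Decimal("0.01"))

batch_cost = estimate_cost(250_000, "raw_batch")        # Decimal("500.00")
standard_cost = estimate_cost(250_000, "raw_standard")  # Decimal("1000.00")
```

`Decimal` avoids the rounding drift that binary floats can introduce when summing currency amounts across many jobs.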

If you build a RAG application, updating your ingestion pipeline to capture bounding box coordinates changes how you present data to end users. Storing these coordinates alongside your text chunks allows your front-end to render a direct visual overlay on the source document instead of just quoting the extracted string.
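As a rough sketch of that storage pattern, each chunk can carry its page number and box, and the front-end can convert the box into a pixel rectangle for the overlay. The coordinate convention here (normalized 0-to-1 values with a top-left origin) and all class and field names are assumptions. Check the actual OCR 4 response format before relying on them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BBox:
    # Assumed normalized coordinates in [0, 1], origin at the page's top-left.
    x0: float
    y0: float
    x1: float
    y1: float

@dataclass(frozen=True)
class Chunk:
    text: str
    page: int
    bbox: BBox

def to_pixel_rect(bbox: BBox, page_width_px: int, page_height_px: int) -> dict:
    """Scale a normalized box to pixel units for a rendered page image."""
    return {
        "left": round(bbox.x0 * page_width_px),
        "top": round(bbox.y0 * page_height_px),
        "width": round((bbox.x1 - bbox.x0) * page_width_px),
        "height": round((bbox.y1 - bbox.y0) * page_height_px),
    }

chunk = Chunk("Total due: $420", page=3, bbox=BBox(0.125, 0.25, 0.625, 0.3125))
rect = to_pixel_rect(chunk.bbox, page_width_px=800, page_height_px=1600)
# rect == {"left": 100, "top": 400, "width": 400, "height": 100}
```

Keeping the box normalized in storage means the same record works at any zoom level, since scaling happens only at render time.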
