How to Stop OCR Degeneration With DharmaOCR Lite 3B
Dharma-AI's new DharmaOCR models apply DPO to eliminate autoregressive looping. Learn how to configure the 3B parameter model for structured JSON extraction.
Dharma-AI’s new DharmaOCR models apply Direct Preference Optimization (DPO) to suppress autoregressive text looping in specialized Vision-Language Models. While the industry primarily uses DPO to align chatbot dialogue for helpfulness and safety, Dharma-AI uses the technique to push models away from the repetitive failure modes that plague large-scale document extraction. The 7B and 3B parameter models are built to handle complex OCR workloads without the wasted compute of runaway text loops. This guide covers the training pipeline, model selection, benchmark results, schema configuration, and licensing constraints for the DharmaOCR family.
The Two-Stage Training Pipeline
Autoregressive degeneration is a critical production failure mode for Vision-Language Models processing dense documents. When a model falls into a repetitive text loop, extraction quality collapses while response times and compute costs balloon. Traditional supervised training often struggles to unlearn these failure states, because it only shows the model correct outputs and never explicitly penalizes the looping ones.
The DharmaOCR models solve this through a two-stage training pipeline. First, Supervised Fine-Tuning (SFT) forces the model to adhere strictly to a target JSON schema. Second, the DPO phase penalizes the specific failure modes. During this phase, preference pairs are fed to the model where the “rejected” example is a degenerate, repetitive generation produced by the model itself in earlier checkpoints. The “chosen” example is a healthy, accurate extraction.
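The pair construction described above can be sketched in a few lines. This is a minimal illustration, not Dharma-AI's actual tooling: the repeated n-gram heuristic, the `0.5` threshold, and the names `repeated_ngram_ratio`, `is_degenerate`, and `build_pairs` are all assumptions made for the example. The `prompt` / `chosen` / `rejected` field names follow the convention commonly used by DPO training libraries; for a Vision-Language Model the prompt would also carry the page image.

```python
from collections import Counter
from dataclasses import dataclass


def repeated_ngram_ratio(text: str, n: int = 4) -> float:
    """Fraction of word n-grams that duplicate an earlier n-gram in the text."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeats = sum(count - 1 for count in counts.values())
    return repeats / len(ngrams)


def is_degenerate(text: str, threshold: float = 0.5) -> bool:
    """Hypothetical loop detector: flags text dominated by repeated n-grams."""
    return repeated_ngram_ratio(text) >= threshold


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # healthy, accurate extraction
    rejected: str  # looping output from an earlier checkpoint


def build_pairs(samples):
    """samples: iterable of (prompt, reference_output, checkpoint_output)."""
    pairs = []
    for prompt, reference, generated in samples:
        # Keep only cases where the checkpoint looped and the reference did not.
        if is_degenerate(generated) and not is_degenerate(reference):
            pairs.append(PreferencePair(prompt, reference, generated))
    return pairs
```

A real pipeline would likely combine several signals (output length, token-level repetition, schema validity) instead of relying on a single threshold.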
This application of DPO reduces the text degeneration rate by up to 87.6% relative to SFT-only versions of the same models. In active production environments, the 3B model maintains a degeneration rate of just 0.20%, which keeps response times and inference hardware utilization predictable. If your current pipeline struggles with runaway tokens, this approach offers a structural fix. You can review similar output constraint strategies in our guide on Why AI Hallucinates and How to Reduce It.
Model Selection and Hardware Requirements
Dharma-AI provides two model tiers optimized through this pipeline. You should select the model based on your memory constraints and accuracy requirements.
| Model | Parameters | Extraction Score | Target Use Case |
|---|---|---|---|
| DharmaOCR Full | 7B | 0.925 | Maximum accuracy for complex unstructured layouts |
| DharmaOCR Lite | 3B | 0.921 | High-throughput, cost-sensitive production pipelines |
Both models support AWQ quantization out of the box. Enabling AWQ reduces your per-page inference costs by approximately 22% and lowers the VRAM requirements for local deployment. If you are new to quantized deployments, read our breakdown on What Is Quantization in AI?.
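To get a rough feel for the VRAM effect, the sketch below estimates weight memory at 16-bit and 4-bit precision. The 4-bit figure is an assumption (AWQ is commonly used with 4-bit weights, but the bit width of the DharmaOCR checkpoints is not stated here), the parameter counts are the nominal 7B and 3B, and the helper name `weight_memory_gib` is hypothetical. The estimate covers weights only and ignores the KV cache, activations, and quantization metadata.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


# Nominal parameter counts taken from the table above.
for name, params in [("DharmaOCR Full", 7e9), ("DharmaOCR Lite", 3e9)]:
    fp16 = weight_memory_gib(params, 16)
    awq4 = weight_memory_gib(params, 4)  # assumed 4-bit AWQ weights
    print(f"{name}: ~{fp16:.1f} GiB at 16-bit, ~{awq4:.1f} GiB at 4-bit")
```

Actual usage will be higher than these numbers once the serving runtime allocates cache and activation buffers, so treat them as a lower bound when sizing a GPU.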
The 3B model is where the cost savings are largest. When fully optimized with AWQ, it is up to 52x cheaper to run than equivalent frontier APIs.
Benchmark Performance
Both models are evaluated against the DharmaOCR-Benchmark, a comprehensive dataset designed to test extraction across printed forms, handwritten notes, and dense legal documents.
Despite its small footprint, the DharmaOCR Lite 3B model outperforms GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro on this specific benchmark. The gap suggests that strict domain alignment and targeted DPO can let a 3B parameter model surpass much larger general-purpose models on a narrow task. The models are heavily optimized for structured Brazilian Portuguese OCR, so expect different results if you deploy them against other languages or character sets.
Schema Configuration
The models do not generate freeform markdown or raw text blocks. They are explicitly trained to output structured data. Your downstream application logic must be prepared to parse a strict JSON schema.
The SFT stage enforces four primary keys for every extraction event:
- header: Captures document titles, letterheads, and top-level metadata.
- text: Contains the primary body content of the document.
- footer: Extracts page numbers, footnotes, and bottom-aligned metadata.
- margin: Captures side-notes, annotations, or stamps located outside the main text block.
Your ingestion pipeline must validate these keys. If your existing database schema relies on flat text, you will need an intermediate mapping function to concatenate these fields or adjust your database to store the structured object. For more on handling structured outputs, review Structured Output from LLMs: JSON Mode Explained.
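A minimal sketch of that validation and mapping step follows. It assumes each key holds a string or null; the exact value types are not specified here, so check them against real model output. The function names `parse_extraction` and `flatten` are hypothetical, and the concatenation order is a design choice you may want to change.

```python
import json

REQUIRED_KEYS = ("header", "text", "footer", "margin")


def parse_extraction(raw: str) -> dict:
    """Parse model output and verify that all four expected keys are present."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


def flatten(data: dict, order=REQUIRED_KEYS) -> str:
    """Concatenate the non-empty fields into one text blob for flat-text storage."""
    parts = []
    for key in order:
        value = data.get(key)
        if value is None:
            continue
        # Assumption: values are strings; anything else is serialized as JSON.
        text = value if isinstance(value, str) else json.dumps(value, ensure_ascii=False)
        text = text.strip()
        if text:
            parts.append(text)
    return "\n\n".join(parts)
```

Rejecting output with missing keys, instead of silently filling defaults, makes schema drift visible early in the ingestion pipeline.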
Licensing and Production Deployment
The DharmaOCR models and the associated DharmaOCR-Benchmark are available on Hugging Face. The model artifacts are released under a source-available, non-commercial license. You can download and evaluate the models locally for research and testing immediately.
If you plan to integrate these models into a for-profit application or an internal production pipeline that generates revenue, you must negotiate a separate commercial agreement with Dharma-AI.
Make sure your evaluation set reflects your target production documents. Run a test batch of your own printed and handwritten documents through the 3B model and confirm that the 0.20% degeneration rate holds for your document layouts before scaling the deployment.
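One way to run that check is sketched below. It is an illustration under stated assumptions: a page counts as degenerate if generation stopped only at the max-token cap or if the output is not valid JSON, and the `PageResult` record and function names are hypothetical. The Wilson score interval is a standard way to bound a proportion measured on a finite sample.

```python
import json
import math
from dataclasses import dataclass


@dataclass
class PageResult:
    output: str            # raw model output for one page
    hit_token_limit: bool  # True if generation stopped only at the max-token cap


def looks_degenerate(result: PageResult) -> bool:
    """Heuristic: a capped or unparseable output is treated as a failed page."""
    if result.hit_token_limit:
        return True
    try:
        json.loads(result.output)
    except json.JSONDecodeError:
        return True
    return False


def degeneration_rate(results) -> float:
    """Share of pages in the batch flagged as degenerate."""
    failures = sum(looks_degenerate(r) for r in results)
    return failures / len(results)


def wilson_upper_bound(failures: int, n: int, z: float = 1.96) -> float:
    """Upper end of the Wilson score interval (z = 1.96 gives a 95% interval)."""
    p = failures / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center + half
```

Batch size matters here: even with zero failures in 1,500 pages, the upper bound is still about 0.26%, so a small test batch cannot confirm a rate as low as 0.20%.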