How to Stop OCR Degeneration With DharmaOCR Lite 3B

Dharma-AI’s new DharmaOCR models apply Direct Preference Optimization (DPO) to suppress autoregressive text looping in specialized Vision-Language Models. The industry mostly uses DPO to align chatbot dialogue for helpfulness and safety; Dharma-AI instead uses it to steer models away from the repetitive failure modes that plague large-scale document extraction. The 7B and 3B parameter models are built to handle complex OCR workloads without the wasted compute of runaway text loops. This guide covers the training pipeline, model selection and hardware requirements, benchmark results, schema configuration, and licensing constraints for the DharmaOCR family.

The Two-Stage Training Pipeline

Autoregressive degeneration is a critical production failure mode for Vision-Language Models processing dense documents. When a model falls into a repetitive text loop, extraction quality collapses while response times and compute costs balloon, because the model keeps generating until it hits the token limit. Supervised training alone often struggles to unlearn these failure states: it shows the model only good outputs and never explicitly penalizes bad ones.

The DharmaOCR models address this with a two-stage training pipeline. First, Supervised Fine-Tuning (SFT) trains the model to adhere strictly to a target JSON schema. Second, a DPO phase penalizes the specific failure modes. In this phase the model is trained on preference pairs: the “rejected” example is a degenerate, repetitive generation produced by an earlier checkpoint of the model itself, and the “chosen” example is a healthy, accurate extraction of the same input.
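To make the preference-pair idea concrete, here is a minimal sketch of the standard DPO objective for a single pair. The article does not say which loss variant or hyperparameters Dharma-AI uses, so the function name `dpo_loss`, the value of `beta`, and the log-probability inputs are all illustrative assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are summed sequence log-probabilities of the chosen (healthy)
    and rejected (looping) outputs under the trained policy and under a
    frozen reference model. beta=0.1 is an assumed value, not one taken
    from the DharmaOCR release.
    """
    # Implicit rewards: how far the policy has moved from the reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward

    # -log(sigmoid(margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Before training, policy == reference, so the margin is 0.
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Pushing probability mass away from the looping output lowers the loss.
print(dpo_loss(-10.0, -50.0, -10.0, -10.0))
```

The loss only falls when the policy makes the rejected, looping output less likely relative to the chosen one, which is the explicit penalty that SFT alone lacks.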

According to Dharma-AI, this use of DPO cuts the text degeneration rate by up to 87.6% relative to versions that undergo only standard fine-tuning. In production, the 3B model maintains a degeneration rate of just 0.20%, which keeps response times and inference hardware utilization predictable. If your current pipeline struggles with runaway tokens, this approach offers a structural fix. You can review similar output constraint strategies in our guide on Why AI Hallucinates and How to Reduce It.

Model Selection and Hardware Requirements

Dharma-AI provides two model tiers optimized through this pipeline. You should select the model based on your memory constraints and accuracy requirements.

Model	Parameters	Extraction Score	Target Use Case
DharmaOCR Full	7B	0.925	Maximum accuracy for complex unstructured layouts
DharmaOCR Lite	3B	0.921	High-throughput, cost-sensitive production pipelines

Both models support AWQ quantization out of the box. Enabling AWQ reduces your per-page inference costs by approximately 22% and lowers the VRAM requirements for local deployment. If you are new to quantized deployments, read our breakdown on What Is Quantization in AI?.

The 3B model is the more cost-efficient of the two. With AWQ enabled, it runs at up to 52x lower cost than comparable frontier-model APIs.
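For rough capacity planning, the sketch below estimates weight memory and the effect of the reported 22% per-page saving. It assumes a 16-bit baseline and 4-bit AWQ weights, which is the common AWQ configuration but is not stated in the release. It also ignores the KV cache, activations, and quantization metadata, so treat the figures as lower bounds. The base cost per page is a hypothetical placeholder.

```python
def weight_memory_gb(params_billions, bits_per_weight):
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


def cost_after_awq(base_cost_per_page, reduction=0.22):
    """Apply the approximately 22% per-page saving reported for AWQ."""
    return base_cost_per_page * (1 - reduction)


for name, size_b in (("DharmaOCR Full", 7), ("DharmaOCR Lite", 3)):
    fp16 = weight_memory_gb(size_b, 16)   # assumed 16-bit baseline
    awq4 = weight_memory_gb(size_b, 4)    # assumed 4-bit AWQ weights
    print(f"{name}: ~{fp16:.1f} GB at 16-bit, ~{awq4:.1f} GB at 4-bit")

# Hypothetical base cost of 1.00 unit per 1,000 pages.
print(f"Cost per 1,000 pages with AWQ: {cost_after_awq(1.00):.2f}")
```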

Benchmark Performance

Both models are evaluated against the DharmaOCR-Benchmark, a comprehensive dataset designed to test extraction across printed forms, handwritten notes, and dense legal documents.

Despite its small footprint, the DharmaOCR Lite 3B model outperforms GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro on this specific benchmark. The gap suggests that strict domain alignment and targeted DPO can allow a 3B parameter model to surpass general-purpose models 100x its size. The models are heavily optimized for structured Brazilian Portuguese OCR, so expect different performance if you deploy them against other languages or character sets.

Schema Configuration

The models do not generate freeform markdown or raw text blocks. They are explicitly trained to output structured data. Your downstream application logic must be prepared to parse a strict JSON schema.

The SFT stage enforces four primary keys for every extraction event:

header: Captures document titles, letterheads, and top-level metadata.
text: Contains the primary body content of the document.
footer: Extracts page numbers, footnotes, and bottom-aligned metadata.
margin: Captures side-notes, annotations, or stamps located outside the main text block.

Your ingestion pipeline must validate these keys. If your existing database schema relies on flat text, you will need an intermediate mapping function to concatenate these fields or adjust your database to store the structured object. For more on handling structured outputs, review Structured Output from LLMs: JSON Mode Explained.
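The validation and flattening steps described above can be sketched as follows. The four key names come from the schema above; everything else is an assumption made for illustration, including the helper names `parse_extraction` and `flatten`, the idea that each value is a plain string, and the concatenation order.

```python
import json

# Key names as documented for the DharmaOCR schema.
REQUIRED_KEYS = ("header", "text", "footer", "margin")


def parse_extraction(raw):
    """Parse raw model output and check that all four keys are present."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


def flatten(data, sep="\n\n"):
    """Concatenate non-empty string fields for flat-text storage.

    Assumes string values; the order header, text, footer, margin is a
    design choice made here, not something the schema prescribes.
    """
    parts = []
    for key in REQUIRED_KEYS:
        value = data.get(key)
        if isinstance(value, str) and value.strip():
            parts.append(value.strip())
    return sep.join(parts)


sample = '{"header": "Contrato", "text": "Cláusula 1", "footer": "1", "margin": ""}'
print(flatten(parse_extraction(sample)))
```

Rejecting incomplete objects at ingestion keeps malformed or truncated generations out of the database, where they are harder to trace back to a specific page.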

Licensing and Production Deployment

The DharmaOCR models and the associated DharmaOCR-Benchmark are available on Hugging Face. The model artifacts are released under a source-available, non-commercial license. You can download and evaluate the models locally for research and testing immediately.

If you plan to integrate these models into a for-profit application or an internal production pipeline that generates revenue, you must negotiate a separate commercial agreement with Dharma-AI.

Make sure your evaluation set reflects your target production documents. Run a test batch of your own printed and handwritten documents through the 3B model to verify that the 0.20% degeneration rate holds for your specific document layouts before scaling the deployment.
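One way to measure the degeneration rate on your own batch is a simple trailing-loop heuristic, sketched below. This is not Dharma-AI’s evaluation method. The function names and the `max_period` and `min_repeats` thresholds are assumptions, and the check only catches loops that run to the end of the output.

```python
def is_degenerate(text, max_period=20, min_repeats=5):
    """Flag outputs whose tail is one short word sequence repeated.

    Checks whether the last `period * min_repeats` words consist of the
    same `period`-word unit repeated back to back.
    """
    words = text.split()
    for period in range(1, max_period + 1):
        span = period * min_repeats
        if len(words) < span:
            break  # longer periods need even more words
        tail = words[-span:]
        unit = tail[:period]
        if all(tail[i] == unit[i % period] for i in range(span)):
            return True
    return False


def degeneration_rate(outputs):
    """Fraction of outputs flagged as degenerate."""
    if not outputs:
        raise ValueError("no outputs to evaluate")
    return sum(is_degenerate(o) for o in outputs) / len(outputs)


# Hypothetical raw model outputs; replace with your own batch.
batch = ["Contrato de locação entre as partes", "valor total " * 50]
print(f"Degeneration rate: {degeneration_rate(batch):.2%}")
```

A rate near 0.20% means roughly one failure per 500 pages, so a batch of a few hundred pages cannot confirm it. Plan for several thousand pages before drawing conclusions.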
