PaddleOCR 3.5 Adds Transformers Backend and Browser Inference
The PaddleOCR 3.5 update decouples the toolkit from the PaddlePaddle framework by adding a native Transformers backend and client-side browser execution.
PaddleOCR 3.5 loosens the optical character recognition (OCR) toolkit's tie to PaddlePaddle, its native framework. With 20 major models now supporting a native Transformers backend, developers can run inference directly within Hugging Face environments without maintaining a separate PaddlePaddle dependency. The update reframes the tool as a backend-agnostic ingestion layer for modern AI data pipelines.
Architectural Flexibility
Developers can now switch between the PaddlePaddle static graph, the PaddlePaddle dynamic graph, and Transformers by setting the engine parameter in the API. Backend-specific options are passed through an engine_config object. For the Transformers backend, this object carries the dtype setting, which selects FP16 or BF16 precision or INT8 quantization.
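The backend selection described above can be sketched as a small helper that assembles the two arguments. This is a minimal illustration, not the library's API: the engine and dtype strings and the build_engine_kwargs helper are assumptions, since the release names the engine parameter, the engine_config object, and the dtype setting but not their exact values.

```python
from typing import Any, Dict, Optional

# Assumed spellings: the release names three backends and three precisions,
# but not the exact strings the API expects.
ENGINES = {"paddle_static", "paddle_dynamic", "transformers"}
TRANSFORMERS_DTYPES = {"float16", "bfloat16", "int8"}


def build_engine_kwargs(engine: str, dtype: Optional[str] = None) -> Dict[str, Any]:
    """Return keyword arguments that select a backend and its engine_config."""
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")

    engine_config: Dict[str, Any] = {}
    if dtype is not None:
        # In this sketch, dtype is only accepted for the Transformers backend.
        if engine != "transformers":
            raise ValueError("dtype is only handled for the transformers engine here")
        if dtype not in TRANSFORMERS_DTYPES:
            raise ValueError(f"unsupported dtype: {dtype!r}")
        engine_config["dtype"] = dtype

    return {"engine": engine, "engine_config": engine_config}


kwargs = build_engine_kwargs("transformers", dtype="bfloat16")
print(kwargs)  # {'engine': 'transformers', 'engine_config': {'dtype': 'bfloat16'}}
```

If the pipeline constructor accepts these names as keyword arguments, the dictionary could be unpacked into it directly. Check the release documentation for the exact parameter values before relying on the strings used here.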
The implementation uses FlashAttention to reduce memory usage and speed up processing during dense document extraction. Inference can be placed on CPUs, NVIDIA CUDA GPUs, and specialized accelerators, including the Muxi and Intel Arc GPUs added in recent point releases.
Model Upgrades and Language Support
PaddleOCR 3.5 introduces the PP-OCRv5 Multilingual Recognition Model, which extends text recognition to 109 languages and adds native support for Cyrillic, Arabic, Devanagari, Telugu, and Tamil scripts. The model keeps a compact footprint of just 2M parameters, yet accuracy on specific languages improves by more than 40% over the previous v4 generation.
Unblocking RAG Data Ingestion
Irregular layouts, charts, and tables remain a primary bottleneck for large language models processing raw documents. The release targets this friction directly with a one-click Document-to-Markdown conversion feature. Developers can convert common office formats, including Word (.docx), Excel (.xlsx), and PowerPoint (.pptx), straight into Markdown.
This pipeline streamlines data ingestion for teams building Retrieval-Augmented Generation systems. Additionally, parsing results from the PaddleOCR-VL series, PP-StructureV3, and PP-DocTranslation can now export directly to .docx format to support downstream editing and human review workflows.
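Once a document has been converted to Markdown, a retrieval pipeline typically splits it into chunks before embedding. The sketch below shows one common approach, splitting on headings so that each chunk keeps its section title. It is not part of PaddleOCR: the chunk_markdown helper and the sample text are hypothetical, and it assumes ATX-style (#) headings in the converter's output.

```python
import re
from typing import Dict, List, Tuple

FENCE = "`" * 3  # Markdown code-fence marker
HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def chunk_markdown(markdown: str) -> List[Dict[str, str]]:
    """Split Markdown into one chunk per heading, keeping the heading as metadata."""
    # Text before the first heading is collected under an empty heading.
    sections: List[Tuple[str, List[str]]] = [("", [])]
    in_fence = False

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = HEADING.match(line)
        if match and not in_fence:
            # A heading outside a code fence starts a new section.
            sections.append((match.group(1).strip(), []))
        else:
            sections[-1][1].append(line)

    chunks: List[Dict[str, str]] = []
    for heading, lines in sections:
        text = "\n".join(lines).strip()
        if text:  # skip sections with no body text
            chunks.append({"heading": heading, "text": text})
    return chunks


sample = "# Q3 Report\nRevenue grew.\n\n## Costs\n| Item | USD |\n|---|---|\n| Cloud | 10 |\n"
for chunk in chunk_markdown(sample):
    print(chunk["heading"], "->", len(chunk["text"]), "chars")
```

Splitting on headings keeps a table together with the section that introduces it, which is the kind of layout context that is lost when raw documents are split by character count alone.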
Client-Side Execution
Client-side text extraction expands with the official introduction of PaddleOCR.js. This browser-based inference SDK runs the PP-OCRv5 model entirely on the client using WebGPU and Wasm acceleration. Because inference runs locally in the browser, sensitive document data is never transmitted to external servers or third-party APIs.
If you build AI ingestion pipelines reliant on the Hugging Face ecosystem, migrating to the Transformers backend will allow you to consolidate your inference stack. Teams processing highly sensitive user data should evaluate the WebGPU capabilities of PaddleOCR.js to shift extraction workloads entirely to the client side.