PaddleOCR 3.5 Adds Transformers Backend and Browser Inference

PaddleOCR 3.5 decouples the popular optical character recognition (OCR) toolkit from its native framework. With 20 major models now supporting a native Transformers backend, developers can run inference directly within Hugging Face environments without maintaining a separate PaddlePaddle dependency. The update repositions the tool as a backend-agnostic ingestion layer for modern AI data pipelines.

Architectural Flexibility

Developers can now switch between the PaddlePaddle static graph, the PaddlePaddle dynamic graph, and Transformers by specifying the engine parameter in the API. Backend-specific settings are passed through an engine_config object. For the Transformers backend, this object carries the dtype setting, which selects FP16 or BF16 precision or INT8 quantization.
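The selection logic described above can be sketched in plain Python. This is a minimal illustration, not the library's API: the engine parameter and the engine_config object come from the release notes, but the engine strings ("paddle_static", "paddle_dynamic", "transformers"), the dtype strings, and the helper build_engine_kwargs are assumptions made up for this example.

```python
from typing import Any, Dict, Optional

# Hypothetical engine labels: the article names the three backends but not
# the exact strings the API accepts.
SUPPORTED_ENGINES = {"paddle_static", "paddle_dynamic", "transformers"}

# Precision options the article lists for the Transformers backend.
# The spellings used here are assumptions.
TRANSFORMERS_DTYPES = {"float16", "bfloat16", "int8"}


def build_engine_kwargs(engine: str, dtype: Optional[str] = None) -> Dict[str, Any]:
    """Return keyword arguments that select a backend and its config."""
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")

    engine_config: Dict[str, Any] = {}
    if dtype is not None:
        # This sketch only models dtype for the Transformers backend.
        if engine != "transformers":
            raise ValueError("dtype is only modelled for the transformers engine")
        if dtype not in TRANSFORMERS_DTYPES:
            raise ValueError(f"unsupported dtype: {dtype!r}")
        engine_config["dtype"] = dtype

    return {"engine": engine, "engine_config": engine_config}


kwargs = build_engine_kwargs("transformers", dtype="bfloat16")
print(kwargs)
```

If the constructor accepts these two arguments as the release describes, the resulting dictionary could be unpacked into it; the exact accepted values should be checked against the official documentation.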

The implementation uses Flash-Attention to reduce memory usage and speed up processing during dense document extraction. Models can be placed on CPUs, NVIDIA CUDA hardware, and specialized accelerators, including the Muxi GPUs and Intel Arc GPUs for which support arrived in recent point releases.

Model Upgrades and Language Support

PaddleOCR 3.5 introduces the PP-OCRv5 Multilingual Recognition Model, extending text extraction to 109 languages. The new language models add native support for Cyrillic, Arabic, Devanagari, Telugu, and Tamil scripts. Despite a compact footprint of just 2M parameters, the model improves accuracy by more than 40% for specific languages compared to the previous v4 generation.
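To put the 2M-parameter figure in perspective, the arithmetic below estimates weight storage at different precisions. It is a rough sketch that treats 2M as exactly 2,000,000 parameters and counts weights only, ignoring activations, runtime overhead, and any other pipeline components; the helper name weight_megabytes is made up for this example.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_megabytes(num_params: int, dtype: str) -> float:
    """Approximate weight storage in decimal megabytes (10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1_000_000


NUM_PARAMS = 2_000_000  # assumption: "2M" taken as a round number

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_megabytes(NUM_PARAMS, dtype):.1f} MB")
# float32: 8.0 MB
# float16: 4.0 MB
# bfloat16: 4.0 MB
# int8: 2.0 MB
```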

Unblocking RAG Data Ingestion

Irregular layouts, charts, and tables remain a primary bottleneck for large language models processing raw documents. The release targets this friction directly with a one-click Document-to-Markdown conversion feature. Developers can pipe common proprietary formats, including Word (.docx), Excel (.xlsx), and PowerPoint (.pptx), straight into Markdown.

This pipeline streamlines data ingestion for teams building Retrieval-Augmented Generation systems. Additionally, parsing results from the PaddleOCR-VL series, PP-StructureV3, and PP-DocTranslation can now export directly to .docx format to support downstream editing and human review workflows.
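Once a document has been converted to Markdown, a RAG pipeline typically splits it into chunks before embedding. The sketch below shows one simple way to do that by cutting at headings so tables stay with the section they belong to. It is not part of PaddleOCR: the function names are hypothetical, and it assumes the Markdown uses ATX-style "#" headings and contains no fenced code blocks with "#" lines.

```python
import re
from typing import Dict, List

# Matches ATX headings such as "# Title" or "### Subsection".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def _append_chunk(chunks: List[Dict[str, str]], title: str, body: List[str]) -> None:
    """Store a section unless it has neither a title nor any text."""
    text = "\n".join(body).strip()
    if title or text:
        chunks.append({"title": title, "text": text})


def split_markdown_by_heading(markdown: str) -> List[Dict[str, str]]:
    """Split Markdown into one chunk per heading, keeping each section's body."""
    chunks: List[Dict[str, str]] = []
    title = ""
    body: List[str] = []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            # A new heading closes the previous section.
            _append_chunk(chunks, title, body)
            title, body = match.group(2).strip(), []
        else:
            body.append(line)
    _append_chunk(chunks, title, body)
    return chunks


sample = "# Report\n\nIntro text.\n\n## Table 1\n\n| a | b |\n|---|---|\n| 1 | 2 |\n"
for chunk in split_markdown_by_heading(sample):
    print(chunk["title"], "->", len(chunk["text"]), "characters")
```

Splitting on headings rather than on a fixed character count keeps each table intact inside a single chunk, which is the main reason to prefer structured Markdown over raw extracted text.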

Client-Side Execution

Client-side text extraction expands with the official introduction of PaddleOCR.js. This browser-based inference SDK runs the PP-OCRv5 model entirely on the client using WebGPU and WebAssembly (Wasm) acceleration. Because inference executes locally in the browser, sensitive document data is never transmitted to external servers or third-party APIs.

If you build AI ingestion pipelines that rely on the Hugging Face ecosystem, migrating to the Transformers backend lets you consolidate your inference stack. Teams processing highly sensitive user data should evaluate the WebGPU capabilities of PaddleOCR.js to shift extraction workloads entirely to the client side.
