News · May 19, 2026 · 2 min read

PaddleOCR 3.5 Adds Transformers Backend for Document AI Pipelines

PaddleOCR 3.5 now runs OCR and document parsing models using Hugging Face Transformers as an inference backend. Set engine="transformers" to integrate with existing PyTorch stacks and RAG workflows.

Our Take

This is plumbing, not progress: PaddleOCR's models stay the same, but developers in Transformers-heavy shops now avoid a second inference runtime.

Why it matters

Document ingestion is the hard part of RAG and Document AI pipelines. Making it cheaper to wire into existing Transformers deployments removes friction for teams already committed to that ecosystem.

Do this week

Document AI teams: test the Transformers backend on a staging PDF batch before month-end so you can decide whether to retire your Paddle static graph runtime.

PaddleOCR adds a Transformers inference option

PaddleOCR 3.5 introduces a pluggable inference backend architecture. Supported models (PP-OCRv5, PaddleOCR-VL 1.5) can now run with Hugging Face Transformers as the inference runtime by setting engine="transformers" in the Python API or CLI.

Previously, PaddleOCR models required one of Paddle's own engines, static graph or dynamic graph. The new interface abstracts this choice behind an engine parameter. Developers configure backend-specific options such as dtype, device placement, and attention implementation through engine_config. PaddleOCR still manages the OCR and document parsing pipelines; the backend is only the runtime layer.
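As a rough illustration of the engine and engine_config split described above, the sketch below assembles the keyword arguments a pipeline constructor might receive. Only engine="transformers" and the engine_config name come from the release notes. The helper build_ocr_kwargs and the option keys dtype, device_map, and attn_implementation are hypothetical, borrowed from common Transformers naming; check the PaddleOCR 3.5 documentation for the keys it actually accepts.

```python
from typing import Any


def build_ocr_kwargs(engine: str = "transformers", **engine_options: Any) -> dict[str, Any]:
    """Assemble constructor keyword arguments (hypothetical helper, not part of PaddleOCR)."""
    kwargs: dict[str, Any] = {"engine": engine}
    if engine_options:
        # Backend-specific settings travel together under engine_config.
        kwargs["engine_config"] = dict(engine_options)
    return kwargs


kwargs = build_ocr_kwargs(
    dtype="bfloat16",              # assumed key name for numeric precision
    device_map="cuda:0",           # assumed key name for device placement
    attn_implementation="sdpa",    # assumed key name for the attention implementation
)
print(kwargs)

# With PaddleOCR 3.5 installed, the dictionary would presumably be passed on like this:
# from paddleocr import PaddleOCR
# ocr = PaddleOCR(**kwargs)
```

Keeping the backend choice in one dictionary makes it easy to switch runtimes from a config file without touching the pipeline code.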

The integration is available now. A live demo runs on Hugging Face Spaces. Installation requires PaddleOCR 3.5.0, PaddleX 3.5.2, Transformers 5.4.0 or later, and PyTorch for your target hardware.
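Before testing, it may help to confirm that the environment meets those versions. The following is a minimal sketch using only the standard library. It assumes the distributions are published under the lowercase names paddleocr, paddlex, and transformers, and it treats all three listed versions as minimums, which the announcement does not state outright. The helpers version_tuple and check_requirements are illustrative names.

```python
from importlib import metadata

# Versions quoted in the announcement, treated here as minimums (an assumption).
MINIMUMS = {"paddleocr": "3.5.0", "paddlex": "3.5.2", "transformers": "5.4.0"}


def version_tuple(version: str) -> tuple:
    """Reduce a version string to (major, minor, patch), ignoring suffixes such as rc1."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_requirements(minimums: dict = MINIMUMS) -> dict:
    """Report 'ok', 'missing', or 'too old' for each required distribution."""
    status = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            status[name] = "missing"
            continue
        ok = version_tuple(installed) >= version_tuple(minimum)
        status[name] = "ok" if ok else f"too old ({installed} < {minimum})"
    return status


print(check_requirements())
```

PyTorch is left out of the check because the right build depends on the target hardware.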

Integration friction drops for PyTorch-first shops

Document ingestion is the bottleneck in RAG, Document AI, and agent pipelines. Weak OCR or parsing sends corrupted or missing context downstream, which degrades retrieval and factual accuracy. Teams already know this; what holds them back is the operational cost of running the tooling.

Many teams standardize on PyTorch and Transformers for model loading, training, and deployment. Adding PaddleOCR meant importing a second inference runtime and managing separate model artifact paths. The Transformers backend eliminates that dual-runtime penalty for teams already serving Transformers models in production.

This is not a performance play. PaddleOCR's default Paddle static graph backend is still the throughput choice. The Transformers option trades maximum OCR speed for operational simplicity: one dependency tree, one Hub-compatible model discovery surface, one inference orchestration path.

Decide your backend based on your constraint

Use the Transformers backend if your team already owns PyTorch/Transformers infrastructure and integration cost is your limiting factor. You gain familiar APIs and Hub-native model distribution.

Stick with the Paddle static graph backend if OCR or document parsing throughput is the hard constraint and you can afford to operate a second runtime.

The release is explicitly not a consolidation. PaddleOCR continues to expose both backends. The win is optionality: pick the inference runtime that matches your stack, not the other way around.

Start by testing on a non-critical document batch (invoices, contracts, charts) in your environment. Measure cold-start latency, per-document cost, and memory footprint under your actual page volumes and hardware. Then decide whether to migrate or stay.
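One way to collect those numbers is a small harness like the sketch below. The benchmark function and the fake_ocr stand-in are hypothetical; swap in a callable that wraps your real pipeline. Two caveats apply: the first-call time approximates cold start only if model loading happens inside that call, and tracemalloc sees Python-level allocations only, so GPU and native memory need a separate tool.

```python
import time
import tracemalloc
from statistics import mean
from typing import Callable, Iterable


def benchmark(run_ocr: Callable[[str], object], documents: Iterable[str]) -> dict:
    """Time each document and record peak Python-level memory for the whole batch."""
    docs = list(documents)
    if not docs:
        raise ValueError("need at least one document")

    tracemalloc.start()
    latencies = []
    for doc in docs:
        start = time.perf_counter()
        run_ocr(doc)
        latencies.append(time.perf_counter() - start)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return {
        "documents": len(docs),
        "first_call_s": latencies[0],  # includes any lazy model loading
        "warm_mean_s": mean(latencies[1:]) if len(latencies) > 1 else None,
        "peak_python_mib": peak_bytes / 2**20,
    }


def fake_ocr(path: str) -> str:
    """Stand-in for a real pipeline call; replace with your own wrapper."""
    return path.upper()


report = benchmark(fake_ocr, ["invoice.pdf", "contract.pdf", "chart.png"])
print(report)
```

Running the same harness once per backend on an identical batch gives a like-for-like comparison. Per-document cost then follows from the warm mean latency and your hardware's hourly price.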

#Open Source #Developer Tools #RAG #Computer Vision