PaddleOCR 3.5 gives you more flexibility when turning documents into useful data. What changes? You can now run the OCR and document-parsing models provided by PaddleOCR with transformers as the inference backend, which makes it easier to plug them into stacks built around Hugging Face and PyTorch.
What PaddleOCR 3.5 brings
The main change is a more flexible inference interface: the engine parameter selects the backend, and engine_config accepts backend-specific options. In practice this means:
PaddleOCR still manages the internal OCR and document-parsing pipelines, so you don’t have to call every component manually.
transformers becomes a supported backend to run compatible PaddleOCR models.
You can configure options like dtype, device placement and attention implementation through engine_config.
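As a minimal sketch of how these two parameters fit together, the helper below assembles the keyword arguments for a pipeline constructor. The engine names are the ones listed for this release. The engine_config key names (dtype, device, attn_implementation) and the helper build_engine_kwargs are assumptions made for illustration, so check the PaddleOCR documentation for the exact keys your version accepts.

```python
from typing import Any, Optional

# Backend names as listed for this release.
SUPPORTED_ENGINES = ("paddle_static", "paddle_dynamic", "transformers")


def build_engine_kwargs(
    engine: str = "transformers",
    dtype: Optional[str] = None,
    device: Optional[str] = None,
    attn_implementation: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble `engine` / `engine_config` keyword arguments (hypothetical helper).

    The engine_config key names used here are assumptions, not confirmed API.
    """
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(
            f"unknown engine {engine!r}; expected one of {SUPPORTED_ENGINES}"
        )
    options = {
        "dtype": dtype,
        "device": device,
        "attn_implementation": attn_implementation,
    }
    # Only pass options that were set explicitly, so the backend keeps its defaults.
    engine_config = {key: value for key, value in options.items() if value is not None}
    return {"engine": engine, "engine_config": engine_config}


kwargs = build_engine_kwargs(dtype="bfloat16", attn_implementation="sdpa")
print(kwargs)

# Passing them on would then look roughly like this (requires paddleocr installed):
#   from paddleocr import PaddleOCR
#   ocr = PaddleOCR(**kwargs)
```

Keeping the backend choice in one small function makes it easy to switch between transformers and paddle_static from a config file or a command-line flag without touching the rest of the pipeline code.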
A simple way to understand the stack:
| Layer | What it means | Examples |
| --- | --- | --- |
| Application layer | Applications that consume OCR and document parsing | RAG, agents, Document AI |
| Model layer | OCR and parsing capabilities | PP-OCRv5, PaddleOCR-VL 1.5 |
| Inference backend layer | Runtime to execute the models | paddle_static, paddle_dynamic, transformers |
This release focuses on the backend layer: PaddleOCR keeps the OCR and parsing capabilities, and transformers offers a natural alternative for Hugging Face–centric environments.
Practical examples (installation and usage)
Install PaddleOCR 3.5 along with PaddleX, Transformers and a PyTorch build compatible with your hardware, for example a CUDA 12.6 build.
The optimal configuration depends on the model, the hardware and the deployment environment.
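Once the packages are installed, a quick check like the following can confirm that everything the Transformers backend needs is visible to your Python environment. This is a sketch that only reads installed package metadata. It assumes the PyPI distribution names paddleocr, paddlex, transformers and torch, and the helper installed_versions is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version

# PyPI distribution names for the packages mentioned in the install step.
REQUIRED = ("paddleocr", "paddlex", "transformers", "torch")


def installed_versions(packages=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name:<14}{ver or 'NOT INSTALLED'}")
```

A missing entry here is usually cheaper to catch than a failed import halfway through a document batch.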
When to use the Transformers backend, and when not to
Use the transformers backend when you want OCR and parsing capabilities to fit naturally into a Hugging Face / PyTorch–centric stack. When does that make sense? If you already use:
Pipelines, tools and deployments based on transformers.
Model discovery and distribution through the Hub.
PyTorch infrastructure for experimentation and artifact management.
If your priority is squeezing out maximum throughput and the lowest latency in production, the default paddle_static backend is usually the recommended option. This integration doesn’t replace the existing backends: it gives you the freedom to choose the one that fits your needs.
Technical recommendations and best practices
Try several combinations of dtype and attn_implementation to find the best balance between accuracy, memory and speed on your hardware.
Validate the document ingestion pipeline (tables, formulas, complex layouts) before integrating it with LLMs. Bad preprocessing will ruin any RAG or agent, no matter how good the LLM is.
If you already have Hugging Face infrastructure (Spaces, Hub, Transformers Serving), the integration reduces friction and makes model and artifact management easier.
For production deployments, measure throughput, latency and memory usage on both backends (transformers vs paddle_static) before deciding.
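The first and last recommendations above can be combined into a small measurement loop. The sketch below enumerates dtype and attn_implementation combinations and times a callable for each one. The candidate values are assumptions (keep only those your hardware and backend support), config_grid and benchmark are illustrative helpers, and fake_inference is a stand-in that you would replace with a call into your own pipeline for one representative document.

```python
import itertools
import statistics
import time
from typing import Callable

# Candidate values are assumptions; keep only what your hardware and backend support.
DTYPES = ("float32", "float16", "bfloat16")
ATTN_IMPLEMENTATIONS = ("eager", "sdpa", "flash_attention_2")


def config_grid(dtypes=DTYPES, attns=ATTN_IMPLEMENTATIONS):
    """Return every dtype / attn_implementation pair as an engine_config-style dict."""
    return [
        {"dtype": dtype, "attn_implementation": attn}
        for dtype, attn in itertools.product(dtypes, attns)
    ]


def benchmark(run_once: Callable[[], object], warmup: int = 1, repeats: int = 5) -> dict:
    """Time a zero-argument callable; report median latency (s) and throughput (calls/s)."""
    for _ in range(warmup):
        run_once()  # warm-up runs are excluded from the measurements
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        latencies.append(time.perf_counter() - start)
    total = sum(latencies)
    return {
        "median_latency_s": statistics.median(latencies),
        "throughput_per_s": repeats / total if total > 0 else float("inf"),
    }


def fake_inference():
    """Stand-in workload; replace with one pipeline call on a representative document."""
    time.sleep(0.001)


for config in config_grid(dtypes=("bfloat16",), attns=("eager", "sdpa")):
    # In a real run, build the pipeline from `config` here and time its inference call.
    stats = benchmark(fake_inference, warmup=1, repeats=3)
    print(config, stats)
```

Memory is not measured in this sketch. On CUDA devices, torch.cuda.max_memory_allocated() is one way to record peak usage per configuration, and running the same loop against paddle_static gives the side-by-side comparison recommended above.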
Try the demo on Hugging Face Spaces to see how it behaves in real scenarios.
If you work on RAG, document agents, search or analytics, PaddleOCR 3.5 makes the critical step of turning documents into structured data easier inside a Transformers-based flow. Is it magic? No. It’s a shortcut for integrating mature OCR capabilities with the infrastructure many teams already use.
Think about it this way: the hardest part of Document AI usually comes before the LLM. PaddleOCR speeds up that first stage and lets you focus on retrieval, reasoning and action.