In 2025 it's no longer science fiction to run state-of-the-art models on your phone. Can you imagine scanning complex documents without sending anything to the cloud and without paying per-page API fees? That's exactly what the team shows by converting dots.ocr to run on-device with Core ML and MLX.
What is dots.ocr and why it matters
dots.ocr is a competitive 3-billion-parameter OCR model developed by RedNote, designed for text recognition in complex documents. In public tests it outperforms much larger models like Gemini 2.5 Pro on the OmniDocBench benchmark, which makes it an interesting candidate for bringing SOTA OCR on-device. (huggingface.co)
Running models on-device has clear benefits: you don't expose API keys, you don't depend on a connection, and you eliminate per-use costs. On top of that, Apple devices ship dedicated acceleration in the form of the Neural Engine, which in the article's tests proved far more energy-efficient than the CPU or GPU, a critical consideration for mobile apps. (huggingface.co)
How they got it running on iPhone: key steps
Converting from PyTorch to a format iOS understands is a two-step process: first capture the execution graph (for example with torch.jit.trace or torch.export), then compile that graph into a .mlpackage using coremltools. That's the general path they followed, with an iterative approach: get it working on GPU in Float32 first, then optimize for the Neural Engine. (huggingface.co)
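To make that two-step flow concrete, here's a minimal sketch of tracing a module and converting it with coremltools. The stub encoder, input shape, tensor name and deployment target are illustrative placeholders, not the actual dots.ocr conversion code:

```python
import torch
import torch.nn as nn
import coremltools as ct

# Stand-in for the vision encoder; the real conversion wraps the dots.ocr
# vision tower instead. Shapes and names here are placeholders.
class VisionEncoderStub(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Conv2d(3, 16, kernel_size=14, stride=14)

    def forward(self, pixel_values):
        return self.proj(pixel_values)

model = VisionEncoderStub().eval()
example_input = torch.rand(1, 3, 448, 448)

# Step 1: capture the execution graph.
traced = torch.jit.trace(model, example_input)

# Step 2: compile the traced graph into a Core ML package.
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="pixel_values", shape=example_input.shape)],
    minimum_deployment_target=ct.target.iOS17,
)
mlmodel.save("VisionEncoder.mlpackage")
```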
In practice they used Core ML for the model's visual part and MLX to run the language backbone. They also prepared a small "harness" that lets you switch compute_units and precision, which makes it easy to try different combinations without reconverting every time. If you want to explore further, the team published the code and a pre-converted package for anyone who wants to try it directly. (huggingface.co)
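The harness itself isn't reproduced here, but the pattern is straightforward: Core ML lets you pin a converted package to specific compute units at load time, so the same .mlpackage can be benchmarked on CPU, GPU and Neural Engine without reconverting. A sketch under that assumption (the file name is a placeholder; precision, by contrast, is fixed at conversion time via compute_precision):

```python
import coremltools as ct

# Map friendly names to Core ML compute-unit choices.
CONFIGS = {
    "cpu_only": ct.ComputeUnit.CPU_ONLY,
    "cpu_and_gpu": ct.ComputeUnit.CPU_AND_GPU,
    "cpu_and_ne": ct.ComputeUnit.CPU_AND_NE,
    "all": ct.ComputeUnit.ALL,
}

def load_variant(package_path: str, units: str) -> ct.models.MLModel:
    """Load the same converted package pinned to a given compute unit."""
    return ct.models.MLModel(package_path, compute_units=CONFIGS[units])

# e.g. compare Neural Engine vs. GPU latency on identical inputs.
model_ne = load_variant("VisionEncoder.mlpackage", "cpu_and_ne")
model_gpu = load_variant("VisionEncoder.mlpackage", "cpu_and_gpu")
```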
Real problems and concrete solutions
Big model conversions rarely work on the first try. They hit typical ML compiler errors: dtype mismatches in matmul caused by a torch.arange that Core ML interpreted as int32, issues with repeat_interleave, and in-place operations that don't support dynamic indices. The fixes were simple and practical: cast outputs, remove logic built for batches or video when only a single image is processed, and replace boolean masks with float masks when the Neural Engine doesn't support bool. (huggingface.co)
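For illustration, here's what two of those fix patterns typically look like in PyTorch; the variable names are invented, not taken from the dots.ocr source:

```python
import torch

seq_len = 128

# Fix 1: cast torch.arange explicitly so the converted graph carries
# float32 instead of the int32 Core ML infers, avoiding dtype mismatches
# in downstream matmuls.
positions = torch.arange(seq_len, dtype=torch.float32)

# Fix 2: replace a boolean attention mask with an additive float mask,
# since the Neural Engine doesn't support bool tensors. A large negative
# value stands in for -inf so it also survives reduced precision.
causal = torch.ones(seq_len, seq_len, dtype=torch.bool).tril()
float_mask = (~causal).float() * -1e4
scores = torch.rand(seq_len, seq_len) + float_mask  # masked attention scores
```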
A repeated principle was to start with the minimal model that does the essentials and strip out tricks not needed for the on-device version. That reduces the surface for bugs and speeds up conversion. They also switched the attention implementation to the sdpa variant to better align with what Core ML supports. (huggingface.co)
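If the checkpoint honors the standard transformers switch, pinning the attention implementation is a one-line change at load time. A hedged sketch, with the repo id and flags as assumptions to verify against the actual conversion code:

```python
import torch
from transformers import AutoModelForCausalLM

# Assumes the checkpoint supports the standard transformers
# attn_implementation switch; model id and flags are illustrative.
model = AutoModelForCausalLM.from_pretrained(
    "rednote-hilab/dots.ocr",
    attn_implementation="sdpa",   # use scaled_dot_product_attention
    torch_dtype=torch.float32,    # Float32 first, per the iterative approach
    trust_remote_code=True,
)
model.eval()
```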
Performance, size and next steps
After the initial conversion, the good news was that functionality and accuracy were very close to the original PyTorch model. The bad news was size: the Float32 version ended up over 5 GB, and the visual encoder's forward pass took over a second in some measurements, which is not acceptable for many mobile apps. That's why parts two and three of the series focus on integrating Core ML with MLX and on optimizations like quantization and handling dynamic shapes to take advantage of the Neural Engine. (huggingface.co)
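The detailed optimizations are left to those later parts, but to give a flavor of what weight quantization looks like in coremltools, here's a hedged sketch (not the series' actual recipe; file names are placeholders):

```python
import coremltools as ct
from coremltools.optimize.coreml import (
    OpLinearQuantizerConfig,
    OptimizationConfig,
    linear_quantize_weights,
)

# Load the converted Float32 package and linearly quantize its weights
# to 8 bits, which roughly quarters the on-disk size.
mlmodel = ct.models.MLModel("VisionEncoder.mlpackage")
config = OptimizationConfig(
    global_config=OpLinearQuantizerConfig(mode="linear_symmetric")
)
compressed = linear_quantize_weights(mlmodel, config=config)
compressed.save("VisionEncoder-W8.mlpackage")
```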
Running SOTA models on-device is possible, but it requires discipline to simplify the model, patience to debug the conversion, and compression techniques to make the result practical.
What this means for developers and companies
If you're a mobile developer, this opens the door to offering powerful OCR without sending sensitive data to the cloud. For product teams, it means you can rethink flows where latency, privacy and cost are critical. Yes, the engineering work is real, but the article lays out the path: understand the architecture, reduce the model to essentials and apply iterative conversion and optimization steps. (huggingface.co)
In short, the conversion of dots.ocr shows that the barrier to bringing advanced AI to your pocket is shrinking. There are still challenges around size and performance, but the mix of Core ML, MLX and good engineering practices makes running SOTA OCR on iOS a real option in 2025. (huggingface.co)