Hugging Face launches a Skill that converts Transformers to MLX | Keryc
In 2026, code automation is no longer just autocomplete: agents can take a short specification and generate complete solutions. Exciting? Yes. Problem-free? Not quite.
Hugging Face introduced a Skill and a test harness to convert transformers models to mlx-lm, with the ambition that a model reaches MLX almost as soon as it appears in Transformers.
What they did and why
The core idea is simple: when a model lands in transformers, it should be available in mlx-lm shortly after. To make that happen they created a Skill that guides an agent to read the implementation in transformers, write the MLX version, run tests and produce a PR ready for review.
Why is this necessary? Because agents can already open PRs, but they often fail to understand the implicit conventions of large projects. transformers is a repo designed to be human-readable: files structured to be read top-to-bottom, flat hierarchies, and design decisions that aren’t always documented.
Agents, on the other hand, lack that context. They tend to refactor, generalize prematurely, introduce changes that break implicit contracts, and make subtle performance or numeric mistakes. The result: a flood of PRs, with the same small group of maintainers left to review them.
How the Skill works (technical)
The Skill is a set of instructions (a recipe for agents) that automates the porting flow without pretending to replace the human. Given a prompt like "convert the olmo_hybrid architecture to MLX", the Skill:
Creates a virtual environment and prepares editable installs of mlx-lm and transformers.
Discovers and downloads relevant variants from the Hub using the hf CLI.
Reads transformers code and generates the implementation in MLX.
Runs automated tests: numerical comparisons, generation examples, dtype checks from safetensors headers, and layer-by-layer comparisons to locate divergences.
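As a rough illustration of the layer-by-layer comparison step, here is a minimal sketch. The function names, tolerance, and toy data are hypothetical, not the harness's actual API; the point is how a divergence gets localized to a specific layer:

```python
import numpy as np

def max_abs_diff(ref, test):
    """Largest element-wise absolute difference between two activation arrays."""
    return float(np.max(np.abs(np.asarray(ref, dtype=np.float64) -
                               np.asarray(test, dtype=np.float64))))

def first_divergent_layer(ref_layers, port_layers, atol=1e-3):
    """Walk the layers in order and return the first one whose outputs
    differ beyond the tolerance, plus the observed difference."""
    for name, ref in ref_layers.items():
        diff = max_abs_diff(ref, port_layers[name])
        if diff > atol:
            return name, diff
    return None, 0.0

# Toy data: the attention output matches, the MLP output drifts.
ref  = {"layers.0.attn": [0.10, 0.20], "layers.0.mlp": [1.00, 2.00]}
port = {"layers.0.attn": [0.10, 0.20], "layers.0.mlp": [1.00, 2.05]}
name, diff = first_divergent_layer(ref, port)
print(name)  # layers.0.mlp
```

In a real run the dictionaries would hold per-layer outputs captured from the transformers reference and the MLX port on the same input; the sketch only shows the comparison logic.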
The Skill watches for details that commonly go wrong: RoPE misconfigurations that degrade on long sequences, accidental promotion to float32 that kills inference speed, config fields that vary across variants, and handling distributed inference for giant models.
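The dtype check, for instance, does not need to load any weights: a safetensors file starts with an 8-byte little-endian header length followed by a JSON header that records each tensor's dtype. A minimal sketch of the idea, built on an in-memory blob rather than a real checkpoint:

```python
import json
import struct

def read_safetensors_dtypes(raw: bytes) -> dict:
    """Parse a safetensors blob's JSON header and return tensor name -> dtype."""
    (header_len,) = struct.unpack("<Q", raw[:8])   # first 8 bytes: header size
    header = json.loads(raw[8 : 8 + header_len])
    return {name: meta["dtype"]
            for name, meta in header.items() if name != "__metadata__"}

# Build a minimal in-memory blob: one bf16 tensor, header only.
header = json.dumps({
    "model.embed.weight": {"dtype": "BF16", "shape": [2, 2], "data_offsets": [0, 8]}
}).encode()
blob = struct.pack("<Q", len(header)) + header

dtypes = read_safetensors_dtypes(blob)
print(dtypes)  # {'model.embed.weight': 'BF16'}
```

A port that silently promotes bf16 weights to float32 would show up immediately in a map like this, before any generation test runs.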
The Skill doesn’t declare success until a battery of checks passes satisfactorily. It also generates a thorough report for the PR: summary of variants, architectural differences, generation examples and logs with raw JSON data.
For contributors and reviewers
The Skill is designed both for those who want to contribute and for those who review. For the contributor it automates the heavy lifting: discovering checkpoints, diffing configs, inferring dtype, and running layer-wise tests. For the reviewer it produces an honest PR: it declares agent assistance, follows mlx-lm conventions (idiomatic, no unnecessary refactors) and attaches much more signal than the average PR.
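Diffing configs across variants is mostly dictionary work; a sketch of the idea, with illustrative field names (the actual fields depend on the architecture being ported):

```python
def diff_configs(base: dict, variant: dict) -> dict:
    """Return the fields that differ between two model configs,
    mapped to their (base, variant) value pair."""
    keys = set(base) | set(variant)
    return {k: (base.get(k), variant.get(k))
            for k in sorted(keys) if base.get(k) != variant.get(k)}

base    = {"hidden_size": 2048, "rope_theta": 10_000,  "num_layers": 24}
variant = {"hidden_size": 2048, "rope_theta": 500_000, "num_layers": 24}
print(diff_configs(base, variant))  # {'rope_theta': (10000, 500000)}
```

A diff like this is exactly the kind of signal that catches the "config fields that vary across variants" pitfall mentioned above: the one field that changed is the one the port must handle.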
A couple of cultural rules included in the Skill are key: don’t use comments to explain code instead of writing clear code; don’t propose global refactors; don’t touch shared utilities without approval. Small constraints that save reviewers hours of work.
Important: the Skill is not a shortcut to automatic acceptance. The typical cycle remains human-to-human: PR, review, iterate. If you’re not willing to participate in that cycle, don’t open the PR just because an agent made it for you.
The non-agent test harness
To avoid relying on the LLM’s word, they created a separate, non-agent test harness. Benefits:
Reduces uncertainty from hallucinations or LLM complacency.
Ensures reproducibility: anyone can clone the harness repo and run the tests.
Provides transparency: reports, per-model details and input/output dumps in JSON are stored.
The tests aren’t an automatic gate. Many checks are simple and quantitative (is the dtype correct?), but others are qualitative and require human judgment (is a 4% difference in logits acceptable?). The harness provides evidence; the final decision remains human.
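For those qualitative calls, the harness's job is to surface numbers a human can judge. A sketch of the kind of evidence involved; the metric names here are my own, not the harness's:

```python
import numpy as np

def logit_report(ref_logits, port_logits):
    """Summarize how far a port's logits are from the reference --
    evidence for a reviewer, not an automatic pass/fail."""
    ref = np.asarray(ref_logits, dtype=np.float64)
    port = np.asarray(port_logits, dtype=np.float64)
    rel = np.abs(ref - port) / np.maximum(np.abs(ref), 1e-9)
    return {
        "max_rel_diff": float(rel.max()),                        # worst deviation
        "same_top_token": int(ref.argmax()) == int(port.argmax()),
    }

ref  = np.array([2.0, 1.0, 0.1])     # reference (transformers) logits
port = np.array([1.95, 1.02, 0.1])   # ported (MLX) logits, slight drift
report = logit_report(ref, port)
print(report["same_top_token"])  # True: ~2.5% drift that keeps the argmax intact
```

Whether a given relative difference is acceptable depends on the architecture and dtype; the numbers only frame the human decision.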
How to try it now
If you want to experiment with the Skill on your machine:
Run: uv run https://raw.githubusercontent.com/huggingface/transformers-to-mlx/main/install_skill.py
Then add the Skill with: uvx hf skills add --claude
The developers used Claude Code to exercise the Skill; the approach should work with other code agents such as Codex, although they haven't tested it exhaustively. If you try it with another agent, they ask for feedback.
Limitations and future directions
The Skill already works well for LLMs in mlx-lm, but there are open areas:
mlx-vlm: vision-language models live in another repo with different conventions and need processors to preprocess images.
llama.cpp: the port requires moving processors to C++ and accepting inevitable numeric differences.
Expanding the test suite and exploring automated execution on Hugging Face infrastructure.
Support for uploading quantized models: the Skill tests quantization but doesn’t perform the upload during review.
Tests specific to "thinking" models: there aren’t yet tests that validate reasoning structure.
Final reflection
The bottleneck in open source isn’t writing code fast: it’s understanding a codebase and changing it without breaking contracts with users. Agents can speed up the work if we teach them what matters. This Skill is a pragmatic example: it automates repetitive tasks and gathers evidence to help reviewers, but it respects that the final decision should be human.
If you want to learn by porting locally, the Skill is an excellent educational tool: point the Skill to your fork of mlx-lm, convert models, and compare your output to the accepted implementation when it lands in the official repo. Do this a few times and you’ll learn a lot about transformers, MLX and language architectures.