Hugging Face launches weekly release of huggingface_hub

Hugging Face has overhauled its release process: huggingface_hub now ships weekly, using open tools, an open-weight model, and a human in the loop for the final decision. Why should you care? Because your dependencies (transformers, datasets, diffusers, sentence-transformers and dozens more) talk to the Hub through this Python client, and every week without a release is a week of postponed fixes and features.

What they did and why it matters

Before: releases every 4 to 6 weeks. Some mechanical steps were automated, but much of the work was still manual: create a branch, change the version in __init__.py, write notes, cut the release, announce it. That took about half a day per release, spread across scattered, repetitive tasks.

Now: everything lives in a single GitHub Actions workflow (.github/workflows/release.yml) that you trigger manually and that runs the whole chain — preparation, publishing to PyPI, changelog generation, creating downstream branches, announcing in Slack, archiving the AI draft and the final version, and automatic comments on PRs. Result: a weekly cadence, lower latency for changes, and shorter contribution loops.

Architecture and full stack

They built it with one clear principle: only things any maintainer can run. No closed models, no proprietary platforms.

Orchestrator: GitHub Actions
Agent runtime: OpenCode (version pinned and verified by SHA256)
Generation model: open weights (currently Z.ai's GLM-5.2) served by HF Inference Providers
PyPI publishing: Trusted Publishing with OIDC and Sigstore/PEP 740
Storage: Hugging Face buckets to audit drafts

Key design: the model generates, the code verifies, and the human decides. That division of labor is what makes the process both fast and reliable.

How the flow works, step by step

Manual trigger with a workflow_dispatch that accepts release_type (minor-prerelease, minor-release, patch-release).
Job Prepare: calculate version, create or reuse a branch, bump __version__, tag and push.
Publish to PyPI: build and upload the huggingface_hub package and the hf CLI as a separate package.
Release notes: diff since the last tag, gather PR metadata via the GitHub API and ask the model for a changelog draft. Saved as a release draft.
Downstream branches: open branches in transformers, datasets, diffusers, sentence-transformers with the RC pinned so their CI validates integrations.
Slack: the model proposes the internal announcement; a human reviews it.
Archiving: upload both the raw AI draft and the human-edited version to a bucket for traceability.
Post-release: a PR to bump main to the next dev0 version, comments on PRs indicating which release shipped each change, a sync of the CLI docs, and Slack reports threaded for each step.
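
The "calculate version" part of the Prepare job can be pictured with a small sketch. The source only names the three release_type values; the bump rules below, the `next_version` helper, and the PEP 440 style "rc" suffix are assumptions made for illustration, not the logic of the actual workflow.

```python
import re

# Accepts "1.4.2", "v1.4.2", "1.5.0rc0" or "1.5.0.rc0" (assumed tag shapes).
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:\.?rc(\d+))?$")


def next_version(last_tag: str, release_type: str) -> str:
    """Hypothetical helper: derive the next version from the last tag."""
    match = VERSION_RE.match(last_tag.lstrip("v"))
    if match is None:
        raise ValueError(f"Unrecognized version: {last_tag!r}")
    major, minor, patch = (int(match.group(i)) for i in (1, 2, 3))
    rc = int(match.group(4)) if match.group(4) is not None else None

    if release_type == "minor-prerelease":
        if rc is not None:
            # Already on a release candidate: cut another one.
            return f"{major}.{minor}.{patch}rc{rc + 1}"
        return f"{major}.{minor + 1}.0rc0"
    if release_type == "minor-release":
        if rc is not None:
            # Promote the current candidate to a final release.
            return f"{major}.{minor}.{patch}"
        return f"{major}.{minor + 1}.0"
    if release_type == "patch-release":
        if rc is not None:
            raise ValueError("Cannot patch a release candidate")
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown release_type: {release_type!r}")


print(next_version("1.4.2", "minor-prerelease"))
print(next_version("1.5.0rc0", "minor-release"))
```

Keeping this step as plain, deterministic code means the version bump never depends on the model.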

Deterministic validation: the idea that makes the AI trustworthy

The biggest fear with AI-generated notes is that the model will omit PRs or invent changes. The solution is simple and elegant: build a deterministic manifest of PRs and verify that what the model produces matches that manifest exactly.

Extract PR numbers from squash-merge commits with a regex:

import re

# Squash-merge titles end with "(#1234)"; "$" anchors the match to the end of the title.
PR_NUMBER_PATTERN = re.compile(r"\(#(\d+)\)$")
pr_numbers = [
    int(m.group(1))
    for commit in commits_since_last_tag
    if (m := PR_NUMBER_PATTERN.search(commit.title))
]
save_manifest(pr_numbers)
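
To see what an end-anchored pattern of this kind picks up, here is a minimal, self-contained sketch. The commit titles are made up for illustration and `extract_pr_number` is a hypothetical helper, not part of the Hugging Face workflow.

```python
import re
from typing import Optional

# Matches only a trailing "(#1234)", the suffix GitHub adds on squash merges.
PR_NUMBER_PATTERN = re.compile(r"\(#(\d+)\)$")


def extract_pr_number(title: str) -> Optional[int]:
    """Return the PR number at the end of a commit title, or None."""
    match = PR_NUMBER_PATTERN.search(title.strip())
    return int(match.group(1)) if match else None


# Illustrative titles only; these are not real commits.
sample_titles = [
    "Fix retry logic in downloads (#4012)",
    "Mention issue (#3999) in docs without a PR suffix",
    "Bump version manually",
]
pr_numbers = [
    number
    for title in sample_titles
    if (number := extract_pr_number(title)) is not None
]
print(pr_numbers)  # [4012]
```

Only the first title contributes: a reference in the middle of a title is ignored because the pattern is anchored to the end of the string.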

The model generates notes from that input. Then you validate:

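expected = set(load_manifest())
found = extract_pr_refs(notes_md)  # returns a set of ints: "#1234" -> 1234
missing = expected - found  # PRs in the manifest that the notes omit
extra = found - expected    # PRs the notes mention that are not in the manifest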

If there are discrepancies, you iterate with the agent, asking for targeted corrections until there are no missing or extra PRs or a maximum number of iterations is reached.
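
The validate-and-re-prompt loop could look roughly like the sketch below. It is an illustration under stated assumptions: `generate` is a hypothetical stand-in for the agent call, `draft_until_valid` and the feedback wording are invented here, and `extract_pr_refs` is implemented with a simple "#<digits>" regex that would also pick up any issue numbers written in the same style.

```python
import re
from typing import Callable, Set, Tuple

PR_REF_PATTERN = re.compile(r"#(\d+)")


def extract_pr_refs(notes_md: str) -> Set[int]:
    """Collect every "#1234"-style reference in the notes as an int."""
    return {int(number) for number in PR_REF_PATTERN.findall(notes_md)}


def validate(expected: Set[int], notes_md: str) -> Tuple[Set[int], Set[int]]:
    """Return (missing, extra) PR numbers relative to the manifest."""
    found = extract_pr_refs(notes_md)
    return expected - found, found - expected


def draft_until_valid(
    expected: Set[int],
    generate: Callable[[str], str],
    max_iterations: int = 3,
) -> Tuple[str, bool]:
    """Call `generate` with feedback until the notes match the manifest.

    `generate` takes feedback text (empty on the first round) and returns
    Markdown notes. Returns the last draft and whether it validated.
    """
    feedback = ""
    notes = ""
    for _ in range(max_iterations):
        notes = generate(feedback)
        missing, extra = validate(expected, notes)
        if not missing and not extra:
            return notes, True
        feedback = (
            f"Add the missing PRs: {sorted(missing)}. "
            f"Remove PRs not in the manifest: {sorted(extra)}."
        )
    return notes, False  # still inconsistent: hand over to a human


def fake_agent(feedback: str) -> str:
    # First draft omits #102 and invents #999; the corrected one is clean.
    if not feedback:
        return "- Fix cache bug (#101)\n- New feature (#999)"
    return "- Fix cache bug (#101)\n- Add CLI flag (#102)"


notes, ok = draft_until_valid({101, 102}, fake_agent)
print(ok)  # True
```

The iteration cap matters: if the model never converges, the loop stops and the mismatch is surfaced instead of silently shipping incomplete notes.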

This pattern pairs AI for drafting with deterministic code that guarantees every PR is covered and none is invented.

Preventing hallucinations: real context for the model

If the AI summarizes a PR using only the title, it can invent examples or APIs. To avoid that, the workflow includes documentation diffs from each PR in the prompt: any .md under docs/ that the PR touched is added as context so the model can cite real examples.

def fetch_doc_diffs(pr):
    return [
        {"filename": f.filename, "status": f.status, "patch": f.patch}
        for f in pr.get_files()
        if f.filename.startswith("docs/") and f.filename.endswith(".md") and f.patch
    ]
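
One way the collected diffs could be folded into the prompt is sketched below. `format_doc_context`, the section layout, and the character budget are hypothetical; the input is assumed to be the list of dicts returned by `fetch_doc_diffs`, and the example path is illustrative.

```python
from typing import Dict, List


def format_doc_context(doc_diffs: List[Dict[str, str]], max_chars: int = 4000) -> str:
    """Render doc diffs as plain-text sections, stopping at a size budget."""
    sections = []
    used = 0
    for diff in doc_diffs:
        section = f"### {diff['filename']} ({diff['status']})\n{diff['patch']}\n"
        if used + len(section) > max_chars:
            break  # keep the prompt bounded; remaining files are dropped
        sections.append(section)
        used += len(section)
    return "\n".join(sections)


# Illustrative input only; the path and patch are made up.
example = [
    {
        "filename": "docs/source/en/guides/upload.md",
        "status": "modified",
        "patch": "@@ -1 +1 @@\n-old line\n+new line",
    },
]
print(format_doc_context(example))
```

A budget like this is a trade-off: it keeps large documentation changes from crowding out the rest of the prompt, at the cost of dropping some context.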

Also, prompts are versioned as Skills — small SKILL.md files with templates and tone rules. That makes the voice reproducible and adjustable.

Security and supply-chain

No long-lived PyPI tokens: they use short-lived OIDC tokens minted by GitHub Actions, via PyPI Trusted Publishing.
They generate PEP 740 attestations and Sigstore evidence for each artifact.
The agent runtime is pinned and verified by SHA256 before running — no unchecked curl | bash.

Example permission block in the workflow:

permissions:
  id-token: write
  attestations: write

And the publish action uses attestations: true — no passwords, no persistent API tokens.
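
The "pinned and verified by SHA256" idea for the agent runtime can be sketched with the standard library. The function names are hypothetical, and the file path and expected digest would come from your own pinned configuration; this is not the script the workflow uses.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads do not load into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> None:
    """Raise if the file's digest differs from the pinned value."""
    actual = sha256_of(path)
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise RuntimeError(f"SHA256 mismatch for {path}: got {actual}")
```

Calling `verify_download` before executing or unpacking the file means a tampered or unexpectedly updated download fails the job instead of running.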

Cost, results and practical lessons

Cost: almost zero. A full release costs about $0.25 on HF Inference Providers when using open weights billed pay-as-you-go.
Cadence: from 4–6 weeks to every week.
Observable benefits:
- Better, more consistent notes: the AI delivers the first draft, the human polishes it in 15 minutes.
- Faster detection of breakages: downstream test branches catch integration issues during the RC window.
- Clearer contributor feedback: the automatic comment "this shipped in vX.Y.Z" reduces confusion.

How to adapt this to your project (practical)

If you maintain a Python library you can reuse almost everything:

Fork release.yml and the associated scripts.
Change paths and package names, and the list of downstream repos if they don't apply to you.
Rewrite the SKILL.md files so the tone and structure match yours.
Pin two repo variables: MODEL_ID and OPENCODE_VERSION.
Configure Trusted Publishing if you want OIDC on PyPI; otherwise adapt to your publishing process.
If you don't have downstreams, remove that job.

The most valuable piece to port is the trust-but-verify loop: deterministic manifest — AI draft — validation — re-prompt. That protects against omissions and fabrications.

Future improvement paths

Hugging Face is already thinking about automating downstream failure triage by reading logs and reporting them in Slack, and applying this pattern to other libraries in the ecosystem. The idea is to scale the flow without losing human guarantees where they matter.

The practical lesson is clear: it's not about "letting AI do everything", it's about having the AI draft, checking with code, and letting a person decide. That turns hours of manual work into minutes of review while keeping technical trust and traceability.

Original source

https://huggingface.co/blog/huggingface-hub-release-ci
