Hugging Face has overhauled its release process: huggingface_hub now ships every week, built with open tools, an open-weight model, and a human in the loop who makes the final call. Why should you care? Because your dependencies (transformers, datasets, diffusers, sentence-transformers and dozens more) talk to the Hub through this Python client, and every week without a release is a week of postponed fixes and features.
What they did and why it matters
Before: releases every 4 to 6 weeks. Some mechanical steps were automated, but much of the work was manual: create a branch, change the version in __init__.py, write the notes, cut the release, announce it. Each release took about half a day of scattered, repetitive work.
Now: everything lives in a single GitHub Actions workflow (.github/workflows/release.yml) that you trigger manually and that runs the whole chain: preparation, publishing to PyPI, changelog generation, creating downstream branches, announcing in Slack, archiving the AI draft and the final version, and posting automatic comments on PRs. The result is a weekly cadence, lower latency for changes, and shorter contribution loops.
Architecture and full stack
They built it around one clear principle: use only things any maintainer can run. No closed models, no proprietary platforms.
- Orchestrator: GitHub Actions
- Agent runtime: OpenCode (version pinned and verified by SHA256)
- Generation model: open weights (currently Z.ai's GLM-5.2) served by HF Inference Providers
- PyPI publishing: Trusted Publishing with OIDC and Sigstore/PEP 740
- Storage: Hugging Face buckets, used to audit drafts
The key design: the model generates, the code verifies, and the human decides. That division of labor is what makes the process both fast and reliable.
How the flow works, step by step
- Manual trigger: a workflow_dispatch that accepts release_type (minor-prerelease, minor-release, patch-release).
- Prepare job: compute the version, create or reuse a branch, bump __version__, tag, and push.
- Publish to PyPI: build and upload the huggingface_hub package, plus the hf CLI as a separate package.
- Release notes: diff since the last tag, gather PR metadata via the GitHub API, and ask the model for a changelog draft, which is saved as a release draft.
- Downstream branches: open branches in transformers, datasets, diffusers, and sentence-transformers with the RC pinned, so their CI validates the integration.
- Slack: the model proposes the internal announcement; a human reviews it.
- Archiving: upload both the raw AI draft and the human-edited version to a bucket for traceability.
- Post-release: a PR that bumps main to the next dev0 version, comments on PRs indicating which release shipped each change, a sync of the CLI docs, and Slack reports threaded for each step.
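The version arithmetic behind release_type and the post-release dev0 bump is not spelled out. Here is a minimal sketch, assuming a PEP 440-style scheme where main carries a .devN version and release candidates use .rcN (e.g. 1.2.0.dev0 → 1.2.0.rc0 → 1.2.0). next_version, next_dev_version and bump_version are hypothetical helpers, not the workflow's actual code.

```python
import re

# Assumed version scheme (illustrative, not taken from the real workflow):
# MAJOR.MINOR.PATCH with an optional ".devN" or ".rcN" suffix, e.g. "1.2.0.dev0".
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:\.(dev|rc)(\d+))?$")
# Matches a line like: __version__ = "1.2.0.dev0"
VERSION_LINE_RE = re.compile(r'^__version__ = "[^"]*"$', re.MULTILINE)


def next_version(current: str, release_type: str) -> str:
    """Compute the version to publish for one of the three release types."""
    m = VERSION_RE.match(current)
    if m is None:
        raise ValueError(f"unrecognized version: {current!r}")
    major, minor, patch = (int(g) for g in m.group(1, 2, 3))
    suffix, n = m.group(4), m.group(5)

    if release_type == "minor-prerelease":
        # dev0 -> rc0; an existing rcN -> rc(N+1)
        rc = int(n) + 1 if suffix == "rc" else 0
        return f"{major}.{minor}.{patch}.rc{rc}"
    if release_type == "minor-release":
        # Drop the dev/rc suffix: 1.2.0.rc1 -> 1.2.0
        return f"{major}.{minor}.{patch}"
    if release_type == "patch-release":
        # Patches are cut on top of a final release: 1.2.0 -> 1.2.1
        if suffix is not None:
            raise ValueError("patch releases must start from a final version")
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown release_type: {release_type!r}")


def next_dev_version(released: str) -> str:
    """Version for main after a minor release: 1.2.0 -> 1.3.0.dev0."""
    major, minor = (int(x) for x in released.split(".")[:2])
    return f"{major}.{minor + 1}.0.dev0"


def bump_version(init_source: str, new_version: str) -> str:
    """Rewrite the single __version__ assignment in the text of an __init__.py."""
    updated, count = VERSION_LINE_RE.subn(f'__version__ = "{new_version}"', init_source)
    if count != 1:
        raise ValueError(f"expected one __version__ line, found {count}")
    return updated
```

Keeping this logic in plain functions, rather than inline shell, makes it trivially unit-testable, which matters when it runs unattended every week.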
Deterministic validation: the idea that makes the AI trustworthy
The biggest fear with AI-generated notes is that the model will omit PRs or invent changes. The solution is simple and elegant: build a deterministic manifest of PRs and verify that what the model produces matches that manifest exactly.
- Extract PR numbers from squash-merge commits with a regex:
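```python
import re

# Squash-merge commit titles end in "(#1234)"; "$" anchors the match to the end.
PR_NUMBER_PATTERN = re.compile(r"\(#(\d+)\)$")

# commits_since_last_tag: commits between the previous tag and HEAD,
# each exposing the commit's first line as .title
pr_numbers = [
    int(m.group(1))
    for commit in commits_since_last_tag
    if (m := PR_NUMBER_PATTERN.search(commit.title))
]
save_manifest(pr_numbers)  # persist the expected PR set for validation
```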
- The model generates notes from that input. Then you validate:
```python
expected = set(load_manifest())
found = extract_pr_refs(notes_md)  # converts "#1234" -> 1234
missing = expected - found  # PRs the model omitted
extra = found - expected  # PRs the model invented or mis-cited
```
- If there are discrepancies, you iterate with the agent, asking for targeted corrections until there are no missing or extra PRs, or until a maximum number of iterations is reached.
This pattern combines the strength of AI at drafting with the strength of deterministic code at guaranteeing exhaustiveness.
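The snippets above name extract_pr_refs but show neither it nor the retry loop. Below is a minimal sketch of both, assuming PR references appear in the Markdown as "#1234". The regex, diff_against_manifest, draft_until_consistent and the generate/fix callbacks (which would wrap calls to the agent) are hypothetical.

```python
import re
from collections.abc import Callable

# Hypothetical reference pattern: "#1234" not preceded by a word character or "/",
# so fragments such as "page#3" or "abc#4" are not counted as PR references.
PR_REF_PATTERN = re.compile(r"(?<![\w/])#(\d+)\b")


def extract_pr_refs(notes_md: str) -> set[int]:
    """Collect every PR number referenced as "#1234" in the notes."""
    return {int(n) for n in PR_REF_PATTERN.findall(notes_md)}


def diff_against_manifest(notes_md: str, expected: set[int]) -> tuple[set[int], set[int]]:
    """Return (missing, extra) PR numbers relative to the manifest."""
    found = extract_pr_refs(notes_md)
    return expected - found, found - expected


def draft_until_consistent(
    expected: set[int],
    generate: Callable[[], str],
    fix: Callable[[str, set[int], set[int]], str],
    max_iterations: int = 3,
) -> tuple[str, bool]:
    """Draft notes, then re-prompt with targeted corrections until they match.

    Returns the final notes and whether they match the manifest exactly;
    a False result should be flagged to the human reviewer.
    """
    notes = generate()
    for _ in range(max_iterations):
        missing, extra = diff_against_manifest(notes, expected)
        if not missing and not extra:
            return notes, True
        # Ask the agent to add only `missing` and drop only `extra`.
        notes = fix(notes, missing, extra)
    missing, extra = diff_against_manifest(notes, expected)
    return notes, not missing and not extra
```

Returning a success flag instead of raising keeps the last draft available: if the loop gives up, the human still starts from the closest draft plus an explicit list of what is wrong.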
Preventing hallucinations: real context for the model
If the AI summarizes a PR using only the title, it can invent examples or APIs. To avoid that, the workflow includes documentation diffs from each PR in the prompt: any .md under docs/ that the PR touched is added as context so the model can cite real examples.
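```python
def fetch_doc_diffs(pr):
    """Return the Markdown doc changes in a PR (pr assumed to be a PyGithub PullRequest)."""
    return [
        {"filename": f.filename, "status": f.status, "patch": f.patch}
        for f in pr.get_files()
        # Only .md files under docs/; files whose diff GitHub omits have no patch.
        if f.filename.startswith("docs/") and f.filename.endswith(".md") and f.patch
    ]
```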
Also, prompts are versioned as Skills: small SKILL.md files with templates and tone rules. That keeps the voice reproducible and easy to adjust.
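How the doc diffs and the Skill are combined into one prompt is not described. A minimal sketch follows, assuming the SKILL.md text is passed in as a string and each PR arrives with its metadata and the output of fetch_doc_diffs. PRContext, render_pr, build_prompt and the section layout are hypothetical; truncating long patches is an added assumption to keep the prompt bounded.

```python
from dataclasses import dataclass, field


@dataclass
class PRContext:
    """Hypothetical per-PR input: GitHub metadata plus doc diffs."""
    number: int
    title: str
    author: str
    doc_diffs: list[dict] = field(default_factory=list)  # output of fetch_doc_diffs


def render_pr(pr: PRContext, max_patch_chars: int = 4000) -> str:
    """Render one PR: a header with the "#number" the notes must cite, then its doc diffs."""
    lines = [f"## #{pr.number}: {pr.title} (@{pr.author})"]
    for diff in pr.doc_diffs:
        patch = diff["patch"]
        if len(patch) > max_patch_chars:
            # Keep the prompt bounded; what remains is still verbatim diff text.
            patch = patch[:max_patch_chars] + "\n[... diff truncated ...]"
        lines.append(f"Doc change in {diff['filename']} ({diff['status']}):")
        lines.append(patch)
    return "\n".join(lines)


def build_prompt(skill_text: str, prs: list[PRContext]) -> str:
    """Skill instructions (template, tone rules) first, then one section per PR."""
    sections = [skill_text.strip(), "# Merged pull requests"]
    sections += [render_pr(pr) for pr in prs]
    return "\n\n".join(sections)
```

Putting the PR number in each section header gives the model the exact "#1234" token the validator later looks for.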
Security and supply chain
- No long-lived PyPI tokens: through Trusted Publishing, GitHub Actions mints a short-lived OIDC identity token that PyPI exchanges for a short-lived upload token.
- They generate PEP 740 attestations, signed with Sigstore, for each artifact.
- The agent runtime is pinned and verified by SHA256 before it runs: no unchecked curl | bash.
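The source does not say how the checksum step is implemented; it may well be shell. Here is a minimal Python sketch of the same idea, assuming the runtime is first downloaded to a local file and its expected SHA-256 digest is stored next to the pinned version (for example in a repo variable). sha256_of and verify_download are hypothetical names.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives don't have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> None:
    """Raise before anything is executed if the artifact doesn't match the pin."""
    actual = sha256_of(path)
    if not hmac.compare_digest(actual, expected_sha256.strip().lower()):
        path.unlink(missing_ok=True)  # don't leave an untrusted binary around
        raise RuntimeError(
            f"checksum mismatch for {path}: got {actual}, expected {expected_sha256}"
        )
```

The point is ordering: download, verify, and only then extract or execute, so a tampered or swapped release never runs.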
Example permission block in the workflow:
```yaml
permissions:
  id-token: write
  attestations: write
```
The publish action then uses attestations: true, so there are no passwords and no persistent API tokens.
Cost, results and practical lessons
- Cost: close to zero. A full release costs about $0.25 in pay-as-you-go inference on HF Inference Providers with an open-weight model, which at a weekly cadence comes to roughly $13 a year.
- Cadence: from 4–6 weeks to every week.
- Observable benefits:
  - Better, more consistent notes: the AI delivers the first draft and a human polishes it in 15 minutes.
  - Faster detection of breakages: downstream test branches catch integration issues during the RC window.
  - Clearer contributor feedback: the automatic "this shipped in vX.Y.Z" comment reduces confusion.
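The "this shipped in vX.Y.Z" comment can reuse the deterministic PR manifest. A minimal sketch, assuming repo is a PyGithub Repository (Repository.get_issue and Issue.create_comment are PyGithub calls, and pull requests can be commented on through the issues API). The comment wording and the helper names are hypothetical.

```python
from collections.abc import Iterable
from typing import Protocol


class IssueLike(Protocol):
    def create_comment(self, body: str) -> object: ...


class RepoLike(Protocol):
    def get_issue(self, number: int) -> IssueLike: ...


def shipped_comment(version: str, release_url: str) -> str:
    """Hypothetical comment wording."""
    return f"This change shipped in v{version}. Release notes: {release_url}"


def comment_on_shipped_prs(
    repo: RepoLike, pr_numbers: Iterable[int], version: str, release_url: str
) -> list[int]:
    """Post one comment per PR in the manifest; return the PRs commented on."""
    body = shipped_comment(version, release_url)
    commented = []
    for number in sorted(set(pr_numbers)):  # dedupe, stable order
        # Pull requests are issues in the GitHub API, so get_issue works for them.
        repo.get_issue(number).create_comment(body)
        commented.append(number)
    return commented
```

Typing repo as a small Protocol keeps the function testable with a fake object instead of live API calls.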
How to adapt this to your project
If you maintain a Python library you can reuse almost everything:
- Fork release.yml and the associated scripts.
- Change paths, package names, and the list of downstream repos to match your project.
- Rewrite the SKILL.md files so the tone and structure match yours.
- Pin two repo variables: MODEL_ID and OPENCODE_VERSION.
- Configure Trusted Publishing if you want OIDC on PyPI; otherwise adapt the job to your publishing process.
- If you don't have downstreams, remove that job.
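Pinning only helps if nothing silently falls back to "latest". Here is a minimal sketch of a guard a release script could run first, assuming the workflow maps the repo variables into the environment (for example env: MODEL_ID: ${{ vars.MODEL_ID }}) and that OPENCODE_VERSION is a plain MAJOR.MINOR.PATCH string. ReleaseConfig and load_release_config are hypothetical.

```python
import re
from collections.abc import Mapping
from dataclasses import dataclass

# Exact version like "1.2.3"; ranges, "latest" or empty values are rejected.
EXACT_VERSION_RE = re.compile(r"^\d+\.\d+\.\d+$")


@dataclass(frozen=True)
class ReleaseConfig:
    model_id: str
    opencode_version: str


def load_release_config(env: Mapping[str, str]) -> ReleaseConfig:
    """Fail fast if either pinned variable is missing or not actually pinned."""
    model_id = env.get("MODEL_ID", "").strip()
    version = env.get("OPENCODE_VERSION", "").strip()
    if not model_id or "/" not in model_id:
        raise ValueError("MODEL_ID must be a Hub repo id such as 'org/model'")
    if not EXACT_VERSION_RE.match(version):
        raise ValueError(f"OPENCODE_VERSION must be an exact version, got {version!r}")
    return ReleaseConfig(model_id, version)
```

In a job step this would be called as load_release_config(os.environ), so a misconfigured fork stops before any model call or download.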
The most valuable piece to port is the trust-but-verify loop: deterministic manifest → AI draft → validation → re-prompt. That protects against omissions and fabrications.
Future improvement paths
Hugging Face is already thinking about automating downstream failure triage by reading logs and reporting them in Slack, and applying this pattern to other libraries in the ecosystem. The idea is to scale the flow without losing human guarantees where they matter.
The practical lesson is clear: it's not about "letting AI do everything", it's about having the AI draft, checking with code, and letting a person decide. That turns hours of manual work into minutes of review while keeping technical trust and traceability.
