State of Open Source on Hugging Face: Spring 2026 | Keryc
Hugging Face has published a clear snapshot of how the open source ecosystem changed over the past year: what grew, who leads, and why it matters for you as a developer, researcher, or entrepreneur. Here I explain it in plain language, with the technical eye you need.
Growth and participation: active community, not just consumption
The platform nearly doubled in users and artifacts: 11 million users, more than 2 million public models, and over 500k datasets. That’s not just hype; it’s real participation. More and more people don’t just download models — they modify them: fine-tunes, adapters, benchmarks, and applications.
Key data: half of models have fewer than 200 downloads, while the top 200 models (0.01% of the total) concentrate 49.6% of all downloads.
Does that mean only a few models matter? Not entirely. Specialized communities (by domain, language, or task) emerge and show sustained reuse even if their global numbers look modest. So if you’re working in a niche, your contributions can still be the ones others adopt.
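To make the concentration figure concrete, here is a minimal sketch of how a top-N download share is computed. The counts below are invented, heavy-tailed toy data, not the Hub's real numbers:

```python
def top_share(downloads, top_n):
    """Fraction of total downloads captured by the top_n most-downloaded models."""
    ranked = sorted(downloads, reverse=True)
    return sum(ranked[:top_n]) / sum(ranked)

# Synthetic, heavy-tailed download counts: a few hits, a long tail.
counts = [1_000_000, 500_000, 250_000] + [100] * 997
share = top_share(counts, 3)
print(f"Top 3 of {len(counts)} models hold {share:.1%} of downloads")
```

The same skew shows up at any scale: even a tiny fraction of hit models can account for roughly half of all traffic while the long tail still serves its niches.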
Concentration, downstream use, and economic value
The pattern resembles traditional free software: the value the community generates from open artifacts often exceeds the cost of producing them. In AI it’s the same: open models get adapted and specialized into thousands of downstream applications. That lowers costs and increases flexibility compared to closed systems.
Both large companies and startups use open models as default components. More than 30% of the Fortune 500 have verified accounts on Hugging Face. NVIDIA shows up as a strong contributor, and firms like Airbnb have increased their commitment.
Geographic rebalance: China takes the lead in downloads
One of the most relevant changes: China surpasses the United States in monthly and total downloads, representing around 41% of downloads. Chinese organizations went from almost zero to publishing hundreds of repositories in months.
Concrete examples: after the viral success of DeepSeek R1, Baidu went from zero to over 100 releases in 2025, while ByteDance and Tencent increased their releases eight- to ninefold.
Who builds and who consumes: industry vs independents
Industrial participation fell: industry’s share in development dropped from ~70% before 2022 to ~37% in 2025. At the same time, independent developers rose from 17% to 39% of downloads. Individuals and small collectives now influence which models are practical for end users.
A striking fact: the Qwen family has more than 113,000 derived models; if you count all models that tag it, they exceed 200,000.
Technical trends: size, adaptations, and efficiency
The average size of downloaded models rose from 827M parameters in 2023 to 20.8B in 2025, but the median barely grew (326M to 406M). That tells us advanced users pull very large models, while practical use is still dominated by small models.
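The mean/median gap is the classic signature of a skewed distribution. A toy illustration with invented parameter counts (in millions, not the report's data): a handful of very large models pulls the mean up by orders of magnitude while the median barely moves.

```python
from statistics import mean, median

# Invented parameter counts (in millions): mostly small models,
# plus a few very large ones that advanced users also pull.
sizes_m = [300] * 95 + [70_000, 100_000, 120_000, 400_000, 700_000]

print(f"mean:   {mean(sizes_m):,.0f}M parameters")
print(f"median: {median(sizes_m):,.0f}M parameters")
```

Here the mean lands above 10B parameters while the median stays at 300M, mirroring the pattern the report describes.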
Adoption of quantization and Mixture-of-Experts architectures pushes the use of large models while reducing inference costs. Also, the performance gap between giant and small models is shrinking fast thanks to fine-tuning and task-specific adaptations.
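As a rough intuition for why quantization cuts inference cost, here is a minimal sketch of symmetric per-tensor int8 weight quantization, in pure Python with no ML framework. Real stacks (GPTQ, AWQ, llama.cpp's schemes) are far more sophisticated, but the storage argument is the same: one byte per weight instead of four.

```python
def quantize_int8(weights):
    """Map float weights to int8 values plus a per-tensor scale (symmetric)."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]

w = [0.81, -1.27, 0.05, 0.63]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# int8 storage is 1 byte/weight vs 4 for float32: ~4x smaller,
# at the cost of a small rounding error per weight.
err = max(abs(a - b) for a, b in zip(w, w_hat))
print(q, f"max error {err:.4f}")
```

The 4x memory reduction translates directly into less bandwidth per token at inference time, which is why quantized large models become practical on hardware that could never hold their float32 weights.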
The ATOM Project’s relative adoption metric shows that models of 1–9B parameters aren’t far behind >100B giants in downloads, especially once you consider production deployment and latency limits.
Hardware, kernels, and deployment
NVIDIA dominates model optimization, but AMD support is growing. Hugging Face launched the Kernel Hub to run kernels optimized for both NVIDIA and AMD. In China, models are published with explicit support for domestic chips and inference stacks from companies like Alibaba.
Practical result: more models are actually runnable in local data centers or on edge hardware, reducing dependence on big public clouds and democratizing deployments.
Lifecycle and update rhythm
Engagement with a model tends to spike after release and then fall: the average interest duration is about 6 weeks. That’s why organizations that release frequent updates or successive versions (for example DeepSeek V3, R1, V3.2) stay relevant.
If you don’t update, you get overtaken by those who do — or by people who publish niche fine-tunes.
Emerging communities: robotics and science
Robotics was the fastest-growing category: datasets went from 1,145 in 2024 to 26,991 in 2025, and today it’s the category with the most datasets on the Hub. Projects like LeRobot and collections such as L2D or RoboMIND provide massive scales of trajectories and real-world tasks.
Science also carved out space: protein folding, molecular dynamics, drug discovery, and scientific analysis increasingly rely on open models and datasets. Here, community collaboration coordinates interdisciplinary efforts at scale.
Practical implications for developers and companies
If you’re a developer: prioritize smaller, practical models for production, and learn quantization and pruning strategies to reduce latency.
If you’re a researcher: publish reproducible artifacts; the community reuses them and that accelerates impact.
If you’re a company: consider keeping weights open when security and sovereignty allow; it gives you flexibility and cost advantages.
Observation: digital sovereignty is real. Open-weight models let governments and organizations train and audit within local legal frameworks.
What’s coming in 2026
Competition for open alternatives to frontier models from the US and China (examples: GPT-OSS, OLMo, Gemma). The question is whether they can reach the momentum of Qwen and DeepSeek.
More multimodal sub-ecosystems: robotics, science, agents, and applications that require interoperability between models.
Public debate about investing in open infrastructure: data centers and access to compute remain bottlenecks for large-scale development.
Final reflection
Open source is no longer just an academic option. It’s the practical layer where much of today’s AI is built, adapted, and deployed. Want to influence the direction of AI? Participating in these repositories, creating useful derivatives, and optimizing deployment is the most direct way.