Hugging Face Reports Chinese Open Models Overtook U.S. on Hub as Qwen and DeepSeek Drive Derivative Boom
Hugging Face's Spring 2026 report says Chinese open models now lead Hub adoption, with Qwen and DeepSeek powering a surge in derivatives.
Hugging Face’s March 17, 2026 ecosystem report, State of Open Source on Hugging Face: Spring 2026, says Chinese open models have overtaken U.S. models in recent Hub adoption, with China accounting for 41% of downloads over the past year by the blog’s framing. For developers, the bigger shift is structural: the Hub is increasingly driven by DeepSeek, Qwen, and a fast-growing layer of quantized, fine-tuned, and repackaged derivatives rather than by the original base-model labs alone.
Hugging Face ties that snapshot to the platform’s 2025 scale: 11 million users, more than 2 million public models, and over 500,000 public datasets. The post also points to heavy concentration: the top 200 models account for 49.6% of all downloads, while roughly half of all models have fewer than 200 downloads.
Download Share and Geographic Shift
The headline claim is that China has moved ahead of the U.S. in recent open-model adoption on the Hub. Hugging Face’s linked analysis is grounded in the paper Economies of Open Intelligence: Tracing Power & Participation in the Model Ecosystem, which analyzed 851,000 models, 200+ attributes per model, and 2.2 billion downloads from June 2020 to August 2025.
Using the paper’s recent-year methodology, Chinese developers captured 17.1% of recent downloads versus 15.8% for U.S. developers, with International/Online at 23.8%.
| Origin | Recent download share |
|---|---|
| International/Online | 23.8% |
| China | 17.1% |
| USA | 15.8% |
For teams selecting open models, this matters because the most active ecosystems on Hugging Face are no longer clustered around the same set of Western labs that dominated the earlier open-weight cycle.
DeepSeek and Qwen as Ecosystem Drivers
Hugging Face points back to the January 2025 release of DeepSeek R1 as the turning point that accelerated Chinese open releases. The post says Baidu went from zero Hub releases in 2024 to more than 100 in 2025, while ByteDance and Tencent increased releases eight- to nine-fold.
The underlying paper shows how concentrated that shift became among a few developers.
| Developer | Recent download share |
|---|---|
| lmstudio-community | 16.4% |
| deepseek-ai | 9.6% |
| comfy | 5.4% |
| Qwen | 4.6% |
DeepSeek and Qwen matter as model families, but the more durable signal is downstream reuse. Hugging Face says Alibaba has more derivative models than Google and Meta combined, and that the Qwen family accounts for more than 113,000 derivative models. Counting every model tagged Qwen pushes the total past 200,000.
If you build on open models, this shifts evaluation work away from brand-level comparisons and toward lineage-level comparisons. Two artifacts derived from the same upstream family can differ materially in quantization, prompting behavior, hardware fit, and license terms. Your model registry and eval process need to capture that.
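As a minimal sketch of what lineage-level tracking can look like, the registry record below captures the attributes that commonly differ between derivatives of the same upstream family. The field names and repo IDs are illustrative assumptions, not an official Hugging Face schema:

```python
from dataclasses import dataclass, field

# Hypothetical registry record: field names are illustrative,
# not an official Hugging Face schema.
@dataclass
class ModelRecord:
    repo_id: str
    base_model: str        # upstream family the artifact derives from
    quantization: str      # e.g. "Q4_K_M", "fp16", "none"
    license: str
    eval_scores: dict = field(default_factory=dict)

def lineage_diff(a: ModelRecord, b: ModelRecord) -> dict:
    """Compare two artifacts from the same upstream family and
    surface the attributes that actually differ."""
    if a.base_model != b.base_model:
        raise ValueError("records are not from the same lineage")
    fields = ("quantization", "license")
    return {f: (getattr(a, f), getattr(b, f))
            for f in fields if getattr(a, f) != getattr(b, f)}

official = ModelRecord("Qwen/Qwen2.5-7B-Instruct", "Qwen2.5-7B",
                       quantization="none", license="apache-2.0")
community = ModelRecord("lmstudio-community/Qwen2.5-7B-Instruct-GGUF",
                        "Qwen2.5-7B", quantization="Q4_K_M",
                        license="apache-2.0")

print(lineage_diff(official, community))
# → {'quantization': ('none', 'Q4_K_M')}
```

Running evals per record, rather than per brand, makes a quantized community build a first-class candidate next to the publisher's original.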
The Intermediary Layer Is Now Core Infrastructure
The paper describes an emergent developer intermediary layer made up of groups that quantize, merge, fine-tune, package, and redistribute models. Repositories such as lmstudio-community, comfy, and mlx-community are part of that pattern.
This is where deployment strategy changes. If you run local or edge inference, the winning model for your workload may come from a community repackager rather than the original publisher. Quantized variants, adapter merges, and hardware-specific builds now have enough download share to shape the market directly. For teams running local agents or desktop workflows, this connects closely to the practical tradeoffs in How to Run LLMs Locally on Your Machine.
The same dynamic affects retrieval and adaptation choices. If your use case needs domain behavior more than broad generality, the decision often lands between derivative fine-tunes and retrieval pipelines, which is the same boundary covered in Fine-Tuning vs RAG: When to Use Each Approach and What Is RAG? Retrieval-Augmented Generation Explained.
Openness Is Expanding, Transparency Is Falling
The Hugging Face post presents growth in open ecosystems, but the linked paper shows that openness is weakening under a stricter definition. The share of models disclosing training-data information fell from 79.3% in 2022 to 39% in 2025. The paper also says open-weight models surpassed truly open-source models in 2025, using disclosure-based criteria aligned with the OSI framing.
| Transparency metric | 2022 | 2025 |
|---|---|---|
| Models with training-data disclosure | 79.3% | 39.0% |
For developers, this affects procurement, compliance, and reproducibility. If your stack depends on auditable provenance, data governance, or repeatable fine-tuning, “available on the Hub” is no longer enough as a selection filter. You need metadata checks alongside benchmark checks, similar to the broader discipline of How to Evaluate AI Output (LLM-as-Judge Explained), but applied to model sourcing and documentation.
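A metadata gate of that kind can be very small. The sketch below checks a model card's metadata (as a dict) for disclosure fields before a model is admitted to evaluation; the field names follow common Hub model-card conventions, but the required set is an assumption to tune to your own compliance needs:

```python
# Illustrative transparency gate: flags missing disclosure fields in
# model-card metadata before a model enters the eval pipeline.
# REQUIRED_FIELDS is an assumption, not a Hub-mandated standard.
REQUIRED_FIELDS = ("license", "datasets", "base_model")

def transparency_gaps(card_metadata: dict) -> list:
    """Return the disclosure fields that are missing or empty."""
    return [f for f in REQUIRED_FIELDS if not card_metadata.get(f)]

def admit(card_metadata: dict) -> bool:
    """Admit only models whose provenance metadata is complete."""
    return not transparency_gaps(card_metadata)

# A derivative that documents its lineage and training data:
documented = {
    "license": "apache-2.0",
    "base_model": "Qwen/Qwen2.5-7B",
    "datasets": ["example-org/domain-corpus"],  # hypothetical dataset id
}
# A repackage with no training-data or lineage disclosure:
undocumented = {"license": "other"}

print(transparency_gaps(documented))    # → []
print(transparency_gaps(undocumented))  # → ['datasets', 'base_model']
```

In practice you would read this metadata from the card's YAML front matter (or the Hub API) rather than hand-built dicts, but the gate itself stays this simple.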
Model Supply Is Getting Larger and More Specialized
The paper also quantifies the technical shape of this market. Average model size increased 17× from 2020 to 2025. Multimodal generation rose 3.4×, quantization increased 5×, and mixture-of-experts usage increased 7×.
Those numbers point to a more fragmented open-model landscape. Families like Qwen and DeepSeek supply the gravitational pull, but deployment-ready variants are increasingly optimized for narrow hardware and task constraints. If you maintain agent systems or coding workflows, that raises the value of explicit context, routing, and tool controls, which aligns with the engineering patterns in Context Engineering: The Most Important AI Skill in 2026.
If you rely on Hugging Face as your primary open-model discovery layer, update your selection process now: evaluate derivatives as first-class candidates, verify transparency metadata before adoption, and benchmark Chinese model families and community repackages against your existing defaults instead of treating them as edge options.