SharpPost · In-Depth Analysis
An independent analysis based on the U.S.-China Economic and Security Review Commission (USCC) research paper "Two Loops: How China's Open AI Strategy Reinforces Its Industrial Dominance"
337
Chinese open-source models
Up from 32 in 2022 — a 10.5× increase
100,000+
Alibaba Qwen derivatives
Largest ecosystem on Hugging Face
¼
Chinese inference pricing
Kimi K2.5 vs GPT-5.2
7 / 10
Hugging Face Top 10 downloads
Chinese model share (2025 Q4)

Key Findings

The USCC report argues that China's AI competitiveness rests on two mutually reinforcing feedback loops — the digital loop (open-source model ecosystem iteration) and the physical loop (manufacturing deployment generating proprietary data). Their convergence allows China to build a compound advantage that does not depend on frontier chips.

U.S. export controls are calibrated to training compute (the upstream of the digital loop) yet leave the physical loop virtually untouched. Chinese labs have also been systematically distilling capabilities from American closed-source models via API, while Meta's pivot away from open source erodes the principal anchor of the U.S. open-weight ecosystem.

Core thesis: Washington has aimed its policy instruments at one loop, while Beijing accumulates advantage through the compounding of two.

I. The Competitive Landscape: Capital Density ≠ Capability Density

Resource allocation in the U.S.–China AI competition reveals a structural divergence. The four major U.S. tech companies spent a combined $350 billion or more on AI capital expenditure in 2025, and Bloomberg Intelligence projects the figure will surpass $400 billion in 2026. China's leading cloud providers spent less than $40 billion over the same period — a gap approaching ten-to-one. Yet capital density has not translated into a proportional capability gap.

Exhibit 1
U.S.–China Frontier Model Comparison (Based on 2025 Public Data)
Dimension | U.S. Leader | China Leader | Gap Assessment
Training compute | xAI Grok 4 (5×10²⁶ FLOPs) | Alibaba Qwen3-Max (1.5×10²⁵ FLOPs) | U.S. leads ~30×
Training data scale | Nvidia Cosmos-1.0 (9 quadrillion tokens) | Alibaba Qwen3-Max (36 trillion tokens) | U.S. leads; China closing
Model parameters | xAI Grok 4 (est. 1.7–3 trillion) | Alibaba Qwen3-Max (1 trillion) | Gap narrowing
Inference cost | GPT-5.2: $4.81/M tokens | Kimi K2.5: $1.20/M tokens | China 4× cheaper (at parity)
Open-source ecosystem | Meta Llama family | Alibaba Qwen family (100K+ derivatives) | China leads
Sources: Epoch AI; Artificial Analysis; USCC Table 1. Training compute and parameter figures based on publicly disclosed information.
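
The headline ratios follow directly from the disclosed figures; a quick arithmetic check, using only the numbers in the table above:

```python
# Sanity-check Exhibit 1's headline ratios from the disclosed figures.
grok4_flops = 5e26          # xAI Grok 4 training compute
qwen3_max_flops = 1.5e25    # Alibaba Qwen3-Max training compute
print(f"Training compute gap: {grok4_flops / qwen3_max_flops:.0f}x")  # 33x, i.e. "~30x"

gpt52_blended = 4.81        # GPT-5.2 blended price, $/M tokens
kimi_blended = 1.20         # Kimi K2.5 blended price, $/M tokens
print(f"Inference price gap: {gpt52_blended / kimi_blended:.1f}x")    # 4.0x
```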

On the surface, the United States retains the lead in raw compute and model scale. But scaling is running into diminishing marginal returns. The defining architectural breakthroughs of 2025 — mixture-of-experts (MoE), chain-of-thought reasoning — did not depend on larger compute budgets; they achieved performance leaps through more efficient design. OpenAI acknowledged in February 2025 that GPT-4.5 was its last model built primarily on scaling up pre-training.

II. The Digital Loop: The Open-Source Flywheel

Loop 1 · Digital Feedback Loop
Release open model → Global developer adoption → Community iteration → Model capability gains → Wider adoption → (back to the next release)

The logic of the digital loop is straightforward: each open-source release triggers adoption and iteration by a global developer community, and the improvements feed back into the next generation. Between 2022 and 2025, the number of Chinese open-source models grew from 32 to 337 (per Epoch AI), a more-than-tenfold increase; over the same period, American open-source models rose from 213 to 622, a less-than-threefold gain.

Exhibit 2
U.S. vs. China Open-Source Model Growth (2022 vs. 2025)
China: 32 models (2022) → 337 (2025), a 10.5× increase
United States: 213 models (2022) → 622 (2025), a 2.9× increase
Source: Epoch AI "AI Models" database. China's official count (1,509) is significantly higher as it includes fine-tuned and derivative models.

2.1 Hugging Face Downloads: Quantifying Ecosystem Dominance

By late 2025, Alibaba's Qwen family had spawned over 100,000 downstream models on Hugging Face, surpassing Meta's Llama to become the world's largest open-source model ecosystem. In November–December 2025, seven of the ten most-downloaded large models were Chinese.

Exhibit 3
Top 10 Most-Downloaded Large Models on Hugging Face (Nov–Dec 2025)
# | Company | Model | Downloads | Derivatives
1 | ByteDance 🇨🇳 | Tarsier2-Recap-7b | 10,847,133 | n/a
2 | Alibaba 🇨🇳 | Qwen2.5-3B-Instruct | 8,780,419 | 1,674
3 | OpenAI 🇺🇸 | gpt-oss-20b | 8,089,782 | 672
4 | Alibaba 🇨🇳 | Qwen2.5-VL-3B-Instruct | 7,688,461 | 699
5 | Alibaba 🇨🇳 | Qwen2.5-7B-Instruct | 7,245,333 | 3,335
6 | Alibaba 🇨🇳 | Qwen3-4B-Instruct-2507 | 6,289,445 | 400
7 | DeepSeek 🇨🇳 | DeepSeek-OCR | 5,451,968 | 125
8 | Meta 🇺🇸 | Llama-3.1-8B-Instruct | 5,187,643 | 3,975
9 | Alibaba 🇨🇳 | Qwen3-8B | 4,761,786 | 1,239
10 | OpenAI 🇺🇸 | gpt-oss-120b | 4,553,504 | 161
Source: USCC Figure 5. Data period: 2025.11.08–12.08. 🇨🇳 marks Chinese models, which totaled 51.1M downloads vs. 17.8M for U.S. models, a 2.9:1 ratio.
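
The 2.9:1 aggregate in the source note follows directly from the table; a minimal reproduction, with the download counts copied from Exhibit 3:

```python
# Reproduce the source note's aggregate: Exhibit 3 downloads summed by country.
downloads = {
    "CN": [10_847_133, 8_780_419, 7_688_461, 7_245_333,
           6_289_445, 5_451_968, 4_761_786],
    "US": [8_089_782, 5_187_643, 4_553_504],
}
cn, us = sum(downloads["CN"]), sum(downloads["US"])
print(f"CN: {cn / 1e6:.1f}M, US: {us / 1e6:.1f}M, ratio {cn / us:.1f}:1")
# -> CN: 51.1M, US: 17.8M, ratio 2.9:1
```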

2.2 The Price War: Equal Capability, One-Quarter the Cost

Pricing is the flywheel's accelerant. Kimi K2.5 and GPT-5.2 share an identical Intelligence Index score of 47 on Artificial Analysis, yet their blended token prices differ by a factor of four. This is not an isolated case.

Exhibit 4
U.S. vs. China Major Model Cost–Performance Comparison
Model | Origin | Intelligence Index | $/M Tokens | Value Ratio*
GPT-5.2 | 🇺🇸 | 47 | 4.81 | 9.8
Kimi K2.5 | 🇨🇳 | 47 | 1.20 | 39.2
GPT-4.5 | 🇺🇸 | ~46 | 6.00+ | ~7.7
DeepSeek-V3 | 🇨🇳 | ~44 | 0.55 | 80.0
Qwen3-Max | 🇨🇳 | ~45 | 0.46 | 97.8
Source: Artificial Analysis, as of Feb 2026. *Value Ratio = Intelligence Index ÷ Price. Blended price = input:output 3:1 weighting. OpenAI o1 ($26.25, Index 31) excluded as a pricing outlier.
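
Both definitions in the note are easy to operationalize. A minimal sketch: blended price weights input and output token prices 3:1, and the note's excluded o1 data point ($26.25) is consistent with that weighting, assuming o1's published list prices of $15/M input and $60/M output.

```python
# Blended price and value ratio as defined in Exhibit 4's note.

def blended_price(input_price: float, output_price: float) -> float:
    """Blend per-million-token prices at a 3:1 input:output weighting."""
    return (3 * input_price + output_price) / 4

def value_ratio(intelligence_index: float, price: float) -> float:
    """Value Ratio = Intelligence Index / blended $/M tokens."""
    return intelligence_index / price

# Consistency check against the note's excluded data point: OpenAI o1's
# list prices of $15/M input and $60/M output blend to the cited $26.25.
assert blended_price(15.0, 60.0) == 26.25

# Reproduce a few Value Ratio cells from the table.
print(round(value_ratio(47, 4.81), 1))   # GPT-5.2     -> 9.8
print(round(value_ratio(47, 1.20), 1))   # Kimi K2.5   -> 39.2
print(round(value_ratio(44, 0.55), 1))   # DeepSeek-V3 -> 80.0
```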

2.3 Infiltrating Silicon Valley: Infrastructure-Level Dependency

a16z partner Martin Casado has disclosed that roughly 80 percent of startups choosing the open-source route are now using Chinese models. Airbnb uses Alibaba's Qwen to power its customer-service bots. Security concerns follow: NIST has found that DeepSeek's open-source models carry higher cybersecurity risks than comparable American models, and Chinese models may embed censorship mechanisms.

III. The Physical Loop: The Factory-Floor Data Flywheel

Loop 2 · Physical-Economy Feedback Loop
Low-cost model deployment → Generate proprietary industrial data → Data feeds back into model improvement → Support more complex deployments → (back to deployment)

The first loop operates in digital space; the second is rooted in physical manufacturing. Its logic is equally self-reinforcing: open-source models are deployed at low cost into manufacturing; deployment generates proprietary industrial data; that data strengthens the models and supports ever more complex applications. This loop does not require frontier chips: quality inspection at a Guangdong factory, for example, runs on a small vision model deployed on edge hardware.
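
To make the deployment pattern concrete, here is a minimal edge-inference sketch, assuming a hypothetical quantized defect-classification model exported to ONNX; the file name, input tensor name, and label set are all placeholders, not any real factory's system.

```python
# Minimal edge-inference sketch: a small vision model classifying camera
# frames for defects on CPU-class edge hardware, no frontier GPU involved.
# The model file, input name, and labels below are hypothetical.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("inspect_int8.onnx",
                               providers=["CPUExecutionProvider"])

def classify_frame(frame: np.ndarray) -> str:
    """Run one camera frame (H, W, 3 uint8), already at the model's
    expected resolution, through the classifier."""
    x = frame.astype(np.float32) / 255.0          # scale to [0, 1]
    x = np.transpose(x, (2, 0, 1))[None, ...]     # HWC -> NCHW batch of 1
    logits = session.run(None, {"input": x})[0]   # placeholder input name
    labels = ["ok", "scratch", "misalignment"]    # placeholder label set
    return labels[int(np.argmax(logits))]
```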

One Guangdong smart factory trained AI models on 5G high-definition camera feeds, lifting equipment-maintenance detection rates by 20 percent and saving over one million yuan per year. Data of this kind — drawn from millions of connected factories, logistics networks, and smart-city sensor nodes — is beyond the reach of web scraping or synthetic generation. Epoch AI estimates that publicly available high-quality training data may be exhausted between 2026 and 2032; at that point, proprietary deployment data will become the scarce resource.

3.1 Data as a Balance-Sheet Asset: From Policy Rhetoric to Accounting Standards

Exhibit 5
Key Milestones in China's Data-Asset Institutionalization
  • 2017 State Council publishes AI development plan, establishing cross-industry integration goals
  • 2020 CPC Central Committee designates data as a "fifth factor of production" (after land, labor, capital, technology)
  • 2022 "Twenty Provisions on Data" issued — foundational charter for data governance
  • 2023 National Data Administration (NDA) established; national public data resource registration platform launched
  • 2023 Ministry of Finance issues accounting standards: enterprises may record data assets as intangible assets or inventory — a global first
  • 2025.08 State Council "AI+" initiative: broad adoption established as a core organizing principle for AI development
Sources: USCC report; publicly available policy documents.

Nominally an accounting reform, the measure in practice locks deployment-side data advantages into the national balance sheet. The National Data Administration reports that Chinese-language data accounts for 60–80 percent of domestic model training data, making China's data advantage in Chinese-language AI effectively structural.

IV. Where the Two Loops Converge: Small Models as the Critical Variable

The report's analytical edge lies in identifying the mechanism that connects the two loops: small, specialized language models (SLMs). Nvidia researchers have noted that the bulk of operational tasks in autonomous AI systems are handled not by frontier large models but by task-specific fine-tuned SLMs, at costs 10 to 30 times lower.

10–30×
SLM cost advantage
vs. frontier LLMs
#1
Top Hugging Face download
Not an LLM — a fine-tuned SLM
No need
for frontier chips
Runs on edge hardware

The most-downloaded model on Hugging Face in late 2025 was not a frontier LLM but ByteDance's video-captioning model, fine-tuned from Alibaba's Qwen2-VL-7B-Instruct. Enterprises take open-source base models, fine-tune them for specific use cases, and deploy; the deployment data then flows back to strengthen the models — all without frontier-class compute. Small models are precisely the category the open-source ecosystem excels at producing, and China dominates the global open-source ecosystem.
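
With today's open tooling, the pattern this paragraph describes is compact in practice. A minimal LoRA fine-tuning sketch over one of the small Qwen bases from Exhibit 3; the dataset file, output directory, and every hyperparameter are illustrative assumptions, not a recommended recipe.

```python
# Minimal sketch of the fine-tune-and-deploy pattern: adapt a small open
# base model to one task with LoRA adapters, at modest compute cost.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "Qwen/Qwen2.5-3B-Instruct"   # small open-weight base (see Exhibit 3)
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Train only low-rank adapter matrices; the frozen base keeps compute modest.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"]))

# Hypothetical task data: one JSON object per line with a "text" field.
data = load_dataset("json", data_files="factory_qa.jsonl")["train"]
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512))

Trainer(
    model=model,
    args=TrainingArguments(output_dir="qwen-factory-lora",
                           per_device_train_batch_size=4, num_train_epochs=1),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```

The resulting adapter weights are small enough to ship to edge hardware alongside a quantized base model, which is what lets deployment data flow back without frontier-class compute.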

V. The Distillation Controversy: Free Tutors

Chinese AI labs have not relied solely on open-source iteration to close the gap. Distillation — using a frontier model's outputs to train one's own smaller model — amounts to turning a competitor into a free tutor.
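
Mechanically, this kind of distillation is nothing exotic: it is supervised fine-tuning on harvested teacher outputs. A schematic sketch follows; the endpoint, model name, and response schema are placeholders standing in for any commercial API, not a real provider's interface.

```python
# Schematic of output-based distillation: harvest teacher completions,
# store them as supervised pairs, then fine-tune a student on the pairs.
# Endpoint, model name, and response schema below are placeholders.
import json
import requests

def harvest(prompts: list[str], out_path: str) -> None:
    """Query a (placeholder) teacher API and save prompt/response pairs."""
    with open(out_path, "w") as f:
        for prompt in prompts:
            resp = requests.post(
                "https://api.example.com/v1/chat",   # placeholder endpoint
                json={"model": "teacher-frontier",   # placeholder model name
                      "messages": [{"role": "user", "content": prompt}]},
            ).json()
            pair = {"prompt": prompt,
                    "response": resp["choices"][0]["message"]["content"]}
            f.write(json.dumps(pair) + "\n")

# The resulting JSONL then serves as ordinary SFT data for the student:
# the same fine-tuning loop sketched in Section IV, with the teacher's
# outputs standing in for human-written targets.
```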

Exhibit 6
Model Distillation Controversy Timeline
  • Jan 2025 DeepSeek releases R1 model (built on Qwen 2.5 / Llama 3 architectures), later named a Time "Invention of the Year"
  • Late Jan 2025 OpenAI publicly accuses Chinese labs of unauthorized distillation, having detected large-scale API calls routed through disguised third-party proxies
  • Fall 2025 Microsoft security researchers independently discover accounts linked to DeepSeek bulk-harvesting OpenAI model outputs
  • Feb 13, 2026 Bloomberg reports on an internal OpenAI memo characterizing distillation as intellectual property infringement
  • Feb 24, 2026 Anthropic accuses DeepSeek, Moonshot AI, and MiniMax of coordinated "distillation attacks" against Claude
Sources: Bloomberg; CNBC; The Economist; USCC report.

The structural irony of distillation is this: American companies invest billions of dollars to train frontier models and open APIs to generate commercial revenue; Chinese labs use those same APIs to transfer frontier capabilities into their own models at negligible cost. The revenue flows to America; the capability accrues to China. The U.S. Department of Justice has yet to bring charges — against a company operating primarily within Chinese jurisdiction, enforcement remains beyond practical reach.

The open-source ecosystem provides architectural foundations; distillation extracts frontier capabilities from closed-source models. The two pathways converge on the same destination. Has the United States inadvertently cultivated its own competitor? Judged by outcomes, the answer no longer requires speculation.

VI. America's Strategic Blind Spots

6.1 Export-Control Coverage Analysis

Exhibit 7
U.S. Export-Control Coverage vs. China's AI Competitive Dimensions
Competitive Dimension | Control Coverage | China's Workaround | Risk Rating
Frontier training compute | COVERED | Architectural innovation (MoE, CoT) lowers compute requirements | Medium
Open-source model ecosystem | NOT COVERED | Open weights freely available; community iteration accelerates gains | High
SLM industrial deployment | NOT COVERED | No frontier chips needed; runs on edge hardware | Critical
Deployment-side data accumulation | NOT COVERED | Manufacturing + IoT + 5G generates proprietary data | Critical
Distillation / knowledge extraction | NOT COVERED | Systematic capability extraction via closed-source APIs | High
Data-asset institutionalization | N/A | NDA + accounting standards classify data as national assets | High
Source: SharpPost independent analysis based on the USCC report.

The models that industrial AI requires are small, task-specific, and open-source. Seen in this light, America's export-control framework may be targeting the wrong competitive layer.

6.2 Meta's Closed-Source Pivot: The Anchor Loosens

Meta announced in late 2025 that its next-generation model, Avocado, would be closed-source. The distillation controversy was a driving factor: Chinese labs had exploited Llama's open architecture alongside American closed-source APIs to accelerate their own development. Yann LeCun's subsequent departure reflected a deep internal rift over open-source strategy. While OpenAI and Nvidia have released open-weight models, these efforts remain focused on the digital loop — none addresses deployment-side data accumulation.

6.3 The Application Layer: U.S. Still Leads, but the Window Is Narrowing

Exhibit 8
U.S. vs. China AI Application Monthly Visits (December 2025)
Category | U.S. Leader | Monthly Visits (M) | China Leader | Monthly Visits (M)
General chatbot | ChatGPT | 5,700 | DeepSeek | 451
AI agent | GenSpark AI | 13.5 | Nano AI (360) | 189
Code assistant | GitHub Copilot | 304 | Comate (Baidu) | 1.85
AI search | New Bing | 1,330 | Nano AI Search | 279
Image generation | SeaArt | 25.5 | Jimeng AI | 11.5
Video generation | Sora | 35.1 | Klingai (Kuaishou) | 14.3
Source: AICPB "Global AI Rankings by Users," accessed Jan 23, 2026. All three leading AI-agent companies have Chinese backgrounds.

The United States still leads in consumer-facing AI applications — ChatGPT's monthly traffic exceeds DeepSeek's by a factor of twelve. But dominance at the application layer may mask a deeper structural risk: Chinese AI companies are increasingly shifting their center of gravity from consumer products to industrial integration.

Conclusion: Defending the Lab Is Not Enough

The penetrating insight of the USCC report can be distilled into a single proposition: Washington has aimed its policy instruments at one loop, while Beijing accumulates advantage through the compounding of two. Export controls can constrain training compute, but they cannot prevent a nation possessing the world's largest manufacturing base, densest IoT infrastructure, and most comprehensive data-asset institutionalization framework from building an AI advantage on the deployment side — independent of frontier chips.

Factor in the dimension revealed by the distillation controversy — American frontier models' commercial APIs inadvertently serving as capability pipelines for Chinese labs — and the proposition that the United States has cultivated its own competitor is less rhetoric than a sober description of competitive structure.

This is not a question that can be answered with stricter chip bans. It requires the United States to rethink its theory of AI competition — broadening the aperture from a frontier-scale race to a contest over deployment ecosystems. The battlefield has moved beyond the laboratory to the factory floor, the shipping dock, and the warehouse. Defending the lab is far from enough.