The computing power challenge in the China-US AI contest

As large AI models continue to evolve along the scaling law, China’s large models are being held back by a shortage of high-end chips.

A group of “OpenAI rebels” led by Dario Amodei has built Anthropic into a global leader in large models, with a valuation in the trillions of dollars. The company’s Opus 4.6 has become the performance benchmark for large models.

Its latest model, Mythos, has not been released to the public because it is considered “too powerful”. It has 10 trillion parameters, was trained on as many as 300 trillion tokens, and cost an estimated 10 billion US dollars to train.

The US government has even suspended all foreign citizens’ access to the model on “national security” grounds.

At present, DeepSeek V4 Pro, the strongest model in China, has 1.6 trillion total parameters, roughly one-sixth of the 10-trillion-parameter frontier models in the United States. Some assessments also find that DeepSeek V4 Pro lags the US frontier by about 8 months.

The root of this generational gap is the shortage of high-end computing power, in a field where “a day in AI is like a year in the ordinary world”.

Although international figures such as Jensen Huang and Elon Musk have praised Chinese AI highly, the shortage of high-end computing power, especially AI training chips, remains a chasm that has long run through the AI competition between China and the United States.

American tech giants are waging a war of deep pockets, relying on huge capital expenditures, massive clusters of top-end GPUs, and ample per-capita token budgets. Meta’s GPU computing power alone exceeds that of all Chinese AI companies combined, and the AI spending of American tech giants is astronomical.

With demand for computing power growing exponentially and procurement costs for hardware such as memory chips staying high, domestic model makers like DeepSeek can only cut costs through model distillation, which has set off a new round of contention between China and the United States.

With imports of high-end AI chips obstructed and market demand surging, the urgent question for the entire Chinese AI industry is how to find a workable development path before domestic substitutes mature enough to meet that demand.

Computing power constraints
Since the end of last year, domestic GPU makers such as Moore Threads, MetaX, Biren Technology, and Iluvatar CoreX have set off a wave of capital-market enthusiasm. Beneath the wealth feast in the secondary market, however, an undercurrent that cannot be ignored is becoming increasingly clear, and the problems it raises are becoming more urgent.

For the past few years, domestic AI chips have mainly focused on the relatively safe and peripheral “inference side”. For example, Doubao recently planned to purchase 50,000 chips from Iluvatar CoreX for inference workloads, to handle the high-frequency calls of China’s most-used AI app.

At the top of the computing power pyramid, AI training, domestic chips can currently take on only peripheral odd jobs.

AI training chips are used mainly to train artificial intelligence models, a process that involves enormous numbers of matrix operations and parameter updates. They therefore need powerful computing power and high energy efficiency, and they offer stronger performance at very high prices. Examples include Nvidia’s A100, H100, and H200 and AMD’s MI300 series.

By comparison, the job of an inference chip is much lighter. It is used in the deployment phase after model training is complete and is mainly responsible for running the model’s inference tasks, which demand real-time performance. An inference chip needs to respond quickly and consume little power while preserving accuracy.

An apt analogy is that training is the AI model “learning knowledge”, while inference is the model “applying knowledge”. During the learning phase, training chips must draw on massive amounts of data to drive the dynamic updating of billions, trillions, or even tens of trillions of parameters. That requires not only powerful computing power but also high bandwidth and efficient communication, as well as stability across clusters of ten thousand cards.

The root cause of the gap between Chinese and American models lies in these “invisible places”, above all the absence of high-end training chips.

Under the scaling law of large models, the demand for computing power rises linearly with the number of model parameters. Meanwhile, the exponential growth in computing power and hardware costs makes training large models an “exclusive game” for a very small number of tech giants.
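The linear relationship can be made concrete with a rough sketch. It assumes the widely used rule of thumb that training a dense transformer costs about 6 floating-point operations per parameter per token, which is not a figure from this article, and it plugs in the parameter and token counts quoted above for Mythos and DeepSeek V4 Pro. The helper name and the constant are illustrative assumptions. Mixture-of-experts models activate only part of their parameters per token, so total parameter counts can overstate their compute.

```python
# Rough training-compute estimate under the assumed "6 * N * D" rule of thumb
# (about 6 FLOPs per parameter per training token for a dense transformer).
# The rule and the helper name are illustrative assumptions, not from the article.

FLOPS_PER_PARAM_TOKEN = 6.0  # assumed constant


def training_flops(params: float, tokens: float) -> float:
    """Approximate total training FLOPs for `params` parameters and `tokens` tokens."""
    return FLOPS_PER_PARAM_TOKEN * params * tokens


# Figures quoted in the article.
MYTHOS_PARAMS = 10e12      # 10 trillion parameters
MYTHOS_TOKENS = 300e12     # 300 trillion training tokens
DEEPSEEK_PARAMS = 1.6e12   # 1.6 trillion total parameters

mythos_flops = training_flops(MYTHOS_PARAMS, MYTHOS_TOKENS)

# At a fixed token budget, compute scales linearly with parameter count,
# so the compute ratio equals the parameter ratio.
ratio = mythos_flops / training_flops(DEEPSEEK_PARAMS, MYTHOS_TOKENS)

print(f"Estimated Mythos training compute: {mythos_flops:.2e} FLOPs")
print(f"Compute ratio at equal tokens: {ratio:.2f}x")
```

Under these assumptions the estimate comes to about 1.8e28 FLOPs, and for the same amount of data the 10-trillion-parameter model needs 6.25 times the compute of the 1.6-trillion-parameter one.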

Among American tech giants, Meta alone plans to deploy more than 1.2 million high-end GPUs by the end of 2026, with annual investment of over 145 billion US dollars. By some calculations, Google’s AI computing power is equivalent to 5 million NVIDIA H100s, meaning a single company accounts for one quarter of the global total.

The combined capital expenditures of Amazon, Microsoft, Alphabet, and Meta this year reached $725 billion, a year-on-year increase of 77%. That is equivalent to 13% of total private domestic investment in the United States for the year. Morgan Stanley predicts that the capital expenditure of American technology companies will reach a record $1.1 trillion by 2027.
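As a quick sanity check, the figures in this paragraph imply two numbers the article does not state directly: the previous year’s combined capital expenditure and the size of total US private domestic investment. The sketch below derives both. The inputs are the article’s numbers, while the derived values are only implied by them and are not reported data.

```python
# Back-of-the-envelope checks on the capital-expenditure figures in the text.
# Inputs are the article's numbers; outputs are implied values, not reported data.

CAPEX_THIS_YEAR_BN = 725.0          # combined capex of the four companies, $ billion
YOY_GROWTH = 0.77                   # 77% year-on-year increase
SHARE_OF_PRIVATE_INVESTMENT = 0.13  # 13% of US private domestic investment

# If this year's figure is 77% above last year's, last year's is this year's / 1.77.
implied_prior_year_bn = CAPEX_THIS_YEAR_BN / (1 + YOY_GROWTH)

# If $725 billion is 13% of private domestic investment, the total is 725 / 0.13.
implied_private_investment_bn = CAPEX_THIS_YEAR_BN / SHARE_OF_PRIVATE_INVESTMENT

print(f"Implied prior-year capex: ${implied_prior_year_bn:.1f} billion")
print(f"Implied private domestic investment: ${implied_private_investment_bn:.0f} billion")
```

On these inputs, last year’s combined spending would have been roughly $410 billion, and total private domestic investment roughly $5.6 trillion.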

At present, the United States controls over 70% of the world’s high-end GPUs, and since the chip ban China’s available high-end chips amount to only one-eighth of those in the United States. According to the Stanford AI Index Report 2026, the United States has 5,427 data centers, more than 10 times as many as China.

According to calculations by the China Academy of Information and Communications Technology (CAICT), as of early 2025 the computing power scale of the United States was 2,400 EFLOPS against 1,053 EFLOPS for China, more than twice China’s figure.
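The two ratios quoted in this part of the article can be set side by side in a short sketch. The inputs are the article’s figures; the variable names are illustrative. It shows that the gap in overall computing power is much smaller than the gap in high-end chips, which is consistent with the argument that the shortfall is concentrated at the high end.

```python
# Compare the overall compute gap with the high-end chip gap, using the article's figures.

US_EFLOPS = 2400.0      # US computing power scale, per CAICT (early 2025)
CHINA_EFLOPS = 1053.0   # China computing power scale, per CAICT (early 2025)
HIGH_END_CHIP_GAP = 8.0 # article: China's high-end chips are 1/8 of the US total

overall_gap = US_EFLOPS / CHINA_EFLOPS

print(f"Overall compute gap: {overall_gap:.2f}x")
print(f"High-end chip gap:   {HIGH_END_CHIP_GAP:.0f}x")
print(f"High-end gap relative to overall gap: {HIGH_END_CHIP_GAP / overall_gap:.1f}x")
```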

Each of the four tech giants mentioned above has more computing power than all Chinese AI companies combined.

This overwhelming advantage in computing power allows American companies to complete more than ten rounds of large-model iteration experiments within a year.

Musk is even more extravagant. His xAI owns Colossus 2, billed as the world’s “first gigawatt-scale AI cluster”, which gives him the confidence to claim that he is training seven models simultaneously: two with 1 trillion parameters, two with 1.5 trillion, one with 6 trillion, and one with 10 trillion. This kind of “brute-force aesthetics” is only possible with extremely abundant computing power.

At the same time, because of US restrictions on chip exports, Chinese companies’ share of high-end AI chip shipments has continued to decline in recent years, according to Epoch AI statistics.

It is no exaggeration to say that this huge gap in the computing power base will keep China’s AI in catch-up mode for a long time and will make it harder for domestic large models to close in on their American counterparts.

The generational gap
“The pace of innovation in China is unstoppable. Anyone who thinks China cannot make chips is really wrong. The gap between China and the United States is only nanoseconds.”

NVIDIA founder Jensen Huang has publicly praised the progress of Chinese semiconductors more than once.

Musk often expresses similar views on X: “China will definitely solve the chip bottleneck, and in artificial intelligence computing power it will far surpass other countries in the world”, and “China will win the AI competition on Earth”.

Such high praise for China’s AI development from celebrated figures in the technology industry is easy to take at face value. Yet these remarks carry a clear whiff of flattery that breeds complacency. Some American media constantly promote the view that the gap between Chinese and American models is extremely small, blurring the facts and obscuring some objective truths.

In the face of this, China’s AI sector should stay clear-headed and calm.

Even if China’s advanced large models do not differ much from their American competitors on standardized problems, the gap becomes far more apparent in complex industrial and enterprise environments.

Compared with frontier models from US companies such as Anthropic, China is still a follower. An evaluation by CAISI in the United States found that DeepSeek V4 Pro, China’s strongest model, lags the US frontier by about 8 months.

Kai-Fu Lee recently pointed out in an interview with The Wall Street Journal that, measured against top American models such as Anthropic’s Claude Fable 5, the United States currently leads China by about 15 months.

Large models follow the scaling law: the more parameters, training data, and computing power invested, the better the model performs. The most advanced large models in the United States have now entered the multi-trillion-parameter era, and the pace of iteration is still accelerating.

Anthropic’s most powerful model, Mythos, has reached 10 trillion parameters and is estimated to cost $10 billion to train; xAI’s Colossus 2 is training 7 models simultaneously, including 6-trillion and 10-trillion-parameter models; and OpenAI needs only one month to iterate a 4-trillion-parameter model.

DeepSeek V4 Pro, the strongest model in China, has 1.6 trillion total parameters, roughly one-sixth of the 10-trillion-parameter frontier in the United States.

Anthropic’s Claude series has been recognized as the strongest AI programming model of the past two years, and Mythos has once again reset public expectations, outperforming the previous flagship, Opus 4.6.

OpenBSD is known as the most secure system in the industry, yet Mythos found a vulnerability in it that had gone undiscovered for 27 years. It also found vulnerabilities in FFmpeg and the Linux kernel that had gone unnoticed for several years or even more than a decade, and it did so autonomously from start to finish, without relying on humans.

Bear in mind that “pre-training” sets the upper limit of a large model’s ability; “post-training” cannot lift a trillion-parameter model to the level of a 10-trillion-parameter model. The decisive factor in pre-training is high-end computing chips, which determine parameter scale and the speed of training iteration.

Liu Qingfeng, Chairman of iFlytek, has said frankly that top model makers, especially the American giants, are building large-scale computing platforms, while domestic computing power is going through growing pains that have limited the training of long-context models.

It can be seen that the computing power gap is the root cause of the difference in models between China and the United States.

The rise of domestic production
One company holds about 90% of the global market for high-end AI training chips, which has helped Nvidia remain the world’s largest company by market value. In 2025 its market capitalization at one point exceeded the GDP of Germany, the world’s third-largest economy.

According to TrendForce data, in Q1 2026 Nvidia alone accounts for 68% of the global GPU server market, AMD holds 5% to 6%, and domestic GPU makers as a whole account for less than 4%.

With its first-mover advantage, strong technological barriers, high-speed interconnects, software ecosystem, and tight integration with TSMC’s advanced processes, NVIDIA dominates the field. In high-end training scenarios, the Nvidia GB300 outperforms the AMD MI325 as well as the Cambricon Siyuan 690 and the Moore Threads MTT40. In training trillion-parameter large models in particular, its performance is more than 30% ahead of its competitors.

Under the export ban, Jensen Huang has said that Nvidia’s share of new sales in China has essentially fallen to zero, leaving only its installed base. Backed by domestic substitution policies, products such as Huawei’s Ascend 910, Hygon’s DCU Shensuan 2, and Cambricon’s Siyuan 370/590, along with chips from Moore Threads and MetaX, have emerged one after another.

Among them, the Ascend 910 line is Huawei’s strongest computing chip; the Ascend 910B delivers 640 TOPS (INT8), comparable to the Nvidia A100.

In absolute performance domestic GPUs still trail, but substitution can begin with inference and edge scenarios. Domestic GPUs already largely meet the general inference needs of domestic government agencies and enterprises, and the gap with Nvidia’s mid-range products has narrowed to 15% to 20%, which shows that substitution is feasible.

Computing performance matters, but the underlying technology and software ecosystem are the real weak spots of domestic GPUs, just as CUDA is the foundation of NVIDIA’s GPU empire. Zheng Weimin, an academician of the Chinese Academy of Engineering, has pointed out that the core problem of domestic AI chips is that the ecosystem is not good enough: with a good ecosystem, even a chip with 60% of the performance would be usable.

It is fair to say that the software ecosystem is the toughest barrier in the GPU race, and Nvidia’s strength here is just as hard to replace as its hardware.

After more than ten years of sustained investment, the CUDA ecosystem has more than 4 million developers, hundreds of thousands of open-source models, and a full range of third-party toolchains covering AI training, inference, graphics rendering, and scientific computing, forming a formidable ecosystem moat.

IDC data shows that more than 95% of AI models worldwide are currently developed on the CUDA ecosystem. Even with policy support, domestic GPUs need long-term collaboration across the industry chain, as well as ample patience from the media, public opinion, and capital markets.

In January of this year, Zhipu and Huawei jointly open-sourced GLM Image, a new-generation image generation model. The entire process, from data processing to model training, was completed on Huawei Ascend Atlas 800T A2 devices and the MindSpore AI framework, making it the first SOTA multimodal model trained end to end on domestic chips.

Moore Threads also worked with the Beijing Academy of Artificial Intelligence (BAAI) to complete full-process training of BAAI’s self-developed embodied brain model RoboBrain 2.5 on the MTT S5000 intelligent computing cluster and the FlagOS Robo framework. The result validates for the first time the usability of domestic computing clusters for training large embodied-intelligence models.

Domestic GPUs have thus made breakthroughs in adaptation and ecosystem building and are moving from a “single-point breakthrough” on the inference side to “gradual adaptation” on the training side, which is already significant progress.

Summary
Overall, with imports of advanced chips from overseas obstructed, the better course may be to walk on two legs, combining domestic and foreign options, while concentrating support on domestic computing chips to meet urgent market demand.

There is no doubt that the demand is real. The “bubble thesis” persists, but its voice is not growing louder. Global enthusiasm for building AI infrastructure has surpassed the early development of any previous industry.

Since the beginning of this year, global capital markets have entered another AI supercycle, with the share prices of Samsung, SK Hynix, Broadcom, and TSMC hitting new highs. In the domestic market, hard-tech stocks led by Cambricon have also risen sharply, and the market value of optical module giant Zhongji Innolight at one point even surpassed that of Kweichow Moutai.

Looking back at the history of semiconductors in South Korea, the country backed its memory chip industry with national resources, endured its darkest moments, and ultimately overtook Japan to become the undisputed leader of the global memory industry.

Whether in memory chips, mobile phone chips, or today’s AI chips, China is still catching up, and that cannot be achieved overnight. But with a huge market, a steady stream of emerging AI talent, and massive capital, domestic GPUs have begun to show a degree of adaptability that can meet the real needs of many AI enterprises.

In this AI contest bearing on national destiny, China and the United States are not only rivals; each also has technology, markets, and resources the other needs.

The copyright of the article belongs to the author, please do not reprint without permission.
