Building the Foundation | WAIC 2026 Computing Power: Can Supernodes and Optical Interconnect Break Through the Physical Ceiling of a Single Chip?


As the computing power race moves from “single-card competition” into a new stage of “system-level sovereign competition”, the yardstick is no longer the peak performance of a single chip but how much computing power the whole system can actually put to use.

In 2026, the industry's focus has shifted from training to inference, and inference computing power now exceeds training computing power in scale. Computing power has gone from a research tool to general-purpose infrastructure for the whole economy, a daily operating cost that has to be paid. The industry no longer asks “how many cards are there?” but “how much effective computing power is there?” For every 10-percentage-point increase in cluster linearity, hardware costs fall by 15% and electricity costs fall by 20%. For a 10,000-card cluster, that means billions of yuan in real money.
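As a rough illustration of why the question shifts from card count to effective computing power, the sketch below models effective compute as card count times per-card peak times cluster linearity. This simple model, the 1 PFLOPS per-card peak, and the linearity values are all assumptions chosen for illustration; none of them are figures from the conference.

```python
import math


def effective_compute(cards: int, peak_per_card: float, linearity: float) -> float:
    """Effective compute = card count x per-card peak x cluster linearity."""
    if not 0.0 < linearity <= 1.0:
        raise ValueError("linearity must be in (0, 1]")
    return cards * peak_per_card * linearity


def cards_for_target(target: float, peak_per_card: float, linearity: float) -> int:
    """Smallest whole number of cards that reaches the target effective compute."""
    if not 0.0 < linearity <= 1.0:
        raise ValueError("linearity must be in (0, 1]")
    return math.ceil(target / (peak_per_card * linearity))


# Hypothetical cluster: 10,000 cards at an assumed 1 PFLOPS peak each.
CARDS = 10_000
PEAK_PFLOPS = 1.0

for linearity in (0.6, 0.7, 0.8):
    usable = effective_compute(CARDS, PEAK_PFLOPS, linearity)
    print(f"linearity {linearity:.0%}: about {usable:,.0f} PFLOPS usable")

# Cards needed to deliver 6,000 PFLOPS of effective compute at 70% linearity.
print(cards_for_target(6_000, PEAK_PFLOPS, 0.7))
```

Under these assumptions, raising linearity lets the same effective output be delivered with fewer cards, which is where the hardware and electricity savings come from.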

When a single chip approaches its physical limits, what sustains the growth of effective computing power? The “Building the Foundation” chapter of WAIC 2026 offers system-level answers from four angles: architecture, technology, ecosystem, and engineering.

Architecture Breakthrough: Can Supernodes Break the Physical Ceiling of a Single Chip?

Single-chip performance is nearing its ceiling, yet demand for computing power keeps growing. The only way forward is to connect more chips, and to connect them faster. This is what supernodes do.

The traditional approach is to stack cards, the more the better. But that road keeps getting narrower. In training GPT-5-class large models, cross-node communication overhead accounts for more than 30% of total training time. In other words, for every 100 cards paid for, the equivalent of 30 are waiting for data. GPU computing power grows 2 to 3 times per year, while memory bandwidth grows only 15% to 30% per year, and the gap between the two keeps widening.
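The arithmetic behind these two claims can be sketched in a few lines. The growth rates and the 30% communication share are the figures quoted above; the function names and the assumption of steady year-on-year compounding are illustrative only.

```python
def idle_card_equivalent(cards: int, comm_fraction: float) -> float:
    """Cards' worth of capacity spent waiting on communication."""
    return cards * comm_fraction


def compute_to_bandwidth_gap(compute_growth: float,
                             bandwidth_growth: float,
                             years: int) -> float:
    """How far compute outpaces memory bandwidth after `years`,
    assuming both grow by a constant factor every year."""
    return (compute_growth / bandwidth_growth) ** years


# 30% of training time on cross-node communication, 100 cards purchased.
print(idle_card_equivalent(100, 0.30))

# Narrowest case quoted: compute x2 per year, bandwidth +30% per year.
# Widest case quoted: compute x3 per year, bandwidth +15% per year.
for years in (1, 3, 5):
    low = compute_to_bandwidth_gap(2.0, 1.30, years)
    high = compute_to_bandwidth_gap(3.0, 1.15, years)
    print(f"after {years} year(s): gap between {low:.1f}x and {high:.1f}x")
```

Even in the narrowest case the ratio compounds every year, which is why adding cards alone yields diminishing returns.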

Supernodes address this by using high-speed interconnects to turn dozens or even hundreds of GPUs into a unified “computing matrix”, so that traffic that once had to cross cabinets becomes internal communication and waiting time drops significantly.

Huawei will showcase the Atlas 950 SuperPoD for the first time at WAIC 2026; it is currently the largest commercial supernode in the industry. Starting from 64 cards in a single cabinet, it can connect up to 8,192 NPU cards and is designed for training and inference of trillion-parameter models. Huawei has also proposed a new idea it calls “Tao’s Law”: rather than focusing solely on shrinking transistors, compress signal transmission delay and use architectural innovation to achieve high performance on mature process nodes.

Atlas 950 supernode

ZTE champions openness and decoupling. Working with partners including Xizhi Technology, Biren Technology, Muxi, Suiyuan Technology, and Tianshu Zhixin, it has built a domestically produced high-performance Matrix supernode on the OEX+dOCS architecture. The design favors multi-chip collaboration: for each application scenario it selects the best combination of domestic chips, aiming for the computing base with the best TCO. The architecture has been shortlisted for this year's SAIL Award at WAIC. Several domestic chipmakers polishing one system together is a sign that China’s computing power is moving toward systematic, coordinated development.

Technology Debate: In the Post-Moore Era, Is Optical Interconnect the Only Route?

Supernodes solve the problem of “how to connect”, while optical technology solves the problem of “what to connect with”, which sits closer to the physical layer.

The slowdown of Moore’s Law is no longer a matter of debate in the industry. Transistors keep shrinking, but at higher cost and with diminishing returns. Electrical signals are inherently prone to heat generation and bandwidth limits, and the twin walls of memory and interconnect are hard to overcome with electronics alone. Light is different. Optical signals lose far less energy to heat than electrical signals in copper and carry much more bandwidth over distance, which makes them a natural fit for large-scale, high-speed data transmission.

Optical interconnect gives computing clusters a “high-speed railway for data”, while optical computing uses photons to compute directly, sidestepping the physical limits of electronic circuits. Together, the two are regarded by the industry as the most promising path for the post-Moore era.

Capital has already placed its bets. In 2026, Xizhi Technology will list on the Hong Kong Stock Exchange, billed as the “world’s first AI silicon photonics chip stock”. At this WAIC, Xizhi will host the first Optical Technology Forum in the conference's history. Traditional electronic chips are constrained by the slowdown of Moore’s Law and by the “memory wall” and “interconnect wall”, leaving the supply of computing power badly behind demand. Optical technology is therefore seen as the key to breaking the deadlock: optical interconnect gives computing clusters low latency, high bandwidth, and low energy consumption, while optical computing exploits the parallelism of photons and their suitability for linear operations to sidestep the miniaturization limits of electronic transistors. The forum will show how optical interconnect and optical computing are being deployed in intelligent computing clusters, and will directly address the industry's ultimate question: “can light replace electricity?”

Tian Shu · Light Cube

An increasingly clear consensus is that in large-scale clusters such as supernodes, optical technology is not optional but mandatory. Huawei’s Atlas 950 and ZTE’s OEX both rely on optical modules to achieve interconnection at the 10,000-card scale. WAIC, as a top industry platform, gives domestically developed optical computing solutions a stage to showcase their technology, exchange ideas openly, and jointly build a computing ecosystem.

Ecosystem Breakthrough: Can Open-Source Collaboration Break the Fragmentation Dilemma of “One Card, One Software”?

Hardware architecture keeps innovating and optical interconnect keeps evolving, but if software and storage cannot keep up, computing power still cannot run at full capacity. The ecosystem base has to be upgraded in step.

There are now over a hundred AI chip manufacturers worldwide, each with its own programming model, operator libraries, and communication protocols. Migrating a model from Nvidia cards to domestic chips often requires recompilation and re-optimization, which is extremely costly. The result of this fragmentation is that no matter how much hardware is purchased, less than 40% of its computing power can actually be used.

This year's Global AI Open Computing and Intelligent Agent Technology Ecology Forum, led by Turing Award winner David Patterson, aims to address this issue. The core solution is a unified intelligent computing base called FlagOS. It can be understood as a universal “operating system” for all chips, allowing chips of different architectures to run the same software.

More noteworthy still, the forum has invited three major international open-source foundations, the Linux Foundation, the Eclipse Foundation, and the PyTorch Foundation, with the aim of replacing vendor lock-in with open-source collaboration. This is the first time domestic computing power has obtained an internationally recognized “software passport”.

Another long-overlooked player is storage. What the industry calls the “I/O wall” is essentially an inherent bottleneck of the von Neumann architecture, in which storage and compute speeds do not match: the performance of compute units keeps soaring while data supply cannot keep up, so GPUs frequently sit idle waiting for data.

Storage is the foundation that links software and hardware and closes the ecosystem loop, and it is also a critical weakness that has long been underestimated. Western Digital is taking part in the conference for the first time and is hosting a forum on “Data Storage Architecture for the AI Era”, focusing on the difficulties and breakthroughs in making storage, computing power, and security work together, and filling in the last link of the fragmented computing ecosystem. Its industry research indicates that the core competitiveness of leading enterprises in deploying AI lies not in the peak performance of a single chip but in how well storage, computing, and security work together across the whole system. Today's computing ecosystem suffers from structural fragmentation: vast amounts of data sit idle, outside the reach of the compute scheduling system, so expensive GPU capacity cannot be fully used because the data supply chain is missing and ecosystem adaptation is inadequate. Storage has thus moved out of its traditional supporting role to become a central factor that runs through the computing ecosystem and determines the overall efficiency and total cost of a cluster.

The weakness of the computing ecosystem lies not only in the fragmented software stacks across chips but also in the broken coordination between computing and storage. The FlagOS unified intelligent computing base addresses the former by allowing chips of different architectures to run the same software. Coordinated storage optimization addresses the latter: making massive data appear in the right place at the right time, so the GPU is no longer stuck “waiting for data”. Only when both gaps are filled can the computing ecosystem truly be considered a closed loop.

Hammerspace, an American data orchestration company, will showcase a high-performance global data platform aimed squarely at this pain point. Its core breakthrough is its Tier 0 capability, which can turn NVMe storage from any vendor into an ultra-high-performance storage tier without replacing existing equipment. Measured data show that customers can activate 20 PB of Tier 0 capacity within 1.5 days, reaching 100% line-rate performance, raising GPU utilization by more than 40%, and cutting infrastructure cost per TB by 50%.

The platform stands out in four ways: a unified global namespace that presents a seamless view of data across edge, data center, and cloud; an intelligent data orchestration engine that automates data movement so that data appears at the right time and place; an agentless architecture built entirely on standard protocols such as pNFS, NFS, SMB, and S3, which does not intrude on GPU nodes and so reduces coupling and operational risk; and policy-driven automated orchestration that can consolidate more than 10 storage platforms into one unified data management platform.

For now, this path is beginning to open up: the I/O wall has been partly breached, and an interim solution has taken shape.

Engineering Implementation: How Can the Cost of a Single Token Fall from 1.3 Million to 350,000?

No matter how good a technology is, it means little if it cannot be deployed. The WAIC 2026 exhibition floor presents the crucial leap from concept to engineering.

Computing power scheduling is the first key to cutting costs. Wuwen Xinqiong is bringing its “Token Super Factory” for the agent era, built around a full-stack layout of “front store, back factory, one center”, and presenting its independently controllable Agentic Infra infrastructure and Agentic MaaS large-model service platform, along with a showcase of AI productivity applications. Using cross-cluster heterogeneous PD (prefill/decode) separation and self-developed full-stack inference optimization tools, it cuts inference cost on a trillion-parameter model by a factor of 10 compared with the traditional single-instance mode, reshaping how efficiently domestic computing power is converted into AI applications.

AI productivity formula

Cooling and networking are two other overlooked levers for cutting costs. The upcoming “Shanghai Cube”, a liquid-cooled cabinet holding 128 cards in a single cabinet, has successfully run the DeepSeek 671B large model. It is domestically developed high-density computing equipment with an integrated, independently developed software and hardware stack, built jointly by Jiafeng Information, Lixun, Muxi, Yunhe, Daoke, Wuwen Xinqiong, Fudan University, Chuangzhi College, Muhe Information, and other institutions. Its single-cabinet power density exceeds 100 kW, whereas traditional air cooling usually tops out at 20-30 kW per cabinet. Beyond that limit, liquid cooling is no longer optional; it is the only way out. Measured data show that the liquid cooling solution can bring PUE below 1.05, saving more than 40% of energy compared with traditional air-cooled data centers.
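To see how a PUE figure translates into an energy saving, recall that PUE is total facility energy divided by IT equipment energy. The sketch below assumes the IT load is identical in both cases and that the 40% saving refers to total facility energy; both are assumptions, since the article does not state its baseline.

```python
def facility_energy(it_energy_kwh: float, pue: float) -> float:
    """Total facility energy implied by an IT load and a PUE value."""
    return it_energy_kwh * pue


def saving_fraction(baseline_pue: float, new_pue: float) -> float:
    """Fraction of facility energy saved at the same IT load."""
    return 1.0 - new_pue / baseline_pue


def implied_baseline_pue(new_pue: float, saving: float) -> float:
    """Baseline PUE that would make `saving` true for a given new PUE."""
    return new_pue / (1.0 - saving)


# Baseline PUE implied by "PUE 1.05 saves 40%" under the assumptions above.
baseline = implied_baseline_pue(1.05, 0.40)
print(f"implied air-cooled baseline PUE: {baseline:.2f}")

# Hypothetical 100 kW cabinet running for one hour under each regime.
it_kwh = 100.0
print(f"air-cooled:    {facility_energy(it_kwh, baseline):.1f} kWh")
print(f"liquid-cooled: {facility_energy(it_kwh, 1.05):.1f} kWh")
```

Under these assumptions the quoted saving corresponds to an air-cooled baseline PUE of about 1.75.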

For large-scale supercluster deployment, Zhongke Shuguang makes a heavyweight appearance with its scaleX 10,000-card supercluster. Built on a self-developed open AI architecture, it is compatible with multiple domestic accelerator cards and the CUDA ecosystem, breaks through 5 EFLOPS, pioneers a converged paradigm for supercomputing and AI computing, supports 8-bit to 64-bit precision, and uses the scaleFabric lossless high-speed network to tackle scheduling for 100,000-card clusters. Combined with self-developed immersion liquid cooling, it achieves a PUE as low as 1.04.

scaleX 10,000-card supercluster

At the same time, New H3C's 102.4T intelligent computing switch, which has the industry's highest single-chip bandwidth, cuts latency directly by connecting in a single hop. In large-scale distributed training, the computing power lost to network congestion can exceed 30%. Through joint optimization of computing and networking, New H3C reports a 30% improvement in training performance and a 25% reduction in model training time. With the same hardware investment, more useful output is obtained, which lowers the cost per token.
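The performance and training-time figures can be related with a line of arithmetic. The sketch assumes that “training performance” means throughput on a fixed workload and that hardware cost per hour stays constant; the source defines neither, so treat the result as a consistency check rather than a vendor formula.

```python
def time_reduction_from_speedup(speedup: float) -> float:
    """Fractional drop in run time when throughput rises by `speedup`
    (0.30 means 30% more throughput) on a fixed workload."""
    return 1.0 - 1.0 / (1.0 + speedup)


def speedup_from_time_reduction(reduction: float) -> float:
    """Throughput gain needed to shorten run time by `reduction`."""
    return 1.0 / (1.0 - reduction) - 1.0


def relative_cost_per_token(speedup: float) -> float:
    """Cost per token relative to baseline at constant hardware cost per hour."""
    return 1.0 / (1.0 + speedup)


print(f"30% more throughput -> {time_reduction_from_speedup(0.30):.1%} less time")
print(f"25% less time       -> {speedup_from_time_reduction(0.25):.1%} more throughput")
print(f"relative cost per token at +30%: {relative_cost_per_token(0.30):.2f}")
```

On these assumptions a 30% throughput gain shortens training by roughly 23%, close to the 25% quoted, and lowers cost per token by the same proportion.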

AI chip high-speed interconnection innovative technology architecture

From cutting losses with supernodes and cutting energy consumption with optical interconnect, to lowering adaptation costs in the middle layer and unifying storage to reduce wasted resources, the whole chain is doing one thing: lowering the overall cost of a single token. This is the real leap of computing power from “can be built” to “can be afforded”.

The System Is King: From Single-Card Competition to System-Level Competition in National Strength

China’s computing power is moving from the “card-stacking era” to the “system era”. With single chips approaching their physical limits, system-level collaborative innovation has become the key to competition in the next generation of computing power.

Shanghai has built a unified computing power scheduling platform capable of carrying 160,000 P of heterogeneous computing power, offers 1 billion yuan in computing power vouchers each year, and has a support base spanning the entire industry chain, together forming a distinctive system-level cluster advantage. This is what gives the city confidence as an industrial high ground, and it also contributes a “Shanghai solution” to the country's goal of independent and controllable computing power.

The complete map of independent technologies in China’s computing infrastructure presented at this WAIC, spanning supernode clusters, optical interconnect, and a unified cross-chip computing base, shows not only the “quantity” of computing power but also its “soul”.
