# ANCHOR REPORT

Global Markets Research

9 January 2026

# Global AI networking supercycle driven by tech upgrade/supply shortage

# Accumulate market leaders InnoLight, YOFC, TFC and T&S

We believe the global AI networking supercycle will continue in 2026F, and even extend into 2027F, driven by multiple technology upgrade roadmaps (i.e. Silicon Photonics [SiPh], Co-packaged Optics [CPO], Optical Circuit Switch [OCS], Active Electrical Cable [AEC], Hollow Core Fiber [HCF]) and supply shortages (i.e. optical laser chips/materials). We estimate 800G/1.6T transceiver shipments will grow from 20mn/2.5mn units in 2025F to 43mn/20mn units in 2026F, with SiPh transceivers gaining 50-70% market share, and expect InnoLight (300308 CH, Buy, our sector top pick) to continue to dominate the high-end market. Owing to a likely more aggressive bundling sales strategy from global AI leaders, CPO migration should accelerate from 2026 (from a low base), and we think TFC (300394 CH, Buy) and T&S (300570 CH, Buy) will likely be key beneficiaries in the global CPO value chain. We upgrade YOFC (6869 HK) to Buy from Neutral as the company's AIDC business enjoys buoyant demand while the telecom market stabilizes.

# Key themes and analysis in this Anchor Report include:

- Detailed discussion on global hyperscale AI players' networking roadmaps for scale-up and scale-out networks, including optics, copper and networking switches
- Demand-supply analysis on key segments, including optical transceivers and CPO
- Upgrade YOFC to Buy and update financials for InnoLight, TFC, and T&S.

# Research Analysts

China Technology

Bing Duan - NIHK bing.duan1@nomura.com +852 2252 2141

Ethan Zhang - NIHK ethan.zhang@nomura.com +852 2252 2157

…cables could double over the next five years, reaching USD2.8bn by 2028E. Nvidia (NVDA US, Not rated) is an early adopter of DAC (Direct Attach Cable) in the Blackwell NVL72 system, and the company will likely continue to use copper cables in the upcoming Rubin platform in 2026. Other large AI customers such as Amazon AWS (AMZN US, Not rated), Meta (META US, Not rated) and Microsoft (MSFT US, Not rated) have also been using copper cables in their AIDC projects. In the copper cable supply chain, the key players include Amphenol (APH US, Not rated) and Credo (CRDO US, Not rated), as well as Bizlink (3665 TT, Buy, covered by Kenny Chen) and Woer (002130 CH, Not rated), which are already in the global AI customers' copper cable supply chains. Meanwhile, we note an integration trend, as optical communications players such as InnoLight (300308 CH, Buy) and Eoptolink (300502 CH, Not rated) have been trying to enter this market.

# AI switches: CPO switch penetration has started to increase; OCS has become a popular trend; "whitebox" products continue to gain traction

Ethernet and InfiniBand (IB) are the two major network protocols in today's AI data centers, competing with each other to provide higher speed and lower latency in large AI training clusters. Thanks to improved performance, lower costs and lower power consumption, we think Broadcom (AVGO US, Not rated) will likely continue to push for CPO adoption based on its Tomahawk (TH) 6 platform. Meanwhile, Nvidia's Quantum-X (IB) and Spectrum-X (Ethernet) switches may also launch CPO versions, and we think Nvidia's GTC 2026 event in March 2026 could shed more light on these products.
Moreover, we note that Google's OCS (optical circuit switch) is gaining popularity thanks to its strong performance in large TPU training clusters, while other players such as Lumentum (LITE US, Not rated) and Coherent (COHR US, Not rated) have been proactively engaged in the market. We also note the growing trend of global CSPs using "whitebox" switch products, in order to reduce cost and increase flexibility. Meanwhile, this trend will likely have a negative impact on the margin trends of branded switch players, which have their own IP and software integrated in their hardware products.

# Stocks for action: Upgrade YOFC to Buy on better AIDC demand and stabilized telecom outlook; accumulate InnoLight as the key beneficiary of 1.6T and SiPh migration, and TFC and T&S Communication as potential CPO plays

We believe China's leading optical component/transceiver companies will continue to benefit from the global AIDC investment upcycle, the speed upgrade (800G to 1.6T), and the technology migration (from pluggable to SiPh to CPO). We maintain our Buy rating on InnoLight (300308 CH), our sector top pick, as we think the global No.1 DC transceiver supplier (source: LightCounting) will benefit from the 1.6T transceiver and SiPh product upcycle in 2026-2027F, and its valuation looks attractive to us. We also like TFC (300394 CH, Buy) as a beneficiary of Nvidia's demand for high-end optical transceiver products and as a potential CPO beneficiary. In addition, we like Shenzhen T&S Communications (300570 CH, Buy) as a core supplier to Corning (GLW US, Not rated), a key player in US AIDCs, which provides scale-up and scale-out optical connections. We upgrade YOFC (6869 HK) to Buy from Neutral, as we think the expansion of the company's AIDC business (including Hollow Core Fiber, Active Optical Cable, Active Electrical Cable, and MPO) will accelerate in the global market, while China's telecom network outlook will likely stabilize.

Fig.1: Stocks for action

<table><tr><td>Company</td><td>Ticker</td><td>Rating</td><td>Mkt Cap (USD mn)</td><td>Avg. TO (USD mn)</td><td>Target Price (LC)</td><td>Price As of 8 Jan</td><td>Upside (%)</td><td>Company Description</td><td>NV's revenue exposure % in FY25F</td><td>Other global AI tech players' revenue exposure % in FY25F</td></tr><tr><td>Zhongji InnoLight</td><td>300308 CH</td><td>Buy</td><td>95,604</td><td>2,547</td><td>799.00 ↑</td><td>595.45</td><td>34.2%</td><td>Global No.1 DC optical transceiver supplier</td><td>25%~30%</td><td>50%~60%</td></tr><tr><td>YOFC</td><td>6869 HK</td><td>Buy</td><td>2,727</td><td>142</td><td>64.50 ↑</td><td>50.40</td><td>28.0%</td><td>One of China's leading optical fiber makers</td><td>N.A</td><td>10~20%</td></tr><tr><td>Suzhou TFC</td><td>300394 CH</td><td>Buy</td><td>21,742</td><td>1,026</td><td>243.00 ↑</td><td>195.25</td><td>24.5%</td><td>High-end optical component maker (optical engine, FAU)</td><td>50%~60%</td><td>10~20%</td></tr><tr><td>T&S Communication</td><td>300570 CH</td><td>Buy</td><td>3,774</td><td>260</td><td>148.00 ↓</td><td>115.99</td><td>27.6%</td><td>High-end MPO product maker, and key partner to Corning</td><td>0~10%</td><td>60%~70%</td></tr></table>

Source: Bloomberg Finance L.P., Nomura estimates

# AI networking technology roadmap – what to expect in 2026 (and 2027 onwards)

We believe the AI infrastructure upcycle for global hyperscale AI cloud companies will continue in 2026-27F, thanks to the intense competition for large language model (LLM) training and inference. GPU and ASIC players are striving to gain more market share in the LLM era, while their shortened technology upgrade cycles will accelerate product and technology innovation in the AI networking segment, in our view. AI data centers need to expand their computing and interconnect capabilities at different levels to meet the ever-expanding demand from LLM training and inference. There are three different levels for expanding computing clusters: scale-up, scale-out, and scale-across.

Scale-up: A scale-up network connects multiple GPUs into a single compute node. Scale-up focuses on within-rack interconnection, requiring extremely high bandwidth and low-latency connections. The challenges of scale-up include a dramatic increase in power consumption and the trade-off between bandwidth, speed, and transmission distance. Currently, intra-rack systems primarily utilize copper interconnects (NVLink as an example). The bottleneck with copper lies in the limited effective transmission distance of copper cables at high speeds (e.g., 448G), which is less than 10 meters. However, the product roadmaps of global AI leaders indicate an ever-increasing size of scale-up networks and GPU clusters, which will likely exceed copper's limit, and new solutions, i.e., optical communications, will need to be introduced, in our view. By integrating optical I/O (e.g., silicon photonic transceivers) within the chip package, chip beachfront density can be significantly improved, breaking through the physical limits of electrical I/O, in our view.

Scale-out: Scale-out involves building large data center AI clusters (typically within the same server room/campus) by adding more server nodes and interconnecting them via high-speed networks. Scale-out focuses on rack/cluster networking, requiring high-bandwidth-density switching architectures and low-latency communication.
For scale-out networks, optical interconnects are the mainstream solution, with 400G/800G/1.6T optical transceivers connecting servers via ToR (Top-of-Rack) switches, or extending the reach through spine/core switches. As AI clusters expand and optical/switching chip capacities grow rapidly, power consumption and costs also increase dramatically. To deal with this challenge, co-packaged optics (CPO) has come into focus for the scale-out network. By directly integrating the optical engine with the switching ASIC, the total bandwidth and port density of the switch can be significantly increased, while reducing power consumption per bit.

Scale-across (inter-data center expansion): Scale-across interconnects multiple data centers scattered across different regions into a larger-scale AI factory, enabling them to collaboratively run a single AI task. Scale-across requires wide-area/campus optical interconnects with ultra-high bandwidth and extremely low transmission latency and jitter over long distances, using technologies such as hollow-core fiber and coherent optical transceivers.

Fig. 2: AIDC networking architecture: scale-up vs scale-out vs scale-across

Source: CUBE Research, Nomura research

# Scale-up standards: NVLink vs UALink vs SUE

In the scale-up camp, the major AI tech players have self-developed or customized interconnect standards/protocols, such as Nvidia's NVLink/NVSwitch, Broadcom's SUE, AWS's NeuronLink, Huawei's UBSwitch, and Alibaba Cloud's ALink; in addition, there is UALink, which is being developed by a group of companies and organizations, aiming to provide an open industry standard. In this environment of increased competition, we note that key players are striving to take a leading role by expanding their ecosystems. For instance, Nvidia is opening up its proprietary NVLink protocol as "NVLink Fusion" and authorizing third-party vendors to build scale-up solutions based on "NVLink Fusion".
Fig. 3: Comparisons among mainstream scale-up links

<table><tr><td>Standards</td><td>RoCE v2</td><td>UEC scale-up</td><td>SUE</td><td>NeuronLink</td><td>NVLink</td><td>PCIe</td><td>UALink</td><td>UB</td><td>OISA 1.1</td></tr><tr><td>Organization</td><td>Ethernet</td><td>Ultra Ethernet Consortium</td><td>Broadcom and others</td><td>AWS</td><td>NVIDIA</td><td>CXL</td><td>UALink Alliance</td><td>Huawei</td><td>China Mobile and others</td></tr><tr><td>Use scenario</td><td>Server/GPU interconnection; Symmetric interconnection</td><td>Server/GPU interconnection; Heterogeneous interconnection</td><td>Server/GPU interconnection; Heterogeneous interconnection</td><td>Server/GPU interconnection; Heterogeneous interconnection</td><td>GPU interconnection; Symmetric interconnection</td><td>Server/GPU interconnection; Symmetric/Heterogeneous interconnection</td><td>Server/GPU interconnection; Heterogeneous interconnection</td><td>Server/GPU interconnection; Symmetric/Heterogeneous interconnection</td><td>GPU interconnection; Symmetric interconnection</td></tr><tr><td>Memory semantics</td><td>Not supported</td><td>N/A</td><td>Supported</td><td>Supported</td><td>Supported</td><td>Supported</td><td>Supported</td><td>Supported</td><td>Supported</td></tr><tr><td>Flow control management</td><td>PFC+ECN</td><td>LLR+CBFC</td><td>Based on Credit</td><td>N/A</td><td>Based on Credit</td><td>Based on Credit</td><td>Based on Credit</td><td>N/A</td><td>CBFC/PFC</td></tr><tr><td>SerDes bandwidth</td><td>112GT/s; 224GT/s</td><td>112GT/s; 224GT/s</td><td>112GT/s; 224GT/s</td><td>112GT/s; 224GT/s</td><td>112GT/s; 224GT/s</td><td>32GT/s; 64GT/s</td><td>112GT/s; 224GT/s</td><td>N/A</td><td>112GT/s</td></tr><tr><td>Delay</td><td>Microseconds</td><td>Hundreds of ns</td><td>Hundreds of ns</td><td>Microseconds</td><td>Microseconds</td><td>Hundreds of ns</td><td>Hundreds of ns</td><td>Microseconds</td><td>Hundreds of ns</td></tr><tr><td>Energy efficiency</td><td><100pj/bit</td><td><100pj/bit</td><td>N/A</td><td>N/A</td><td><10pj/bit</td><td><10pj/bit</td><td><10pj/bit</td><td>N/A</td><td><10pj/bit</td></tr><tr><td>Interconnection distance</td><td>Within DC</td><td>Within DC</td><td>Multi-racks</td><td>Multi-racks</td><td>Multi-racks</td><td>Within DC</td><td>Multi-racks</td><td>Multi-racks</td><td>Within racks</td></tr></table>

Source: Company data, Nomura research

# Nvidia's NVLink / NVLink Fusion

NVLink is an interconnect technology purposely designed by Nvidia for high-performance computing, aiming to enable high-speed data exchange between GPUs or between a GPU and a CPU. Since its debut in 2016, NVLink has been upgraded to its sixth generation, to meet growing computing demand. NVLink has become an advanced technology solution and a key pillar of Nvidia's dominance in the AI chip industry, thanks to its low latency, high efficiency, and high bandwidth. According to Nvidia's roadmap, NVLink 6 will be launched in 2026, achieving a communication rate of 3.6TB/s and interconnecting 576 GPUs. NVLink 7 will be launched in 2027, and NVLink 8 in 2028, to further improve bandwidth, reduce latency, and optimize the flexibility of interconnect topologies. Additionally, Nvidia launched NVLink Fusion at Computex 2025, aiming to open the NVLink ecosystem to third-party CPU and accelerator manufacturers. This involves releasing IP and hardware to drive interoperability between third-party designs and Nvidia chips.
It allows custom chips (including CPUs and XPUs) to integrate with Nvidia's NVLink scale-up network technology and rack-scale expansion architecture, enabling semi-custom AI infrastructure deployments. Current NVLink Fusion partners include Alchip, Astera Labs, Marvell, MediaTek, Fujitsu, Qualcomm, Cadence, and Synopsys.

Fig.4: NVLink historical roadmap

<table><tr><td>Version</td><td>Year</td><td>GPU Architecture</td><td>Bandwidth/link (Bi., GB/s)</td><td>No. of links</td><td>Total Bandwidth (GB/s)</td><td>Highlight</td></tr><tr><td>NVLink 1.0</td><td>2016</td><td>Pascal</td><td>40</td><td>4</td><td>160</td><td>Several times faster than PCIe 3.0 x16 (~32 GB/s)</td></tr><tr><td>NVLink 2.0</td><td>2017</td><td>Volta</td><td>50</td><td>6</td><td>300</td><td>Introduced NVSwitch to support fully connected topology</td></tr><tr><td>NVLink 3.0</td><td>2020</td><td>Ampere</td><td>50</td><td>12</td><td>600</td><td>Met the training requirements of LLMs</td></tr><tr><td>NVLink 4.0</td><td>2022</td><td>Hopper</td><td>100</td><td>18</td><td>900</td><td>PAM4 encoding is used</td></tr><tr><td>NVLink 5.0</td><td>2024</td><td>Blackwell</td><td>200</td><td>18</td><td>1800</td><td>Supports ultra-large-scale AI clusters</td></tr><tr><td>NVLink Fusion</td><td>2025</td><td>Blackwell</td><td>200</td><td>18</td><td>1800</td><td>Opens the ecosystem to third parties and supports heterogeneous chips</td></tr><tr><td>NVLink 6.0</td><td>2026</td><td>Rubin</td><td>400</td><td>36</td><td>3600</td><td>SerDes channel speed upgraded to 400G</td></tr><tr><td>NVLink 7.0</td><td>2027</td><td>Rubin Ultra</td><td>400</td><td>36</td><td>3600</td><td>?</td></tr></table>

Source: Company data, Nomura research

# Ethernet for scale-up: UALink and SUE

Ethernet was initially developed by Robert Metcalfe in 1973 while he worked at the Xerox Palo Alto Research Center, and is currently the most widely used network interconnection protocol and the de facto industry standard (maintained by the IEEE). Ethernet switches and optical transceivers have a mature industry supply chain, which has helped to build large-scale on-premise DCs and cloud computing networks over the past few decades, and is now expanding to AI clusters. While Ethernet was primarily developed for scale-out networks, UALink has been developed as an open protocol for the scale-up network, which is compatible with Ethernet, to compete with Nvidia's NVLink.

UALink is an open industry interconnect standard released by the UALink Consortium, whose members include AMD, AWS, Broadcom, Cisco, and Google. As a scalable AI fabric, UALink can be deployed for AI training and inference solutions to support a wide range of AI models. For LLM training, UALink will enable scalable domains to reach hundreds of GPUs. The UALink 1.0 specification, launched in April 2025, supports a data transfer rate of up to 200G per channel, enabling 200G extended connectivity per channel for up to 1,024 accelerators within a computing pod, with latency below 1 microsecond. It features low power consumption, strong Ethernet compatibility, and high security and manageability. Currently, there are no commercialized solutions based on the UALink protocol, but key enablers such as AMD, Intel, Astera Labs, Marvell, and Synopsys are actively developing products and solutions. For example, AMD's AI all-in-one machine, "Helios," uses the UALoE (UALink over Ethernet) solution, which utilizes the Ethernet physical layer and switching chips at the underlying level to carry UALink protocol data.
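To put the UALink 1.0 figures above into perspective, the short sketch below converts the 200G-per-channel rate into per-accelerator and pod-level bandwidth. The x4 port width and the derived aggregates are our own illustrative assumptions, not figures from the specification or from this report.

```python
# Illustrative UALink 1.0 arithmetic; assumptions are flagged in the comments.
LANE_RATE_GBPS = 200          # 200G per channel, per the UALink 1.0 spec cited above
LANES_PER_ACCELERATOR = 4     # ASSUMPTION: an x4 port; narrower widths are also possible
POD_SIZE = 1024               # up to 1,024 accelerators per pod, per the spec

per_accel_gbps = LANE_RATE_GBPS * LANES_PER_ACCELERATOR    # Gb/s per accelerator, per direction
per_accel_gBps = per_accel_gbps / 8                        # convert to GB/s
pod_injection_tbps = per_accel_gbps * POD_SIZE / 1000      # aggregate injection bandwidth of a full pod

print(f"Per accelerator: {per_accel_gbps} Gb/s (~{per_accel_gBps:.0f} GB/s) per direction")
print(f"Pod of {POD_SIZE} accelerators: ~{pod_injection_tbps:.0f} Tb/s aggregate injection bandwidth")
```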
To provide an optimized solution for the AI scale-up network while remaining compatible with the Ethernet protocol, Broadcom has launched the Scale-up Ethernet (SUE) architecture, emphasizing openness, compatibility, and low-latency performance. Unlike UALink, which is still in the commercialization process, Broadcom has already launched commercial products based on SUE (to compete with NVLink), including the Tomahawk 6 series chips launched in June 2025 and the Tomahawk Ultra chip launched in July 2025. Moreover, the Tomahawk 6 offers a unified solution that supports both scale-out and scale-up.

# PCIe for scale-up

PCIe (Peripheral Component Interconnect Express) is a high-speed standard used to connect hardware components inside computers, commonly used to connect GPUs, Wi-Fi modules and network cards with CPUs. PCIe was initially developed by Intel and a group of other companies, and it is now an industry protocol managed by the PCI-SIG organization. PCIe 7.0 was released in 2025, doubling bandwidth versus PCIe 6.0: the transfer rate per lane increases to 128 GT/s, which corresponds to a single-lane bandwidth of approximately 16GB/s and a bidirectional bandwidth of up to 512GB/s for x16 lanes. Notably, PCI-SIG is also introducing optical connectivity with PCIe 7.0 to enhance long-distance transmission performance.

Fig. 5: PCIe link performance

<table><tr><td rowspan="2">Version</td><td rowspan="2">Year</td><td rowspan="2">Line code</td><td rowspan="2">Transfer rate (per lane)</td><td colspan="5">Throughput (GB/s)</td></tr><tr><td>x1</td><td>x2</td><td>x4</td><td>x8</td><td>x16</td></tr><tr><td>PCIe 1.0</td><td>2003</td><td rowspan="5">NRZ</td><td>2.5 GT/s</td><td>0.25</td><td>0.5</td><td>1</td><td>2</td><td>4</td></tr><tr><td>PCIe 2.0</td><td>2007</td><td>5.0 GT/s</td><td>0.5</td><td>1</td><td>2</td><td>4</td><td>8</td></tr><tr><td>PCIe 3.0</td><td>2010</td><td>8.0 GT/s</td><td>0.985</td><td>1.969</td><td>3.938</td><td>7.877</td><td>15.754</td></tr><tr><td>PCIe 4.0</td><td>2017</td><td>16.0 GT/s</td><td>1.969</td><td>3.938</td><td>7.877</td><td>15.754</td><td>31.508</td></tr><tr><td>PCIe 5.0</td><td>2019</td><td>32.0 GT/s</td><td>3.938</td><td>7.877</td><td>15.754</td><td>31.508</td><td>63.015</td></tr><tr><td>PCIe 6.0</td><td>2022</td><td rowspan="3">PAM-4 FEC</td><td>64.0 GT/s</td><td>7.563</td><td>15.125</td><td>30.25</td><td>60.5</td><td>121</td></tr><tr><td>PCIe 7.0</td><td>2025</td><td>128.0 GT/s</td><td>15.125</td><td>30.25</td><td>60.5</td><td>121</td><td>242</td></tr><tr><td>PCIe 8.0</td><td>2028 (expected)</td><td>256.0 GT/s</td><td>30.25</td><td>60.5</td><td>121</td><td>242</td><td>484</td></tr></table>

Source: Nomura research

# Scale-out standards: InfiniBand vs Ethernet vs OCS

In the scale-out network, RDMA is a core network technology that bypasses the CPU and performs data transfer directly at the memory level, allowing computers to directly access the memory of remote computers. RDMA technology currently has two main implementation methods: InfiniBand and RoCE (RDMA over Converged Ethernet). InfiniBand is primarily supported by Nvidia and has been widely adopted in large-scale AI clusters, as it supports sub-microsecond latency and intrinsically lossless networking (based on a credit-based flow control mechanism), which can significantly reduce the communication latency of GPU clusters in AI training.
Meanwhile, Ethernet, as an industry-standard communication protocol, has a strong competitive edge, thanks to strong compatibility, an extensive ecosystem, and relatively low cost, although it is not purposely built for AI. To compete with Nvidia's InfiniBand in AI networks, the UEC (Ultra Ethernet Consortium) was founded in July 2023. The UEC protocol supports multipath transmission and microsecond latency, which is suitable for AI training scenarios. The transport layer defined by UEC completely overhauls current RoCE technology, optimizing congestion control and multipath transmission. It is a key technology for solving the congestion and uneven load problems caused by large-scale clusters, in our view. In June 2025, UEC released UEC Specification 1.0, providing high-performance, scalable, and interoperable solutions for all layers of the network stack (including NICs, switches, fiber optics, and cabling), thereby enabling seamless multi-vendor integration.

Fig. 6: InfiniBand and Ethernet market size forecasts

Source: 650 Group, Nomura research

# Scale-across networks: building large AI factories through DCI

In August 2025, Nvidia proposed the scale-across network, integrating distributed data centers into gigawatt-level AI super factories through cross-regional interconnection, becoming the third path for expanding AI computing power. Meanwhile, Nvidia launched the Spectrum-XGS Ethernet solution, which combines photonics technology and advanced routing algorithms to automatically adjust congestion control based on the distance between data centers, achieving near-local communication performance in geographically distributed clusters. At the scale-across level, optical communication demonstrates a clear advantage, for example, using coherent DWDM optical transmission technology to provide 800G or even 1.6T single-wavelength capacity over distances ranging from tens to hundreds of kilometers. Therefore, we believe that long-distance optical modules (coherent optical modules), hollow-core optical fibers, and other optical communication equipment and devices will likely play important roles in the scale-across domain.

# Global CSPs' networking roadmaps

# Nvidia's scale-up network

The GB200 NVL72 utilizes NVLink 5.0 and PCIe 6.0 protocols for its scale-up network. The NVLink switch chip supports 72 ports, each with a specification of x2 200Gbps, and the entire chip supports a maximum bidirectional bandwidth of 7.2TB/s. The GB200 NVL72 contains a total of nine NVLink switch trays, each configured with two NVLink switch chips, providing an aggregated bandwidth of 14.4TB/s per tray. The GB200 NVL72 uses a custom-designed copper cable cartridge to achieve NVLink connectivity between the nine NVLink switch trays and the 18 compute trays within the rack. The number of copper connection cables is 5,184.

Looking ahead to the next generation of the Rubin series, the Nvidia Rubin NVL144 CPX uses a midplane (PCB) to replace cable connections within the compute tray. The Rubin Ultra NVL576 uses an orthogonal backplane (PCB) instead of a copper cable backplane to connect the compute trays and switch trays. It may utilize NVLink 6.0/7.0 technology (3,600GB/s), CX9 network cards (1.6Tbps), and a Spectrum6 CPO switch (102T).

Fig. 7: Nvidia server platform roadmap

Source: Nvidia, Nomura research

# Nvidia's scale-out network

Regarding scale-out, the Quantum X800 switch and ConnectX-8 super network interface card (NIC), based on the 224G generation InfiniBand standard protocol, were first released at GTC 2024.
Meanwhile, the Ethernet-based Spectrum X800 switch and BlueField-3 NIC, released at the same time, still used the 112G generation Spectrum-4 switching chip and 400G BlueField-3 DPU products. According to Nvidia's 3QFY26 earnings call, Meta, Microsoft, Oracle and xAI were using Spectrum-X Ethernet switches to build gigawatt-scale AI factories. In March 2025, Nvidia announced the next-gen Vera Rubin platform, with significantly improved scale-out performance. The Rubin single-card network interface will be upgraded from CX-7 400G to CX-9 1.6T, while a 1.6T IB/Ethernet switch will be deployed to further improve cluster information transmission speed.

Fig. 8: Roadmap for Nvidia scale-up and scale-out networking

<table><tr><td></td><td>2024</td><td>2025</td><td colspan="2">2026</td></tr><tr><td></td><td>GB200 NVL72</td><td>GB300 NVL72</td><td>VR200 NVL144</td><td>VR200 NVL144 CPX</td></tr><tr><td colspan="5">Scale-up Networking</td></tr><tr><td>Accelerator</td><td>GB200</td><td>GB300</td><td>VR200</td><td>VR CPX</td></tr><tr><td>NVLink</td><td>NVLink 5.0</td><td>NVLink 5.0</td><td>NVLink 6.0</td><td>NVLink 6.0</td></tr><tr><td>NVLink speed (GB/s uni-di)</td><td>900</td><td>900</td><td>1,800</td><td>1,800</td></tr><tr><td>Number of NVLink links</td><td>18</td><td>18</td><td>18</td><td>18</td></tr><tr><td>Lanes per NVLink link</td><td>2</td><td>2</td><td>4</td><td>4</td></tr><tr><td>NVLink lane speed (Gb/s uni-di)</td><td>200G</td><td>200G</td><td>200G</td><td>200G</td></tr><tr><td>NVLink bandwidth (TB/s uni-di)</td><td>518</td><td>518</td><td>1037</td><td>1037</td></tr><tr><td>NVSwitch generation</td><td>NVSwitch 5.0</td><td>NVSwitch 5.0</td><td>NVSwitch 6.0</td><td>NVSwitch 6.0</td></tr><tr><td>Number of NVSwitches</td><td>18</td><td>18</td><td>36?</td><td>36?</td></tr><tr><td>NVSwitch ports</td><td>72</td><td>72</td><td>72</td><td>72</td></tr><tr><td>NVSwitch lanes per ports</td><td>2</td><td>2</td><td>4</td><td>4</td></tr><tr><td>NVSwitch speed per lane (Gb/s uni-di)</td><td>200G</td><td>200G</td><td>200G</td><td>200G</td></tr><tr><td colspan="5">Scale-out Networking</td></tr><tr><td>NIC generation</td><td>CX-7 400G</td><td>CX-8 800G</td><td>CX-9 1.6T</td><td>CX-9 1.6T</td></tr><tr><td>Number of NIC per compute tray</td><td>4</td><td>4</td><td>8</td><td>8</td></tr><tr><td>Total number of NICs</td><td>72</td><td>72</td><td>144</td><td>144</td></tr><tr><td>NIC bandwidth (TB/s uni-di)</td><td>28.8</td><td>57.6</td><td>115.2</td><td>115.2</td></tr><tr><td>Front-end NIC</td><td>Bluefield-3</td><td>Bluefield-3</td><td>Bluefield-4</td><td>Bluefield-4</td></tr><tr><td>Scale-out switch</td><td colspan="2">Quantum-X800 144x800G/Spectrum-X 64x800G</td><td colspan="2">x1600 IB/Ethernet switch</td></tr><tr><td>Transceiver</td><td colspan="2">800G DR4, 1.6T DR8</td><td colspan="2">1.6T DR4?</td></tr><tr><td>Laser</td><td colspan="2">EML, SiPh</td><td colspan="2">SiPh?</td></tr></table>

Source: Semianalysis, Nomura research
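The per-GPU NVLink bandwidth figures in Fig. 8 can be cross-checked from the link count, lanes per link, and lane speed rows; the minimal sketch below reproduces that arithmetic, with all inputs taken from the table above.

```python
# Cross-check Fig. 8: per-GPU NVLink bandwidth = links x lanes per link x lane speed.
PLATFORMS = {
    # platform: (NVLink links per GPU, lanes per link, lane speed in Gb/s uni-directional)
    "GB200 NVL72 (NVLink 5.0)":  (18, 2, 200),
    "GB300 NVL72 (NVLink 5.0)":  (18, 2, 200),
    "VR200 NVL144 (NVLink 6.0)": (18, 4, 200),
}

for name, (links, lanes, lane_gbps) in PLATFORMS.items():
    uni_gbps = links * lanes * lane_gbps   # Gb/s per GPU, uni-directional
    uni_gBps = uni_gbps / 8                # convert to GB/s, matching the 900 / 1,800 GB/s rows
    print(f"{name}: {links} x {lanes} x {lane_gbps}G = {uni_gBps:.0f} GB/s uni-directional per GPU")
```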
Fig. 9: Nvidia - network solutions supply chains

<table><tr><td>NVIDIA</td><td>Networking</td><td>Solution</td><td colspan="2">Supplier</td></tr><tr><td rowspan="2">GB200</td><td>Scale-up</td><td>400G DAC</td><td>Amphenol</td><td>Copper cable: Woer, Shenyu, Xinya, Zhaolong</td></tr><tr><td>Scale-out</td><td>800G optical transceiver</td><td colspan="2">Innolight, Finisar, Eoptolink, TFC, Fabrinet</td></tr><tr><td rowspan="2">GB300</td><td>Scale-up</td><td>800G DAC?</td><td>Amphenol</td><td>Copper cable: Woer, Shenyu, Xinya, Zhaolong?</td></tr><tr><td>Scale-out</td><td>800G/1.6T optical transceiver</td><td colspan="2">Innolight, Finisar, Eoptolink, TFC, Fabrinet</td></tr><tr><td rowspan="2">VR200</td><td>Scale-up</td><td>midplane (PCB)/1.6T AEC?</td><td>Credo?</td><td>Copper cable: Woer?, Xinya?</td></tr><tr><td>Scale-out</td><td>800G/1.6T optical transceiver</td><td colspan="2">Innolight, Finisar, Eoptolink, TFC, Fabrinet</td></tr><tr><td rowspan="2">Rubin Ultra</td><td>Scale-up</td><td>backplane (PCB)/1.6T AEC?</td><td>VGT?, Credo?</td><td>Copper cable: Woer?, Xinya?</td></tr><tr><td>Scale-out</td><td>1.6T optical transceiver</td><td colspan="2">Innolight, Finisar, Eoptolink, TFC, Fabrinet</td></tr></table>

Source: Company data, Nomura research

# Google scale-up and scale-out: ICI (Inter-Chip Interconnect) and OCS (Optical Circuit Switch)

Optical interconnect technology is one of the core technologies that differentiates Google's AI network, which is purposely built to maximize the performance of its TPU platform. The evolution of optical interconnect in Google TPU systems has been a "step-by-step improvement", from no optoelectronics to an all-optical network, with continuous breakthroughs in bandwidth and scale. In 2018, TPU v2 was released without optical transceivers. With TPU v3, optical interconnect technology was introduced for the first time, using 400G active optical cables (AOC), each optical channel with a baud rate of 50G. Then, the optical transceiver was upgraded to 400G OSFP for TPU v4, while optical circuit switches (OCS) were first introduced on TPU v4. The optical transceivers for TPU v5p were updated to 800G OSFP, with improved OCS technology. In 2025, TPU v7 (Ironwood) was launched, which adopts 800G OSFP optical transceivers, increases the optical channel baud rate to 200G, and achieves 1.77PB of directly addressable shared high bandwidth memory (HBM), which can efficiently support dense and sparse models.

Fig. 10: Roadmap for Google's TPU ("Celebrating 10 Years of TPU evolution")

Source: Google, Nomura research
Fig. 11: Specification summary of TPUs

<table><tr><td></td><td>TPU</td><td>TPU chips per superpod</td><td>Topology</td><td>ICI bandwidth per TPU chip</td><td>ICI optical transceiver</td><td>Optical lane rate</td><td>OCS</td></tr><tr><td>2018</td><td>v2</td><td>256</td><td>2D Torus</td><td>800GB/s</td><td>None</td><td>NA</td><td>None</td></tr><tr><td>2020</td><td>v3</td><td>1024</td><td>2D Torus</td><td>800GB/s</td><td>400G AOC</td><td>50G</td><td>None</td></tr><tr><td>2022</td><td>v4</td><td>4096</td><td>3D Torus</td><td>600GB/s</td><td>400G OSFP</td><td>50G</td><td>OCS</td></tr><tr><td>2023</td><td>v5p</td><td>8960</td><td>3D Torus</td><td>1200GB/s</td><td>800G OSFP</td><td>100G</td><td>OCS</td></tr><tr><td>2025</td><td>v7</td><td>9216</td><td>3D Torus</td><td>1200GB/s</td><td>800G OSFP</td><td>200G</td><td>OCS</td></tr></table>

Source: Google, Nomura research

Google's 3D torus topology and cube building-block design form an architecture that significantly improves system performance for high-density chip interconnect in AI data centers, according to management. Within the compute tray, the TPUs are connected via copper Inter-Chip Interconnect (ICI), which may be upgraded to AECs in the future. Different trays are connected via optical interconnects into a rack, and TPU v7 uses 800G or 1.6T optical transceivers for interconnecting the cubes. For the TPU v7 cluster, four Ironwood chips are integrated onto a single PCBA motherboard, and 16 PCBA motherboards are stacked like trays to form an Ironwood TPU rack containing 64 chips. Inside the rack, Google uses a 4x4x4 3D Torus network topology to form a logical computing unit. To achieve larger-scale expansion, Google employs its proprietary ICI technology, using a mix of PCB traces, copper cabling, and fiber optic links to connect multiple racks into a Superpod by interconnecting different cubes (racks).

The 64-TPU v7 rack requires 80 copper cables, 64 PCBs, and 96 optical transceivers (assuming 1.6T optical transceivers are used), i.e. ratios of 1.25, 1, and 1.5 per TPU, respectively, according to SemiAnalysis. A 9,216-TPU v7 cluster therefore requires 11,520 copper cables and 13,824 1.6T optical transceivers, calculated based on the above-stated ratios. In addition, the cluster has 13,824 ports (96 per rack * 144 racks) and needs 48 OCS switches with 288 ports each for the scale-out network.
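The cluster-level counts cited above follow mechanically from the per-rack figures; the sketch below reproduces the arithmetic, with per-rack inputs from SemiAnalysis as cited and the 1.6T transceiver and 288-port OCS assumptions as stated in the text.

```python
# Scale the per-rack TPU v7 connectivity counts to a 9,216-chip cluster.
TPUS_PER_RACK = 64
COPPER_PER_RACK, PCB_PER_RACK, OPTICS_PER_RACK = 80, 64, 96   # Fig. 12 rack totals
CLUSTER_TPUS = 9216
OCS_PORTS = 288                                               # ports per OCS switch, per the text above

racks = CLUSTER_TPUS // TPUS_PER_RACK        # 144 racks
copper = racks * COPPER_PER_RACK             # 11,520 copper cables
pcbs = racks * PCB_PER_RACK                  # 9,216 PCB traces
optics = racks * OPTICS_PER_RACK             # 13,824 transceivers (assumed 1.6T), i.e. 13,824 ports
ocs_switches = optics // OCS_PORTS           # 48 OCS switches for the scale-out network

print(f"{racks} racks -> {copper} copper cables, {pcbs} PCB traces, "
      f"{optics} optical transceivers, {ocs_switches} OCS switches")
```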
Fig. 12: TPU v7 3D Torus connection solutions

<table><tr><td colspan="3">TPU v7 3D Torus Connection Solutions</td></tr><tr><td></td><td>Per TPU</td><td>Rack Total</td></tr><tr><td colspan="3">8 Interior TPUs</td></tr><tr><td>Copper cables</td><td>4</td><td>32</td></tr><tr><td>PCB Traces</td><td>2</td><td>16</td></tr><tr><td>Optical Transceivers</td><td>0</td><td>0</td></tr><tr><td colspan="3">8 Corner TPUs</td></tr><tr><td>Copper cables</td><td>1</td><td>8</td></tr><tr><td>PCB Traces</td><td>2</td><td>16</td></tr><tr><td>Optical Transceivers</td><td>3</td><td>24</td></tr><tr><td colspan="3">24 Edge TPUs</td></tr><tr><td>Copper cables</td><td>2</td><td>48</td></tr><tr><td>PCB Traces</td><td>2</td><td>48</td></tr><tr><td>Optical Transceivers</td><td>2</td><td>48</td></tr><tr><td colspan="3">24 Face TPUs</td></tr><tr><td>Copper cables</td><td>3</td><td>72</td></tr><tr><td>PCB Traces</td><td>2</td><td>48</td></tr><tr><td>Optical Transceivers</td><td>1</td><td>24</td></tr><tr><td colspan="3">Total for 64 TPU Rack</td></tr><tr><td>Copper cables</td><td>1.25</td><td>80</td></tr><tr><td>PCB Traces</td><td>1.00</td><td>64</td></tr><tr><td>Optical Transceivers</td><td>1.50</td><td>96</td></tr></table>

Source: SemiAnalysis, Nomura research

Fig. 13: Google Ironwood scale-up network (the suite of racks in a Superpod)

Source: Google, Nomura research

# META's networking solutions: DSF and NSF

Unlike Google, which has long been using TPUs for AI workloads, Meta has been using general-purpose GPUs for its AIDCs. Meanwhile, Meta has been proactively developing its own networking architecture to optimize the efficiency of its AI clusters. At the 2024 OCP Summit, Meta released the Disaggregated Scheduled Fabric (DSF), a scheduling architecture based on virtual output queues. The DSF solution is primarily a fabric solution built on the Broadcom Jericho + Ramon suite, essentially a decoupled version of the previous chassis router. Through cell switching + VOQ mechanisms, it achieves near-perfect load balancing, low latency jitter, and a congestion-free fabric network. A similar solution in China is China Mobile's fully scheduled Ethernet (GSE).

In 2025, DSF was upgraded to a Level 2 topology, achieving Layer 3 networking and enabling non-blocking connections for 18,432 XPUs. This network architecture has been deployed at scale in Meta's 18K-card cluster built on GB200, consisting of four zones, with each card providing 800G bandwidth (2*400G, 400G/plane). Furthermore, Meta introduced a Non-Scheduled Fabric (NSF) network architecture in 2025, which is essentially a Layer 3 CLOS network. Meta has built a cluster of 20,736 GB300 GPUs based on this network, using GB300 NVL72 as the access unit. Notably, Meta will use the 51.2T Nvidia Spectrum-4 switch, which is part of the end-to-end solution provided by Nvidia, including GPUs + NICs + Spectrum Ethernet switches.

For Nvidia rack-level interconnections, Meta highlighted its next-generation custom GB300 rack at OCP 2025, internally called Clemente, similar to Nvidia's NVL36 architecture. Meta will deploy these Clemente racks along with a switch/power rack as well as three air-assisted liquid cooling (AALC) racks. Each switch/power rack has three Minipack 3 switches (with Tomahawk 5 ASIC, produced by Celestica), acting as first-layer leaf switches.
Fig. 14: Meta - Clemente GB300 rack

Source: Meta, Nomura research

The AI cluster based on Meta's self-developed ASIC uses a hybrid interconnect architecture, leveraging the advantages of copper cables in the scale-up domain and optimizing the deployment of optical transceivers in the scale-out domain, according to management. Furthermore, Meta has already started to explore CPO technology.

Scale-up: GPU interconnection is achieved through copper cables and rack-mounted switches. The core objectives are low latency (<1µs) and high reliability. For example, by using Credo's ZeroFlap AEC products, Meta's AI cluster achieves 6 billion hours of flap-free link operation and 100 million hours of MTBF, far exceeding the roughly 10 million hours of optical transceivers. Moreover, this can be extended to a 3-million-GPU cluster, according to the company.

Scale-out: Meta uses pluggable optical transceivers (800G OSFP) and single-mode fiber (SMF) to connect different racks, and combines them with an aggregation switch layer to form a backend network, which achieves the goal of long-distance, high-bandwidth connections (supporting expansion to hundreds of thousands of GPUs). Although pluggable transceivers are still the mainstream solution, Meta has already tested the Broadcom Bailly 51.2T CPO switch, achieving over one million hours of trouble-free operation, demonstrating higher reliability than traditional pluggable optical transceivers. Meanwhile, by eliminating intermediate active components, link fluctuations are reduced, improving the stability of the GPU cluster, according to management.

Specifically, Meta's Minerva rack, built on its self-developed MTIA-T chip, includes 16 MTIA-T computing units (each containing one MTIA-T chip and one CPU), two J3 network units for scale-out (each containing two J3 chips), and four TH5 network units for scale-up (each containing 32 800G ports). The rack primarily uses copper connections, with 800G optical transceivers used for cross-rack connections.

Fig. 15: Meta - Minerva rack architecture

Source: Meta, Nomura research

Fig. 16: Meta - Minerva networking architecture

Source: Meta, Nomura research

# AWS's networking roadmap

In 2024, along with the in-house designed Trainium2 (Trn2) chip, AWS also released its self-developed NeuronLink chip-to-chip interconnect technology. Through the NeuronLink-v3 network, Trn2 achieved faster inter-chip communication, enabling the formation of a 64-chip 3D torus topology, similar to the 3D cube design of Google's TPU network. The difference between the Trn2 topology and the TPU topology is that a TPU cube can connect to other TPU cubes on all six faces via optics, while Trn2 does not allow this. NeuronLink-v3 is divided into two types: intra-server and inter-server. The intra-server NeuronLink connects the 16 chips within each physical server, while the inter-server NeuronLink connects chips from different physical servers together into a 64-chip cluster. Each Trainium2 and Trainium2-Ultra physical server occupies 18 rack units (RUs), consisting of one 2U CPU tray and eight 2U compute trays. This architecture connects the compute trays point-to-point via a passive copper backplane, forming a 4x4 2D torus structure (intra-server) or a 4x4x4 3D torus structure (inter-server), reducing the latency and bandwidth loss associated with traditional switches.
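For intuition on the torus wiring described above, the sketch below counts nodes, neighbours and point-to-point links in the 4x4 (intra-server) and 4x4x4 (inter-server) torus topologies; this is a generic topology calculation, not AWS cabling data.

```python
def torus_links(dims):
    """Count chips and point-to-point links in a wrap-around torus with the given dimensions."""
    chips = 1
    for d in dims:
        chips *= d
    neighbours_per_chip = 2 * len(dims)        # +/- neighbour along every axis (wrap-around)
    links = chips * neighbours_per_chip // 2   # each link is shared by two chips
    return chips, neighbours_per_chip, links

# 4x4 2D torus (intra-server) and 4x4x4 3D torus (inter-server), as described above.
for dims in [(4, 4), (4, 4, 4)]:
    chips, neighbours, links = torus_links(dims)
    label = "x".join(str(d) for d in dims)
    print(f"{label} torus: {chips} chips, {neighbours} neighbours per chip, {links} point-to-point links")
```

The 2D case gives four scale-up neighbours per chip, consistent with the Trn2 row in Fig. 17 below.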
AWS recently demonstrated its in-house designed flagship ASIC, the Trainium3 (Trn3) UltraServer with NeuronSwitch v1 and NeuronLink v4, as well as its next-generation AI chip, Trainium4, at the re:Invent 2025 event. A single Trn3 UltraServer integrates up to 144 Trn3 chips, providing a total of 20.7 TB of HBM3E, 706 TB/s of memory bandwidth, and a maximum computing power of 362 FP8 PFLOPS. Trn3 introduced NeuronSwitch v1 to connect the 144 chips, doubling the inter-chip bandwidth compared to the Trn2 UltraServer. Additionally, the upgraded NeuronLink-v4 doubles its bandwidth to 2.5 TB/s. Moreover, AWS mentioned that Trainium4 will be 6x more powerful than Trainium3, and will simultaneously support NVLink Fusion and UALink ports. Meanwhile, NeuronLink-v5 may also be tweaked to be compatible with UALink 2.0.

In addition, Astera Labs has custom-designed the Scorpio-X switch chip for scale-up network connectivity in AWS's Trainium racks. This chip can be programmed to become a NeuronLink switch chip, meeting AWS's need for high-performance transmission devices. Astera Labs launched a PCIe6-based scale-up switch chip in 2H25 to be used with AWS's Trn2.5 rack, and also released a switch chip that can run in PCIe7/UALink 128G dual mode in 3Q25, which is planned to enter mass production in 2026 to be used with AWS's Trn3.

Fig. 17: Trainium2 overview and comparisons

<table><tr><td></td><td>Trn2</td><td>TPUv6e</td><td>GB200</td></tr><tr><td>Theoretical BF16 Dense TFLOP/s/chip</td><td>667</td><td>918</td><td>2500</td></tr><tr><td>HBM capacity (GB/chip)</td><td>96</td><td>32</td><td>192</td></tr><tr><td>HBM bandwidth (GB/s/chip)</td><td>2900</td><td>1640</td><td>8000</td></tr><tr><td>FLOP per HBM capacity (BF16 FLOP per Byte)</td><td>6947.9</td><td>28687.5</td><td>13020.8</td></tr><tr><td>Chip power (Watts)</td><td>500</td><td>600</td><td>1200</td></tr><tr><td>Cooling Technology</td><td>Air cooled</td><td>Air cooled</td><td>Direct to chip liquid</td></tr><tr><td colspan="4">Scale-up Networking</td></tr><tr><td>Scale-up technology</td><td>NeuronLink v3</td><td>TPU ICI</td><td>NVLink 5.0</td></tr><tr><td>Scale-up bandwidth (GB/chip uni-di)</td><td>512</td><td>448</td><td>900</td></tr><tr><td>Scale-up world size (# of chips)</td><td>16</td><td>256</td><td>72</td></tr><tr><td>Scale-up topology</td><td>4x4 2D Torus symmetric BW along all axis</td><td>16x16 2D Torus symmetric BW along all axis</td><td>18-rail optimized fat tree with NVLink SHARP in-network reduction</td></tr><tr><td># of scale-up neighbors per chip</td><td>4</td><td>4</td><td>Any to Any</td></tr><tr><td>Total HBM capacity per scale-up world</td><td>1536</td><td>8192</td><td>13824</td></tr><tr><td># of physical servers per scale-up world</td><td>1</td><td>32</td><td>18</td></tr><tr><td colspan="4">Scale-out Networking</td></tr><tr><td>Scale-out technology</td><td>EFA v3 (AWS Flavor of Ethernet)</td><td>(Google Flavor of Ethernet)</td><td>IB/Ethernet</td></tr><tr><td>Scale-out bandwidth (GB/chip uni-di)</td><td>Up to 800</td><td>100</td><td>800</td></tr><tr><td>Scale-out topology</td><td>ToR fat tree</td><td>ToR fat tree</td><td>4-rail optimized fat tree</td></tr><tr><td>Scale-up to scale-out bandwidth ratio</td><td>5.12</td><td>35.84</td><td>9</td></tr></table>

Source: Semianalysis, Nomura research
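The "FLOP per HBM capacity" row in Fig. 17 is simply the dense BF16 throughput divided by the HBM capacity; the quick sketch below reproduces those ratios from the table's own inputs.

```python
# Reproduce Fig. 17's "FLOP per HBM capacity" row: dense BF16 TFLOP/s divided by HBM capacity.
CHIPS = {
    # chip: (theoretical BF16 dense TFLOP/s, HBM capacity in GB)
    "Trn2":   (667, 96),
    "TPUv6e": (918, 32),
    "GB200":  (2500, 192),
}

for chip, (tflops, hbm_gb) in CHIPS.items():
    flop_per_byte = tflops * 1e12 / (hbm_gb * 1e9)   # TFLOP/s -> FLOP/s, GB -> bytes
    print(f"{chip}: {flop_per_byte:,.1f} BF16 FLOP per byte of HBM")
```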
Fig. 18: Trainium chip network scaling roadmap

<table><tr><td colspan="6">Trainium Chip Network Scaling Roadmap</td></tr><tr><td></td><td>Trainium</td><td>Trn2 NL16 2D Torus</td><td>Trn2 NL32x2 3D Torus</td><td>Trn3</td><td>Trn4</td></tr><tr><td colspan="6">Scale-up Networking</td></tr><tr><td>Scale-up technology</td><td>NeuronLink v2 (PCIe Gen 4.0 based)</td><td>NeuronLink v3 (PCIe Gen 5.0 based)</td><td>NeuronLink v3 (PCIe Gen 5.0 based)</td><td>NeuronLink v4 (PCIe Gen 6.0 based)</td><td>NeuronLink v5?</td></tr><tr><td>Lanes per Trn</td><td></td><td>128</td><td>160</td><td>144</td><td>-</td></tr><tr><td>Lane speed (GB/s uni-di)</td><td></td><td>32</td><td>32</td><td>64</td><td>-</td></tr><tr><td>Scale-up bandwidth (TB/s uni-di)</td><td></td><td>0.5</td><td>0.6</td><td>1.2</td><td>-</td></tr><tr><td colspan="6">Scale-out Networking</td></tr><tr><td>Scale-out technology</td><td>EFA v2</td><td>EFA v3</td><td>EFA v3</td><td>EFA v4</td><td>EFA v5?</td></tr><tr><td>Max. scale-out bandwidth (GB/s/chip uni-di)</td><td>170</td><td>Up to 800</td><td>200</td><td>400</td><td>-</td></tr></table>

Source: SemiAnalysis, Nomura research

In terms of rack-level architecture, AWS offers two new types of Trn3 racks: the Air-Cooled Trainium3 NL32x2 Switched (Teton3 PDS) and the Liquid-Cooled Trainium3 NL72x2 Switched (Teton3 MAX). The Trainium3 NL32x2 Switched has 16 JBOG (just a bunch of GPUs) trays and two host CPU trays per rack. A full Trn3 NL32x2 Switched scale-up world is made up of two racks of 32 Trn3 chips each, for a total world size of 64 Trn3. The key difference between the Trn2 NL32x2 3D Torus and the Trn3 NL32x2 Switched is the addition of scale-up NeuronLink switch trays in the middle of the rack for the Trn3 NL32x2 Switched, which enables the all-to-all switched network. In addition, the Trn3 NL32x2 Switched connects the two racks with cross-rack AECs running from one chip in Rack A directly to another chip in Rack B.

The Trn3 NL72x2 Switched uses two racks to achieve a world size of 144 Trn3, and each rack houses 18 compute trays and 10 NeuronLink switch trays in the middle. With each compute tray housing four Trn3 and one Graviton4 CPU, there are a total of 144 Trn3 and 36 Graviton4s across the two racks making up the Trainium3 NL72x2 Switched world size. Its backplane utilizes a hybrid of connectors from both TE and Amphenol.

Fig. 19: Rack architecture of Trainium3 NL32x2 Switched

Source: SemiAnalysis, Nomura research

Fig. 20: Rack architecture of Trainium3 NL72x2 Switched

Source: SemiAnalysis, Nomura research

For the scale-up network, the Trn3 Switched series racks employ a hybrid interconnection method: PCB (64 lanes) + backplane (80 lanes) + cross-rack AEC (16 lanes), achieving multi-level interconnection. For the scale-out network, AWS constructs a three-layer network using 12.8T/25.6T/51.2T switches, scalable to clusters exceeding 520,000 GPUs, according to SemiAnalysis. Its self-developed EFA protocol replaces RoCE v2 and InfiniBand, integrating hardware-accelerated congestion control and multi-path load balancing to reduce network latency and packet loss. In terms of copper cable usage, the Trainium2 NL16 2D Torus relies on a single backplane and a relatively small number of AEC links, but the Trainium2/3 NL32x2 3D Torus increases the lane count and requires four NeuronLink backplanes along with ~6,100 copper cables to support the denser 3D Torus topology.
The Trainium3 NL32x2 Switched maintains a similar backplane count with ~5,100 copper cables, while the Trainium3 NL72x2 Switched expands the scale-up domain further to 144 chips per server group from 64 in the Trainium3 NL32x2, driving the copper cable count to 11,520 units.

Fig. 21: Trainium rack interconnection roadmap

<table><tr><td colspan="7">Trainium Rack Interconnection Roadmap</td></tr><tr><td></td><td colspan="2">Trn2</td><td colspan="4">Trn3</td></tr><tr><td></td><td>Trn2 NL16 2D Torus</td><td>Trn2 NL32x2 3D Torus</td><td>Trn3 NL16 2D Torus</td><td>Trn3 NL32x2 3D Torus</td><td>Trn3 NL32x2 Switched</td><td>Trn3 NL72x2 Switched</td></tr><tr><td>Scale-up generation</td><td>NeuronLink v3</td><td>NeuronLink v3</td><td>NeuronLink v4</td><td>NeuronLink v4</td><td>NeuronLink v4</td><td>NeuronLink v4</td></tr><tr><td>Scale-up architecture</td><td>Mesh 2D Torus</td><td>Mesh 3D Torus</td><td>Mesh 2D Torus</td><td>Mesh 3D Torus</td><td>Switched</td><td>Switched</td></tr><tr><td>Scale-up world size</td><td>16</td><td>64</td><td>16</td><td>64</td><td>64</td><td>144</td></tr><tr><td>PCIe generation</td><td>PCIe Gen 5.0</td><td>PCIe Gen 5.0</td><td>PCIe Gen 6.0</td><td>PCIe Gen 6.0</td><td>PCIe Gen 6.0</td><td>PCIe Gen 6.0</td></tr><tr><td>Active lanes per Trn</td><td>128</td><td>160</td><td>144</td><td>144</td><td>144</td><td>144</td></tr><tr><td>No. of backplane lanes</td><td>96</td><td>96</td><td>64</td><td>64</td><td>64</td><td>64</td></tr><tr><td>No. of PCB</td><td>32</td><td>32</td><td>64</td><td>64</td><td>64</td><td>64</td></tr><tr><td>No. of AEC</td><td>0</td><td>32</td><td>16</td><td>16</td><td>16</td><td>16</td></tr><tr><td>Backplane redundant lanes</td><td>-</td><td>-</td><td>16</td><td>16</td><td>16</td><td>16</td></tr><tr><td>Total installed lanes</td><td>128</td><td>160</td><td>160</td><td>160</td><td>160</td><td>160</td></tr></table>

Source: SemiAnalysis, Nomura research

# Huawei's networking roadmap

Huawei is aiming to provide a unified bus protocol called UB-Mesh, extending the concept of a local bus to the data center level. In terms of topology design, UB-Mesh adopts a hierarchical, localized, multi-dimensional, fully interconnected approach. UB-Mesh can help achieve complete connectivity between any nodes in different dimensions. Compared with traditional CLOS, UB-Mesh exhibits sublinear cost growth with scale, enabling it to support a hundredfold increase in bandwidth while avoiding a hundredfold increase in cost, according to management.

Fig. 22: Huawei UB-Mesh network topology

Source: Huawei, Nomura research

Based on the UB network, Huawei launched CloudMatrix384, an AI supernode, in April 2025. CloudMatrix384 integrates 384 Ascend 910C NPUs and 192 Kunpeng CPUs, interconnected via an ultra-high-bandwidth and low-latency network called the unified bus (UB). The UB network enables direct all-to-all data exchange across all compute and memory components, effectively addressing the scalability and efficiency challenges that are common in traditional data center architectures. However, this architecture still faces certain challenges in deployment. A single system adopts an all-optical interconnect solution and integrates 6,912 optical transceivers, which not only increases pressure on power consumption, but also increases the probability of system failure, according to management.
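As a rough sense check of the optics intensity mentioned above, the sketch below divides the stated transceiver count by the NPU count; the per-accelerator framing is our own illustration rather than a company disclosure.

```python
# CloudMatrix384 optics intensity, using the figures stated above.
NPUS = 384                 # Ascend 910C NPUs per CloudMatrix384 system
TRANSCEIVERS = 6912        # optical transceivers integrated in a single system

per_npu = TRANSCEIVERS / NPUS
print(f"~{per_npu:.0f} optical transceivers per NPU")   # ~18, illustrating the all-optical interconnect cost
```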
Fig. 23: Scale-up and scale-out network architecture for CloudMatrix384

Source: "Serving Large Language Models on Huawei CloudMatrix 384", Nomura research

At Huawei Connect 2025, Huawei launched new products based on UB and supernode architectures, including the Atlas 950 fully liquid-cooled data center supernode, the Atlas 850 and Atlas 860 enterprise-grade air-cooled supernode servers, the A