China has recently unveiled its latest supercomputer featuring the Sunway SW26010 Pro CPU, a homegrown processor designed to enhance the country’s supercomputing capabilities and reduce reliance on foreign technology. The CPU packs 384 compute cores (390 counting its six management cores) and delivers a peak FP64 performance of 13.8 TFLOPS, or 13.8 trillion floating-point operations per second. However, experts note that the processor faces challenges in cache and memory performance.
Revealed at the SC23 conference by the National Supercomputing Center in Wuxi, the Sunway SW26010 Pro CPU is based on a proprietary 64-bit RISC instruction set. It consists of six core groups (CG) and a protocol processing unit (PPU).
Each CG comprises 64 compute processing elements (CPEs), each with a 512-bit vector engine and a 256 KB scratchpad cache, plus one management processing element (MPE) with a scalar engine and a 256 KB L2 cache. Each CG also has its own 128-bit DDR4-3200 memory interface and 16 GB of DDR4 memory, for 96 GB across the chip.
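To make the per-core-group figures easier to compare with whole-chip numbers, the following minimal Python sketch aggregates them. The constants come from the description above; the bandwidth formula (transfer rate times bus width) gives a theoretical peak and is an assumption of this sketch, not a figure reported for the chip. The function and key names are illustrative only.

```python
# Per-core-group figures quoted in the article.
CORE_GROUPS = 6
CPES_PER_CG = 64
MPES_PER_CG = 1
MEM_PER_CG_GB = 16
BUS_WIDTH_BITS = 128
DDR4_MEGATRANSFERS = 3200  # DDR4-3200: 3200 million transfers per second


def chip_totals():
    """Aggregate the per-CG figures into whole-chip totals."""
    cpes = CORE_GROUPS * CPES_PER_CG
    mpes = CORE_GROUPS * MPES_PER_CG
    # Assumption: theoretical peak bandwidth = transfer rate x bus width in bytes.
    bw_per_cg_gbs = DDR4_MEGATRANSFERS * 1e6 * (BUS_WIDTH_BITS / 8) / 1e9
    return {
        "cpes": cpes,
        "cores": cpes + mpes,
        "memory_gb": CORE_GROUPS * MEM_PER_CG_GB,
        "bandwidth_per_cg_gbs": bw_per_cg_gbs,
        "bandwidth_gbs": CORE_GROUPS * bw_per_cg_gbs,
    }


if __name__ == "__main__":
    for name, value in chip_totals().items():
        print(f"{name}: {value}")
```

Under these assumptions the chip has 384 CPEs, 390 cores in total, 96 GB of memory, and a theoretical peak of about 51.2 GB/s per CG, or roughly 307 GB/s overall.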
This CPU is an upgrade of the Sunway SW26010, previously used in the Sunway TaihuLight supercomputer, the world’s fastest in 2016 and 2017. Improvements include a higher clock speed, an extended instruction set, and greater memory bandwidth, resulting in a roughly four-fold increase in FP64 performance. The Sunway SW26010 Pro CPU reaches a peak FP64 performance of 13.8 TFLOPS, ahead of the peak FP64 throughput of competitors like AMD’s 96-core EPYC 9654 CPU.
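As a rough sanity check on the 13.8 TFLOPS figure, the sketch below splits the peak across the 384 CPEs and derives the clock speed that figure would imply. It assumes that each 512-bit vector engine issues one fused multiply-add per FP64 lane per cycle and that the MPEs contribute nothing to the peak. Both are assumptions of this sketch rather than details from the announcement, so the resulting clock is an inference, not a published specification.

```python
PEAK_FP64_TFLOPS = 13.8  # from the article
CPES = 6 * 64            # six core groups of 64 CPEs, from the article
VECTOR_BITS = 512        # vector engine width, from the article
FP64_BITS = 64
FLOPS_PER_FMA = 2        # assumption: one fused multiply-add per lane per cycle


def per_cpe_gflops():
    """Peak FP64 GFLOPS per CPE, ignoring any MPE contribution (assumption)."""
    return PEAK_FP64_TFLOPS * 1e3 / CPES


def implied_clock_ghz():
    """Clock speed implied by the peak figure under the assumptions above."""
    lanes = VECTOR_BITS // FP64_BITS          # 8 FP64 lanes per vector
    flops_per_cycle = lanes * FLOPS_PER_FMA   # 16 FLOPs per cycle per CPE
    return per_cpe_gflops() / flops_per_cycle


if __name__ == "__main__":
    print(f"per-CPE peak: {per_cpe_gflops():.1f} GFLOPS")
    print(f"implied clock: {implied_clock_ghz():.2f} GHz")
```

With these inputs each CPE accounts for about 36 GFLOPS, which corresponds to a clock of a little over 2.2 GHz if the FMA assumption holds.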
However, the Sunway SW26010 Pro CPU has limitations, particularly in its cache and memory hierarchy. The 256 KB scratchpad in each CPE may be too small to hold the data its 512-bit vector engine needs, and because the CPEs lack a proper L2 cache (only the MPE has one), data must be fetched frequently from main memory. The DDR4 memory subsystem is considered insufficient for a chip of this compute throughput, posing scalability and efficiency challenges for the supercomputer.
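One way to illustrate why the memory subsystem is seen as a bottleneck is a simple roofline estimate, a standard back-of-the-envelope model that is not part of the original announcement. The sketch below uses the 13.8 TFLOPS peak and a theoretical bandwidth derived from six 128-bit DDR4-3200 interfaces; the derived bandwidth and the vector-add example kernel are assumptions of this sketch.

```python
PEAK_FLOPS = 13.8e12  # peak FP64 FLOPS, from the article

# Assumption: theoretical peak bandwidth of six 128-bit DDR4-3200 interfaces,
# i.e. 6 x 3200 million transfers/s x 16 bytes per transfer.
MEM_BW_BYTES = 6 * 3200e6 * 16


def machine_balance():
    """FLOPs the chip can perform per byte moved from main memory."""
    return PEAK_FLOPS / MEM_BW_BYTES


def attainable_flops(arithmetic_intensity):
    """Roofline bound: the lower of the compute and memory ceilings.

    arithmetic_intensity is the kernel's FLOPs per byte of memory traffic.
    """
    return min(PEAK_FLOPS, arithmetic_intensity * MEM_BW_BYTES)


if __name__ == "__main__":
    print(f"machine balance: {machine_balance():.1f} FLOPs per byte")
    # Example kernel (assumption): c[i] = a[i] + b[i] in FP64,
    # which is 1 FLOP per 24 bytes (two 8-byte loads, one 8-byte store).
    vector_add = attainable_flops(1 / 24)
    print(f"vector add bound: {vector_add / 1e9:.1f} GFLOPS")
    print(f"share of peak: {100 * vector_add / PEAK_FLOPS:.2f}%")
```

Under these assumptions a kernel needs about 45 FLOPs per byte fetched from DRAM to reach peak. A streaming kernel such as the vector add would be capped near 12.8 GFLOPS, well under 1% of peak, unless its data stays resident in the scratchpads.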
The Sunway SW26010 Pro CPU is a significant achievement for China’s supercomputing industry, showcasing innovation and ambition, but it also exposes gaps in cache and memory design, both of which are crucial for performance and energy efficiency. Powerful as the CPU is, closing those gaps and the scalability bottlenecks they create is what further advances must deliver if China is to secure supercomputing supremacy.