Nvidia's Spectrum-X to Power AI Supercomputer with 200,000 GPUs

xAI revealed its Colossus supercomputer, powered by Nvidia’s Spectrum-X Ethernet platform, delivering unmatched speed and scalability for AI workloads.

In October of last year, xAI revealed its Colossus supercomputer, powered by Nvidia’s Spectrum-X Ethernet platform for unmatched speed and scalability in AI workloads.

This supercomputer, equipped with 100,000 Nvidia Hopper graphics processing units (GPUs), has set new benchmarks in AI computing infrastructure, enabling seamless AI training while reducing networking bottlenecks. With plans to expand to 200,000 H100 and H200 GPUs, Colossus is on track to become one of the most powerful AI supercomputers in the world.

Colossus is built on the Nvidia Spectrum-X platform, which features the Spectrum-X Ethernet switch with port speeds of up to 800 Gb/s. These switches work in tandem with Nvidia’s BlueField-3 SuperNICs to deliver higher data throughput and nearly eliminate packet loss, which is critical for seamless AI training at such an enormous scale.

Gilad Shainer, senior vice president of Networking at Nvidia, commented on what the platform enables: “The Nvidia Spectrum-X Ethernet networking platform is designed to provide innovators like xAI with faster processing, analysis, and execution of AI workloads, accelerating development, deployment, and market readiness of AI solutions.”

High-Performance Networking Just Got a Whole Lot Better

Traditional Ethernet struggles to connect thousands of GPUs without network congestion or latency, particularly at the scale required by large AI projects.

Spectrum-X addresses these problems with advanced adaptive routing, congestion control, and performance isolation technologies, delivering consistent, high-throughput data transfer in Colossus at a 95% efficiency rate. This is especially important as AI models continue to grow larger, more compute-hungry, and more demanding in terms of speed and low latency.
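To put the quoted figures in perspective, a back-of-envelope calculation (not from the article; the per-port line rate and efficiency are the only inputs taken from it) shows what a 95% efficiency rate means for an 800 Gb/s port:

```python
def effective_throughput_gbps(line_rate_gbps: float, efficiency: float) -> float:
    """Effective data rate after accounting for network efficiency losses
    such as congestion, retransmissions, and protocol overhead."""
    return line_rate_gbps * efficiency

# Spectrum-X port at the article's quoted 800 Gb/s and 95% efficiency.
rate = effective_throughput_gbps(800, 0.95)
print(f"Effective per-port throughput: {rate:.0f} Gb/s")  # 760 Gb/s
```

At lower efficiency rates, which are common on congested traditional Ethernet fabrics, the same port would deliver proportionally less usable bandwidth, which compounds across the thousands of links in a cluster of this size.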

According to xAI, this approach allowed Colossus to become operational in record time, with full-scale training commencing a mere 19 days after hardware installation. A supercomputer of this class normally takes months to build, but xAI completed the system in a record 122 days, a showcase of the Spectrum-X platform’s capabilities.

Future Plans and Growing AI Power

“Colossus is the most powerful training system in the world. Nice work by the xAI team, NVIDIA, and our partners,” Elon Musk, who heads xAI, celebrated on X.

Even at its current capacity, Colossus already ranks among the most powerful AI supercomputers in the world. By adding a further 100,000 GPUs, the goal for its next generation, the system will become unparalleled, as competitors such as Microsoft and Oracle race to build similarly advanced supercomputers using Nvidia’s next-generation Blackwell GPUs.

As demand for AI computing surges, Colossus exemplifies how advanced networking solutions can meet the scale, speed, and reliability needs of AI training.

It also marks another fundamental milestone for AI infrastructure, in which high-performance networking becomes just as crucial as the GPUs driving AI models.


Inside Telecom provides you with an extensive list of content covering all aspects of the tech industry. Keep an eye on our Tech sections to stay informed and up-to-date with our daily articles.