Oracle’s 65,000+ GPU Supercluster Now Generally Available

Oracle Cloud Infrastructure (OCI) Supercluster with Nvidia H200 GPUs is now generally available.

The Supercluster can scale up to 65,536 Nvidia H200 GPUs, and offers up to 260 exaflops of peak FP8 performance. Oracle claims it is the largest AI supercomputer in the cloud.
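As a rough sanity check on the headline figures, the sketch below divides the quoted peak FP8 number by the maximum GPU count. The variable names are illustrative and the inputs are only the two numbers quoted above; the result is a theoretical per-GPU peak, not a measured throughput.

```python
# Figures quoted in the article
max_gpus = 65_536          # maximum H200 GPUs in one Supercluster
peak_fp8_exaflops = 260    # quoted peak FP8 performance

# 1 exaflop = 1,000 petaflops
per_gpu_petaflops = peak_fp8_exaflops * 1_000 / max_gpus

print(f"{per_gpu_petaflops:.2f} petaflops of peak FP8 per GPU")
```

The quotient comes to just under 4 petaflops per GPU, which suggests the 260 exaflops figure is simply the per-GPU peak multiplied by the GPU count.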

According to Oracle, each Compute instance in the Supercluster has 76% more high-bandwidth memory and 40% more memory bandwidth than the H100 instance, improving LLM inference performance by up to 1.9 times.

The Supercluster has a custom-designed cluster network that runs RDMA over Converged Ethernet version 2 (RoCE v2) on Nvidia ConnectX-7 network interface cards (NICs), which support GPU-to-GPU interconnects of up to 400 Gbps.

It also features an upgraded 200 Gbps front-end network to move large data sets between storage and GPUs more efficiently.

The instances are bare metal. Each has eight Nvidia H200 GPUs with 141 GB of HBM3e memory apiece and two 56-core Intel Sapphire Rapids 8480+ CPUs.

Pricing remains $10 per GPU per hour, the same as for H100 instances. By comparison, the H100 Supercluster can scale to 16,384 GPUs.
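To put the per-GPU rate in context, the sketch below works out the hourly cost of one eight-GPU instance and of a cluster at the quoted 65,536-GPU ceiling. It assumes the $10 rate applies linearly with no volume or commitment discounts, which the article does not confirm; the variable names are illustrative.

```python
# Figures quoted in the article
price_per_gpu_hour = 10    # USD per GPU per hour
gpus_per_instance = 8      # H200 GPUs per bare metal instance
max_gpus = 65_536          # H200 Supercluster ceiling

# Assumption: flat pricing, no discounts
instance_hourly = price_per_gpu_hour * gpus_per_instance
instances_at_max = max_gpus // gpus_per_instance
full_cluster_hourly = price_per_gpu_hour * max_gpus

print(f"One instance: ${instance_hourly}/hour")
print(f"Instances at maximum scale: {instances_at_max:,}")
print(f"Full cluster: ${full_cluster_hourly:,}/hour")
```

Under that flat-rate assumption, one instance costs $80 per hour and a fully scaled cluster of 8,192 instances would cost $655,360 per hour.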

In September 2024, Oracle revealed that it would build a supercluster with up to 131,072 of the upcoming Nvidia Blackwell GPUs, set to launch during the first half of 2025.