Elon Musk’s xAI supercomputer in Memphis, Tennessee, has gone live.
In a post on X/Twitter, Musk dubbed the system the Memphis Supercluster and said it went live at around 4:20am local time on July 22, using a single RDMA (remote direct memory access) fabric to connect up to 100,000 liquid-cooled Nvidia H100 GPUs.
“It’s the most powerful AI training cluster in the world!” Musk said in his post, having previously described the 150MW data centre as a ‘Gigafactory of Compute.’
In a follow-up post on X, Musk added: “This is a significant advantage in training the world’s most powerful AI by every metric by December this year.”
Despite Musk’s posts, it’s not clear how much of the cluster is online. SemiAnalysis estimates that the company currently has around 32,000 GPUs deployed, with the rest expected to come online by Q4.
The company has only 8MW of power available from the grid; agreements xAI has brokered with utilities such as the Tennessee Valley Authority won’t be signed until next month, but could add a further 50MW.
It’s also unlikely that the cluster has started training yet, as the debugging and optimisation process could take some time.
xAI first announced plans for the data centre last month, with Dell Chairman and CEO Michael Dell and Musk both later confirming that Dell and Supermicro (SMC) would provide the servers. No information about how the GPU clusters will be divided between the two companies has been shared.
Speaking at the time of Musk’s initial announcement, Ted Townsend, President of the Greater Memphis Chamber, said it would be the “largest multi-billion dollar investment in the city of Memphis’ history.”
Musk is also planning to add a further cluster of 300,000 Nvidia B200 GPUs next summer, with the aim of having the entire project up and running by fall 2025. The machine will be used to power the next version of xAI’s Grok chatbot.
xAI is currently believed to rent around 16,000 Nvidia H100 GPUs from Oracle Cloud, while also using Amazon Web Services and spare capacity at X/Twitter data centres.