Meta training next-gen AI model on industry's largest GPU cluster

Thu Oct 31 12:10:39 IST 2024

Meta CEO Mark Zuckerberg has revealed that the company's upcoming Llama 4 AI model is being trained on a GPU cluster that he says is larger than any publicly reported so far.

Speaking during an earnings call, Zuckerberg said that Llama 4's development is well underway and its initial launch is expected early next year.

"We're training the Llama 4 models on a cluster that is bigger than 100,000 H100s, or bigger than anything that I've seen reported for what others are doing," he said.

Leader in AI training scale
AI advancements

Training scale is widely considered critical to developing more capable AI models.

Meta appears to be ahead on this front for now, though most other major players are probably working toward compute clusters of more than 100,000 high-end chips.

Earlier this year, Meta and NVIDIA revealed details of clusters of some 25,000 H100s employed for Llama 3 development.

Meanwhile, Elon Musk's xAI venture partnered with X and NVIDIA to establish a cluster of 100,000 H100s.

Meta's unique approach to AI development
Open-source strategy

Meta's approach to AI development differs from that of most rivals: Llama models can be downloaded in their entirety for free.

By contrast, models from OpenAI, Google, and most other big players are accessible only through an API.

Although Meta calls Llama "open source," its license places some restrictions on commercial use, and the company does not disclose details of the models' training.

Llama 4 development poses engineering challenges
Resource demands

Developing Llama 4 will likely pose unique engineering challenges and require vast amounts of energy.

According to estimates by SemiAnalysis, a cluster of 100,000 H100 chips requires around 150 megawatts of power, five times the power consumed by El Capitan, the largest national lab supercomputer in the US.
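The figures in that estimate can be checked with some quick arithmetic. The Python sketch below is only a back-of-the-envelope illustration: the function names are hypothetical, the inputs are the numbers quoted above, and the per-chip result is simply total cluster power divided by chip count, not a measured draw for a single H100. The daily energy figure additionally assumes the cluster runs at the full 150 megawatts around the clock, which the estimate does not state.

```python
# Inputs taken from the estimate quoted in the article.
CLUSTER_GPUS = 100_000       # H100 chips in the cluster
CLUSTER_POWER_MW = 150.0     # estimated cluster power, in megawatts
EL_CAPITAN_RATIO = 5.0       # cluster power is "five times" El Capitan's


def per_gpu_watts(total_mw: float, gpus: int) -> float:
    """Average cluster power per chip, in watts (total power / chip count)."""
    return total_mw * 1_000_000 / gpus


def implied_reference_mw(total_mw: float, ratio: float) -> float:
    """Power of the comparison system implied by the stated ratio."""
    return total_mw / ratio


def daily_energy_mwh(total_mw: float, hours: float = 24.0) -> float:
    """Energy used over a period, assuming constant full-power operation."""
    return total_mw * hours


if __name__ == "__main__":
    print(per_gpu_watts(CLUSTER_POWER_MW, CLUSTER_GPUS))             # 1500.0 W per chip
    print(implied_reference_mw(CLUSTER_POWER_MW, EL_CAPITAN_RATIO))  # 30.0 MW
    print(daily_energy_mwh(CLUSTER_POWER_MW))                        # 3600.0 MWh per day
```

On these numbers, the estimate works out to about 1.5 kilowatts of cluster power per chip and implies roughly 30 megawatts for El Capitan.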

Despite these challenges, Meta plans to spend up to $40 billion in capital expenditure this year on data centers and other infrastructure.

Meta's open-source AI strategy stirs debate
AI concerns

Meta's open-source approach to AI has sparked debate among experts.

Some are concerned that freely available, powerful AI models could be misused for cyberattacks or to automate the design of chemical or biological weapons.

Despite these concerns, Zuckerberg remains confident about the open-source strategy.

He believes that "open source will be the most cost effective, customizable, trustworthy, performant, and easiest to use option that is available to developers."