BigScience Releases 176B Parameter AI-Language Model “BLOOM”


For most of the languages in its training data, BLOOM is the first language model with more than 100B parameters

The BigScience research workshop recently released the BigScience Large Open-science Open-access Multilingual Language Model (BLOOM), an autoregressive language model based on the GPT-3 architecture. BLOOM was trained on data in 46 natural languages and 13 programming languages and is the largest publicly available, open-source, multilingual model.

The release was announced on the BigScience blog. The model was trained for nearly four months on a cluster of 416 A100 80GB GPUs. The training process was live-tweeted, and the training logs were publicly viewable via TensorBoard. The model was trained on a 1.6TB multilingual dataset containing 350B tokens; for almost all of the languages in the dataset, BLOOM is the first language model with more than 100B parameters. BigScience is still running evaluation experiments, but preliminary results show that BLOOM's zero-shot performance on a wide range of natural language processing (NLP) tasks is comparable to that of similar models.
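To illustrate the kind of zero-shot prompting those evaluations cover, the sketch below poses a task to a BLOOM checkpoint purely as text and reads off the model's continuation. It uses the Hugging Face transformers library; the small bigscience/bloom-560m checkpoint is an assumption made here only so the example runs on modest hardware, while the full 176B model exposes the same API but requires far more memory.

```python
# Minimal zero-shot prompting sketch for a BLOOM checkpoint (assumes the
# Hugging Face `transformers` library is installed). The bloom-560m checkpoint
# is used only so the example fits on a single GPU or CPU.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"  # swap in "bigscience/bloom" for the full 176B model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# A zero-shot task is stated entirely in the prompt: no fine-tuning, no examples.
prompt = "Translate to French: 'The model was trained on 46 natural languages.'\nFrench:"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate a short continuation; the continuation is the model's answer.
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```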

“This is only the beginning. BLOOM’s capabilities will improve as the workshop continues to experiment and tinker with the model. All of the experiments researchers and practitioners have always wanted to run, starting with the power of a 100+ billion parameter model, are now possible. BLOOM is the seed of a living family of models that we intend to grow, not just a one-and-done model, and we’re ready to support community efforts to expand it,” said the BigScience team.

Large language models (LLMs), especially autoregressive decoder-only models such as GPT-3 and PaLM, have been shown to perform as well as the average human on many NLP benchmarks. Although some research organizations, such as EleutherAI, have made their trained model weights available, most commercial models are either completely inaccessible to the public or gated behind an API. This lack of access makes it difficult for researchers to investigate known problem areas of model behavior, such as toxicity and bias.

The BigScience workshop began in May of 2021, with over 1,000 researchers collaborating to build a large, multilingual deep-learning model.

The collaboration included members of the Institute for Development and Resources in Intensive Scientific Computing (IDRIS) and Grand Équipement National de Calcul Intensif (GENCI), which provided the workshop with access to the 28 PFLOPS Jean Zay supercomputer. The team trained the model with a fork of the Megatron-DeepSpeed codebase, which uses three dimensions of parallelism (data, tensor, and pipeline) to achieve a per-GPU training throughput of up to 150 TFLOPS. NVIDIA states this is "the highest throughput one can achieve with A100 80GB GPUs." Training the final BLOOM model took 117 days.
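As a rough sense of scale, the sketch below turns the figures quoted in this article (416 GPUs, up to 150 TFLOPS per GPU, 117 days) into a back-of-the-envelope compute estimate. This is an editor's illustration rather than a BigScience figure, and it assumes every GPU sustains the peak throughput for the whole run, so it is an upper bound.

```python
# Back-of-the-envelope training-compute estimate from the article's figures.
# Assumes all 416 GPUs sustain the peak 150 TFLOPS for the full 117 days,
# which overstates real utilization, so treat the result as an upper bound.
gpus = 416
tflops_per_gpu = 150e12          # 150 TFLOPS, peak reported per-GPU throughput
seconds = 117 * 24 * 3600        # 117 days of wall-clock training

total_flops = gpus * tflops_per_gpu * seconds
print(f"~{total_flops:.2e} FLOPs")   # roughly 6.3e23 floating-point operations
```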

Although BLOOM is currently the largest open multilingual model, other research groups have released similar LLMs. Earlier this year, EleutherAI open-sourced its 20B-parameter model GPT-NeoX-20B, and InfoQ reported last year on BigScience's 11B-parameter T0 model. The BLOOM model files and an online inference API are available on Hugging Face. BigScience also released its training code on GitHub.
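For readers who want to try the hosted model without downloading the weights, the sketch below calls the Hugging Face Inference API over HTTP. The endpoint URL follows the standard api-inference.huggingface.co pattern, and the access token shown is a placeholder to be replaced with a real one.

```python
# Query the hosted BLOOM model through the Hugging Face Inference API.
# HF_TOKEN is a placeholder for a personal access token, used here purely
# for illustration.
import requests

API_URL = "https://api-inference.huggingface.co/models/bigscience/bloom"
HF_TOKEN = "hf_xxx"  # placeholder: substitute a real Hugging Face access token

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {HF_TOKEN}"},
    json={"inputs": "BigScience is a collaborative research workshop that"},
)
print(response.json())  # on success, a list containing the generated continuation
```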