The field of AI is witnessing a new wave: the transition from task-specific to generalist models.
DALL-E 2, BERT and GPT-3 are the talk of the AI community. The questions on everyone's lips: Who is bringing out the next model? How many parameters does it have? How much bias does it show or avoid?
This wide acceptance and popularity come from the fact that these models can generate a whole essay or a piece of visual art from nothing more than a short prompt, even though they weren't explicitly trained to do so.
The Stanford Institute for Human-Centered Artificial Intelligence first popularised the term "foundation models" when it published a paper describing how they have caused a "paradigm shift" in AI. By definition, foundation models are models trained on a wide range of unlabelled data that can be used for numerous applications with minimal fine-tuning.
The need now
The term is fresh, but the idea behind it is not. To be specific, foundation models are built on deep neural networks (inspired by how the human brain works). The self-supervised learning approach allows a model to make connections across heaps of unstructured data. As a result, the model can carry knowledge learned in one situation over to another, which is known as transfer learning. The new foundation models can help solve many existing challenges.
- First, while new AI systems help resolve many real-world issues, developing and deploying each one frequently takes a considerable investment of time and money, making it a costly affair.
- Second, today's AI models are built to provide customised solutions. Each specific solution needs a sizable and accurately labelled dataset for the particular task it is intended to perform.
- Third, where no such dataset exists, people must spend hundreds or thousands of hours finding and labelling appropriate texts, graphs, or images to build one. Only after the AI model has learned to recognise everything in the dataset can it be applied to a particular use case, ranging from facial recognition to creating a novel combination of atoms for drug discovery.
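Foundation models ease these constraints: because a single model trained on unlabelled data can be adapted to many tasks, they build on existing deep learning methods and further accelerate AI research.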
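To make the contrast with hand-labelled datasets concrete, here is a minimal Python sketch of how self-supervised learning can derive training examples from raw text alone. It is an illustration only: the function name `make_masked_examples`, the `[MASK]` placeholder, and the sample sentence are assumptions made for this example, and real foundation models rely on far larger corpora and more elaborate training objectives.

```python
def make_masked_examples(sentence, mask_token="[MASK]"):
    """Turn one unlabelled sentence into (input, target) training pairs.

    Each pair hides a single word; the hidden word itself serves as the
    label, so no human annotation is needed.
    """
    words = sentence.split()
    examples = []
    for i, target in enumerate(words):
        # Replace the i-th word with the mask token and keep the rest.
        masked = words[:i] + [mask_token] + words[i + 1:]
        examples.append((" ".join(masked), target))
    return examples


# Hypothetical sample sentence, used purely for illustration.
pairs = make_masked_examples("models learn from raw text")
for masked_input, target in pairs:
    print(f"{masked_input!r} -> {target!r}")
```

Every pair comes from the sentence itself, which is why this style of training can, in principle, scale to very large collections of text without anyone labelling them by hand.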
The models exhibit novel capabilities; they can change other industries and raise living standards. The sectors that stand to benefit most include:
Healthcare: Healthcare and biomedical research, such as precise treatment and the scientific discovery of new therapies, require expert knowledge, which is limited and expensive. Foundation models present clear opportunities in these domains, thanks to the abundance of data across numerous modalities (for example, images, text, and molecules) on which to train them and the value of improved sample efficiency in adaptation. Furthermore, their generative capabilities show promise for open-ended research challenges like drug development.
Law: Attorneys and associates frequently consult a vast body of information and expertise when working on legal matters. They also need to produce long, methodical documents that draw on various texts, and to interpret complex legal concepts. Foundation models can help greatly when assessing the data from numerous legal cases, papers, and transformations.
Education: Education is a complex and nuanced field; good teaching entails thinking about students’ cognitive processes and considering their learning objectives. As data in education is limited, using data from textbooks, diagrams, mathematical formulas, and video-based tutorials offers hope for the broad application of foundation models in educational tasks.
Environment: By one widely cited estimate, training a large NLP model can produce about as much carbon dioxide as five cars do over the course of their lifetimes. Adopting foundation models could reduce the dependency on separately trained task-specific models, increasing the AI community's contribution toward sustainability.
However, certain aspects must be considered before wide acceptance and usage of foundational models come into play.
Tread with care
The enormous potential and creativity exhibited by the new foundation models might become an issue for patent law in the future. It would be quite difficult to determine whether a creative output is the work of a human or a machine.
Moreover, society will also need to deal with the inaccurate and deceptive information these systems can generate. Disinformation is already a considerable concern, as evidenced by the ongoing Russian assault on Ukraine and the emerging issue of deepfakes, and foundation models are poised to make matters worse.
Due to the massive size of the models, extensive computational and other resources are needed to train them. According to one estimate, training OpenAI's GPT-3 language model would cost about $5 million. Naturally, only major tech firms like OpenAI, Google, and Baidu currently have the resources to develop foundation models, resulting in restricted access to the systems.
This restriction cuts two ways. On one hand, it can push people toward unethical ways of gaining access, which may end in the creation of fake and at times defamatory content. On the other, independent researchers cannot examine these systems and share their findings honestly and transparently, leaving us unaware of the models' effects.