Synthetic data has been used throughout the history of AI: from synthetic line drawings for early vision systems, through synthetic video feeds for ALVINN, the first self-driving neural network, in the 1980s, to the pioneering work of Harvard statistics professor Donald B. Rubin in the early 1990s on the statistical properties of synthetic datasets.
According to Gartner, in the coming years the data used to create AI models will be primarily synthetic, “generated by rules, statistical models, simulation, and other techniques”, and will overtake the use of real data obtained from direct measurements. The implication is that companies that do not embrace and integrate emerging synthetic data technologies risk being left behind.
Synthetic data is emerging as an essential element in building accurate and capable AI models, as it provides developers with vast amounts of perfectly labelled data on demand.
As the world begins to understand synthetic data and its applications, how are companies viewing this emerging technology? Do they understand its potential? What barriers do early adopters see on the horizon?
A report by Synthesis AI found that synthetic data adoption is increasing, but usage and understanding of the technology vary widely. While 87 per cent of organisations use techniques to enhance their data, including image augmentation, bootstrapping and generative models, just 43 per cent use synthetic data whenever possible, and 44 per cent have only just started using the technology.
Respondents knowledgeable about state-of-the-art synthetic data technologies expressed confidence in the technology’s ability to address critical issues with using “real-world” data. Reducing the knowledge gap in the enterprise will lead to a better understanding of synthetic data's benefits.
Despite recognising the importance of data enhancement, only half (51 per cent) of the respondents aligned with the explicit technical definition of state-of-the-art synthetic data approaches, indicating a critical knowledge gap.
Barriers to Overcome in Further Adoption
While most respondents appear to understand synthetic data, the survey found that the same may not be true of their colleagues. Prominent barriers to using synthetic data include a lack of organisational knowledge and slow buy-in from colleagues. Buy-in from colleagues and decision-makers will be critical for synthetic data to be accepted.
- 67 per cent agree that their organisation lacks the knowledge and understanding needed to implement synthetic data.
- 67 per cent agree that users in their industry will not accept synthetic data until they see the benefits for themselves.
A Bright Future for Synthetic Data
More than half (59 per cent) of decision-makers believe that their industry will utilise synthetic data either independently or in combination with ‘real-world’ data within the next five years. This suggests that many organisations are only just starting to experiment with it. Synthetic data will be critical to the future of many industries and organisations and will lead to widespread change, especially among those who use vision data — 89 per cent agree that synthetic data is a new and innovative technology that will transform their industry.
Those who don’t employ synthetic data are at risk of falling behind the curve. Nearly nine in ten (89 per cent) of those who use vision data believe organisations that fail to adopt synthetic data in training their internal systems will lag behind.
The growth potential is evident, and those who work with vision data are best placed to take advantage. Among those working with vision data who don’t use synthetic data or have only started using it, only three in ten (30 per cent) cite a lack of tools to create and manage synthetic data as a barrier to broader utilisation.
Synthetic data is just beginning its cycle of adoption and value to the enterprise. Many industries and companies are only just beginning to experiment with the technology. Still, synthetic data shows promise to cut costs, improve access to data, and reduce the time it takes to build AI models compared with traditional approaches.
That doesn’t mean there aren’t barriers to broader adoption. A key to further implementation is educating colleagues throughout the entire organisation, not just the C-suite, as there is confusion and a lack of understanding among many groups.
Organisations already using vision data are positioned to lead this charge, as they understand the value of vision data and how it can benefit their industry. According to Synthetic Data for Deep Learning, new research is starting to provide proof points for the utility of synthetic data across use cases including robotics, autonomous vehicles, smart homes, consumer products, manufacturing, logistics, healthcare, and more.
When further education and adoption are achieved, the benefits of implementing synthetic data technology are abundant. Synthetic data can help improve access to high-quality data. Synthetic data can also enable more capable models through new and more accurate data labels.
Reduce Bias in AI Models
Synthetic data can also help reduce bias in AI models. Bias is often a result of unbalanced training data that does not properly represent the real-world data distribution. In face recognition, for instance, it is vital to have a dataset that covers all genders and identities. By supplementing training data with synthetic data, data distributions will better reflect key demographics, resulting in more balanced and fair AI systems. By lowering the barrier and cost, difficult-to-obtain datasets become more available, opening doors for enterprises of all sizes to build state-of-the-art models.
Millions of dollars and months of work could be saved, paving the way to create more models in a fraction of the time with fewer resources. Most technology industry leaders agree that synthetic data will be an essential enabling technology and key to staying ahead.