OpenSynthetics To Focus On Synthetic Data For AI Development


OpenSynthetics, an open community for creating and using synthetic data in AI/ML and computer vision, has been launched for practitioners, researchers, academics, and the wider industry.

OpenSynthetics is the first dedicated community focused on advancing synthetic data technology, offering centralised access to synthetic datasets, research, papers, and code. Synthetic data, meaning computer-generated images and simulations used to train computer vision models, is an emerging technology that MIT Technology Review recently named one of its top 10 breakthrough technologies of 2022. The first book on Synthetic Data for Deep Learning was also published last year and has seen widespread adoption.

Through OpenSynthetics, AI/ML practitioners, regardless of experience, can share tools and techniques for creating and using synthetic data to build more capable AI models. Whether an individual or organisation is just starting out with synthetic data or already using it fully in production systems, they will have access to content relevant to their needs and experience. Additionally, OpenSynthetics will serve as a community hub, bringing together academics, practitioners, and researchers to collectively advance the use of synthetic data.

“Bringing together new and experienced researchers to contribute and share knowledge is an important step and an incredible milestone for the synthetic data industry,” said Yashar Behzadi, CEO of Synthesis AI. “The launch of OpenSynthetics comes when synthetic data is at an inflection point and is being leveraged to build more capable and ethical AI models for autonomous vehicles, robotics, drones, the metaverse, and more. By creating a centralised hub of synthetic data resources, we hope to advance synthetic data’s role in powering the next generation of computer vision.”

Current computer vision models are powered by hand-labelled data, which is labour-intensive, costly, time-consuming, and prone to human error and bias. Additionally, the collection of images of people presents privacy concerns. With synthetic data approaches, labels and data are available on demand, allowing practitioners to experiment and reducing the time spent collecting and annotating data. However, synthetic datasets, papers, and resources need to be made widely accessible in order to educate the industry on this technology and power further use cases.