AWS Announces Synthetic Data Generation For SageMaker Ground Truth


SageMaker Ground Truth can produce millions of automatically labelled synthetic images

AWS has announced that users can now create labelled synthetic data with Amazon SageMaker Ground Truth. SageMaker Ground Truth is a data labelling service that makes it simple to label data and gives you the option of using human annotators through third-party suppliers, Amazon Mechanical Turk, or your own private workforce. Alternatively, you can now generate labelled synthetic data without manually collecting or labelling real-world data: SageMaker Ground Truth can produce millions of automatically labelled synthetic images on your behalf.

Building machine-learning models is an iterative process that begins with data collection and preparation, then moves on to model training and model deployment. Collecting large, varied, and accurately labelled datasets for model training is often difficult and time-consuming, especially in the initial stage.

Combining your real-world data with synthetic data helps build more comprehensive training datasets for your machine-learning models. Synthetic data itself is created by simple rules, statistical models, computer simulations, or other techniques. Because the generator knows exactly what it has placed in each image, it can produce vast amounts of synthetic data with exact labels across tens of thousands of images. Labels can be accurate at a very fine granularity, such as the pixel or sub-object level, and across modalities. Bounding boxes, polygons, depth, and segments are some examples of these modalities.
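To make the idea of rule-based generation with exact labels concrete, here is a minimal Python sketch. It does not use or imitate the SageMaker Ground Truth API; the function name `make_synthetic_sample`, the image size, and the label layout are all assumptions chosen purely for illustration. It draws a bright rectangle on a dark background and, because the generator chose the rectangle's position itself, emits a bounding box and a pixel-level segmentation mask that are correct by construction.

```python
import random


def make_synthetic_sample(width=32, height=24, seed=None):
    """Render one grayscale image with a single rectangle and return
    (image, labels). Hypothetical helper, not an AWS API."""
    rng = random.Random(seed)

    # Simple rules decide the object's size and position.
    box_w = rng.randint(4, width // 2)
    box_h = rng.randint(4, height // 2)
    x0 = rng.randint(0, width - box_w)
    y0 = rng.randint(0, height - box_h)

    # Dark background, bright object (pixel values in 0..255).
    background = rng.randint(0, 80)
    foreground = rng.randint(150, 255)

    image = [[background] * width for _ in range(height)]
    mask = [[0] * width for _ in range(height)]
    for y in range(y0, y0 + box_h):
        for x in range(x0, x0 + box_w):
            image[y][x] = foreground
            mask[y][x] = 1  # pixel-level label

    labels = {
        "bounding_box": {"left": x0, "top": y0, "width": box_w, "height": box_h},
        "segmentation_mask": mask,
    }
    return image, labels


if __name__ == "__main__":
    _, labels = make_synthetic_sample(seed=0)
    print(labels["bounding_box"])
```

No human annotation step is involved: the bounding box and the mask come from the same numbers that drew the object, which is why synthetic labels can be exact down to the pixel.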

Synthetic data is a powerful solution to two different problems: data limitations and privacy risks. When labelled data is scarce, synthetic data can supplement the training data to reduce overfitting. For privacy protection, data curators can provide synthetic data instead of actual data, safeguarding users’ privacy while preserving the usefulness of the original data.

Combining your real-world data with synthetic data also adds diversity that real-world data may lack, producing more complete and balanced datasets.

With SageMaker Ground Truth, you are free to design any imaging scenario with synthetic data, including edge cases that could be challenging to identify and replicate in real-world data. Variations can be added to objects and surroundings to reflect changing lighting, colours, textures, poses, or backgrounds.
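The kind of variation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `vary_scene` is a hypothetical helper, not part of SageMaker Ground Truth, and it covers just two of the listed factors (lighting and background) on a grayscale image stored as nested lists. The object mask is passed in and left untouched, so the labels remain valid for every variant.

```python
import random


def vary_scene(image, mask, rng):
    """Return a copy of `image` with a random background and lighting gain.

    `mask` marks object pixels with 1; it is read but never modified.
    Hypothetical helper for illustration only.
    """
    gain = rng.uniform(0.5, 1.5)          # global lighting change
    new_background = rng.randint(0, 255)  # replacement background value

    varied = []
    for img_row, mask_row in zip(image, mask):
        row = []
        for pixel, is_object in zip(img_row, mask_row):
            value = pixel if is_object else new_background
            # Apply lighting, then clamp to the valid 0..255 range.
            row.append(max(0, min(255, round(value * gain))))
        varied.append(row)
    return varied


if __name__ == "__main__":
    image = [[10, 10, 10], [10, 200, 10], [10, 10, 10]]
    mask = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
    rng = random.Random(0)
    variants = [vary_scene(image, mask, rng) for _ in range(3)]
    for variant in variants:
        print(variant)
```

Each call yields a new training image for the same label, which is one simple way to cover lighting and background conditions that are rare in the real-world data.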

In other words, you can "order" synthetic data for the precise use case your machine-learning model is being trained for. Amazon SageMaker Ground Truth synthetic data is available in US East (N. Virginia). Synthetic data is priced on a per-label basis.