Resemble AI announced the launch of its speech-to-speech feature, which is designed to capture the unique style of human voices at scale and bring natural-sounding AI voices to millions of developers and creators across gaming, entertainment, e-learning, and more.
Resemble AI's generative audio continues to expand the creative possibilities for human voices and beloved characters.
AI voices can now perform a wide range of emotions and speaking styles, reproduce non-speech vocalisations, and even sing. If the input audio is spoken in a different language, the target voice will speak in that language. To see speech-to-speech in action, watch this video.
“We are happy with the quality of voices we are able to develop for ‘Animals Anonymous’ using Resemble AI,” says Fika Agency co-founder Adam Altman. “Now our entire team can record new episodes that feel consistent with the same voices our listeners are used to hearing.”
The team at Resemble AI refined speech-to-speech while using 3 minutes and 12 seconds of Andy Warhol's original voice recordings from the 1970s and '80s to produce synthetic voice narration for the Emmy-nominated, Dorian Award-winning Netflix docu-series The Andy Warhol Diaries. The team adjusted the emotion and pitch of the AI-generated Warhol voice and added human-like imperfections using reference audio clips from another speaker, as seen in this video about how Resemble AI's Style Transfer works.
“Resemble AI’s mission is to make interactions with digital products as human and natural as possible,” says Resemble AI founder and CEO Zohaib Ahmed. “We have dramatically accelerated and simplified the process of creating human-like AI voices by building hyper-realistic synthetic voices that can match or expand the reach of voice actors. This means that developers, creators and storytellers can create content in any language, and with any voice, without the need for expense or travel associated with recording studios.”
Resemble AI customers on the Basic, Pro (new) and Enterprise plans will see a new speech-to-speech option when creating a sentence in a clip within a project. Instead of typing text as input, they can provide a spoken sentence by uploading a pre-recorded audio file or recording directly in the interface. This produces high-quality AI voices and allows the target voice to speak in a different language from the original voice, which must be the same voice that records a consent line at the start of any project.