Meta Introduces A New Text-to-Speech Model

Meta has designed Voicebox, a cutting-edge AI model that can generate speech from text.

In a blog post, the company said that, like generative systems for images and text, Voicebox produces outputs in a number of different styles and can either start from scratch or alter a sample that has been given to it. Instead of an image or a piece of text, however, Voicebox creates high-quality audio clips.

In addition to noise removal, content editing, style conversion, and diverse sample generation, the model can synthesize speech in six languages: English, French, German, Spanish, Polish, and Portuguese. Meta said that Voicebox uses a new method that learns from just raw audio and an accompanying transcription.

Unlike autoregressive models for audio generation, Voicebox can modify any part of a given sample, not just the end of an audio clip. The company added that Voicebox is trained to predict a speech segment when given the transcript of that segment and the speech surrounding it.
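The difference between editing any part of a clip and only extending its end can be sketched with a toy masking function. This is an illustration of the general infilling idea only, not Meta's implementation: the `mask_span` helper, the `Frame` type, and the numeric frames are all hypothetical stand-ins.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in: one number per audio frame. Real systems use feature vectors.
Frame = float


def mask_span(
    frames: List[Frame], start: int, end: int
) -> Tuple[List[Optional[Frame]], List[bool]]:
    """Hide frames[start:end] and return (context, mask).

    context keeps the visible frames and holds None where audio would have
    to be generated; mask is True exactly at the hidden positions.
    """
    if not 0 <= start <= end <= len(frames):
        raise ValueError("span must lie inside the clip")
    mask = [start <= i < end for i in range(len(frames))]
    context = [None if hidden else frame for frame, hidden in zip(frames, mask)]
    return context, mask


clip = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]

# Infilling: the hidden span sits mid-clip, so audio on both sides stays visible.
infill_context, infill_mask = mask_span(clip, 2, 4)

# Continuation: the hidden span is the tail, so only earlier audio is visible.
tail_context, tail_mask = mask_span(clip, 4, 6)

print(infill_context)  # [0.1, 0.2, None, None, 0.5, 0.6]
print(tail_context)    # [0.1, 0.2, 0.3, 0.4, None, None]
```

In this sketch, continuing a clip is just the special case where the hidden span is at the end, which is the only case a strictly left-to-right generator handles.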

This versatility enables Voicebox to perform well across a variety of tasks, including in-context text-to-speech synthesis, cross-lingual style transfer, speech denoising and editing, and diverse speech sampling.

As Meta wrote in its announcement, Voicebox represents an important step forward in generative AI research. “We look forward to continuing our exploration in the audio domain and seeing how other researchers build on our work,” the team said in the post.
