Voice Engine: OpenAI's Voice Synthesis Model

OpenAI has unveiled Voice Engine, a model that can clone a voice from a single 15-second audio sample. The company names podcasters, announcers, audiobook authors, advertisers, streamers, and other professionals among the model's users.

At present, the technology is available only to a small group of the company’s partners. For instance, the educational startup Age of Learning uses Voice Engine and GPT-4 to generate personalized voice content from pre-written scripts in real time, expanding reading and interactivity options for diverse student audiences.

OpenAI also emphasizes Voice Engine’s potential to support non-speaking individuals by giving them unique, non-robotic voices, and to assist therapeutic and educational programs for people with speech impairments or learning needs. At the Norman Prince Neurosciences Institute, the model was used to restore the voice of a patient with a brain tumor, using a video recording from one of her school projects.

To guard against fraud, OpenAI has implemented safeguards, including watermarking the generated audio. The model has been in development since 2022 and currently powers the preset voices in the OpenAI Text-to-Speech API, as well as ChatGPT’s voice features and the read-aloud feature introduced in 2024.

