OpenAI’s Voice Cloning AI: Revolutionizing Text-to-Voice Generation in Just 15 Seconds

OpenAI is now offering limited access to a text-to-voice generation platform called Voice Engine. This platform can create a synthetic voice based on a short 15-second clip of someone’s voice. The AI-generated voice can then read out text prompts in the same language as the original speaker or in various other languages.

According to OpenAI’s blog post, companies including Age of Learning, HeyGen, Dimagi, Livox, and Lifespan already have access to the technology. Age of Learning has been using it to generate pre-scripted voice-over content and to read out personalized responses, written by GPT-4, to students.

OpenAI began developing Voice Engine in late 2022, and the technology already powers the preset voices in its text-to-speech API and ChatGPT’s Read Aloud feature. The model was trained on a mix of licensed and publicly available data, and it will only be available to about 10 developers.

AI text-to-audio generation is an evolving field, with companies like Podcastle and ElevenLabs providing similar technologies. However, AI voice technology continues to raise ethical concerns, and OpenAI has implemented usage policies intended to prevent misuse of Voice Engine.

OpenAI suggests various measures to limit the risks associated with these tools, including phasing out voice-based authentication for bank accounts and educating the public about AI deepfakes. The company has also added watermarking to Voice Engine audio clips so their origin can be traced, and it actively monitors how the tool is used.

