With Sora, the company plans to teach AI "to understand and simulate the physical world".
OpenAI, the company behind ChatGPT and DALL-E, has officially unveiled Sora, a new text-to-video diffusion model created as part of its efforts to teach artificial intelligence to understand and simulate the physical world in motion.
Powered by a transformer architecture, similar to GPT models, Sora can generate videos up to a minute long while maintaining high visual quality and fidelity to user prompts. The model can also generate videos from existing still images, animating their contents accurately and capturing even the smallest details.
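OpenAI hasn't released Sora's implementation, but the general recipe it describes, a diffusion model whose transformer denoiser attends over patches of video representations conditioned on the text prompt, can be sketched conceptually. The toy below is purely illustrative: the `ToyVideoDenoiser` module, all tensor sizes, and the crude denoising update are assumptions for demonstration, not Sora's actual design.

```python
import torch
import torch.nn as nn

# Toy denoiser: a transformer over "spacetime patches" of a video representation.
# All sizes here are illustrative assumptions, not Sora's real configuration.
class ToyVideoDenoiser(nn.Module):
    def __init__(self, patch_dim=64, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.patch_in = nn.Linear(patch_dim, d_model)   # embed noisy video patches
        self.text_in = nn.Linear(d_model, d_model)      # embed prompt tokens (stand-in)
        self.time_in = nn.Linear(1, d_model)            # embed the diffusion timestep
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)
        self.patch_out = nn.Linear(d_model, patch_dim)  # predict the noise per patch

    def forward(self, noisy_patches, text_tokens, t):
        # noisy_patches: (batch, n_patches, patch_dim) flattened spacetime patches
        # text_tokens:   (batch, n_text, d_model) prompt embeddings (conditioning)
        # t:             (batch, 1) diffusion timestep in [0, 1]
        x = self.patch_in(noisy_patches)
        cond = self.text_in(text_tokens) + self.time_in(t).unsqueeze(1)
        h = self.transformer(torch.cat([cond, x], dim=1))  # joint attention
        return self.patch_out(h[:, cond.shape[1]:])        # noise estimate per patch

# Toy sampling loop: start from pure noise and iteratively denoise.
model = ToyVideoDenoiser()
patches = torch.randn(1, 16, 64)   # 16 spacetime patches of pure noise
text = torch.randn(1, 8, 128)      # stand-in for encoded prompt tokens
for step in range(10, 0, -1):
    t = torch.full((1, 1), step / 10.0)
    eps = model(patches, text, t)
    patches = patches - 0.1 * eps  # crude, illustrative denoising step
# The final patches would then be reassembled and decoded back into video frames.
```

In a real system of this kind, the patches would come from compressing video through a learned encoder, and the denoiser would be trained to predict the noise added during a forward diffusion process; this sketch only shows the shape of the sampling loop, not a working generator.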
"Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background," commented the team. "The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world."
"The model has a deep understanding of language, enabling it to accurately interpret prompts and generate compelling characters that express vibrant emotions. Sora can also create multiple shots within a single generated video that accurately persist characters and visual style."
Starting today, a select group of individuals will have access to Sora: red teamers, who will assess the model for potential risks and harms, as well as visual artists, designers, and filmmakers, who will share feedback on how to make it most helpful for creative professionals. According to OpenAI, its decision to share its research progress early reflects its commitment to getting feedback from people outside the company and providing insight into the future capabilities of AI.
Learn more about Sora here.