Sora: OpenAI's Groundbreaking Text-to-Video Diffusion Model

OpenAI has unveiled Sora, a diffusion-based text-to-video model capable of generating videos up to 60 seconds long. Compared to competing models from Runway, Pika, Stability AI, and Google, OpenAI’s model offers high-resolution (Full HD) output, smooth camera and object motion, and remarkable anatomical accuracy in human depictions.

Unlike Runway and Pika models, which are limited to generating only 4 seconds of video at a time, Sora produces videos up to 60 seconds long and can seamlessly extend existing footage. Because Sora generates all frames of a video together rather than one at a time, it avoids a problem seen in other models, where an object that briefly leaves the camera’s view may vanish or change when it returns. The model also takes into account not only what the prompt asks for but also how those objects typically appear in the requested scene.
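
To see why looking at all frames together helps with consistency, consider the toy sketch below. It is not OpenAI's implementation and involves no diffusion model: the function names, the 0.5 threshold, and the made-up per-frame scores are all assumptions chosen for illustration. It only contrasts deciding each frame in isolation with deciding each frame while also looking at its neighbours.

```python
def per_frame_decisions(scores, threshold=0.5):
    """Decide 'object visible?' for each frame using that frame alone."""
    return [1 if s > threshold else 0 for s in scores]


def joint_decisions(scores, threshold=0.5):
    """Decide 'object visible?' for each frame using its neighbours too.

    Each score is averaged with the previous and next frame (edges are
    clamped), so one weak frame is outvoted by the frames around it.
    """
    last = len(scores) - 1
    smoothed = [
        (scores[max(t - 1, 0)] + scores[t] + scores[min(t + 1, last)]) / 3.0
        for t in range(len(scores))
    ]
    return [1 if s > threshold else 0 for s in smoothed]


def count_flickers(decisions):
    """Number of times the object appears or disappears between frames."""
    return sum(1 for a, b in zip(decisions, decisions[1:]) if a != b)


# Hypothetical per-frame confidence that an object is in the shot (8 frames).
scores = [0.9, 0.8, 0.3, 0.9, 0.7, 0.4, 0.8, 0.9]

independent = per_frame_decisions(scores)
joint = joint_decisions(scores)

print("per-frame:", independent, "flickers:", count_flickers(independent))
print("joint:    ", joint, "flickers:", count_flickers(joint))
```

With these sample scores, the per-frame path drops the object in two frames, while the joint path keeps it present throughout. A real video model learns far richer relationships across frames, but the underlying intuition is the same: context from neighbouring frames can override a single unreliable one.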

OpenAI acknowledges that the current version of the model still makes mistakes in video generation. For instance, a person may take a bite out of a cookie, yet the cookie shows no bite mark afterward.

The model is currently being tested to identify shortcomings, biases, and potential for misuse. Access is limited to a select group of content creators who are taking part in the testing phase and providing feedback to OpenAI.

More examples of videos generated by Sora, along with OpenAI’s technical report, are available on OpenAI’s website.
