Stability AI, creator of the hugely popular image generation model Stable Diffusion, recently unveiled an update to its image-to-video diffusion model, Stable Video Diffusion (SVD).
The update, SVD 1.1, improves on its predecessor, SVD 1.0, with better motion, greater consistency, and an overall jump in quality.
The model is now available on Hugging Face, as well as directly on the company's website via free and paid subscription tiers, with memberships starting at $20 per month. Commercial users interested in utilizing SVD 1.1 will need to opt for a membership package.
The company says access for research remains unrestricted and free of charge.
What’s New?
The original SVD and its enhanced counterpart, SVD-XT, were released in November last year. SVD was designed to animate a still image into a four-second video, generating up to 14 frames, while SVD-XT could produce videos with up to 25 frames. Neither model accepted a text prompt; instead, each used context from the provided image to bring it to life.
SVD 1.1 builds on the progress made with SVD-XT. The latest model maintains the four-second, 25-frame video output while raising the resolution of the input frame to 1024×576, making it suitable for social media.
This upgrade is expected to deliver more stable and coherent video outputs than its predecessors. Cases where earlier models struggled to achieve photorealism or produced minimal motion should be addressed by SVD 1.1, promising a leap towards more dynamic and realistic outputs.
The refinement for SVD 1.1 focused on enhancing output consistency through fixed conditioning at 6 fps and a motion bucket ID of 127, as detailed on the Hugging Face platform. These improvements aim to remove the need for hyperparameter adjustments while maintaining flexibility for users.
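As a rough sketch of what these fixed conditioning values look like in practice, the snippet below collects them into a reusable set of generation keyword arguments. The parameter names (`fps`, `motion_bucket_id`, `num_frames`) follow the convention used by Hugging Face's `diffusers` library, and the helper function itself is illustrative, not part of any official API.

```python
# Conditioning values reported for SVD 1.1: fixed 6 fps and motion bucket ID 127.
# num_frames reflects the 25-frame, roughly four-second output described above.
SVD_11_DEFAULTS = {
    "fps": 6,                 # frame rate the model was conditioned on
    "motion_bucket_id": 127,  # controls how much motion appears in the output
    "num_frames": 25,         # SVD 1.1 generates 25 frames per clip
}

def generation_kwargs(**overrides):
    """Merge user overrides with the fixed SVD 1.1 conditioning defaults."""
    kwargs = dict(SVD_11_DEFAULTS)
    kwargs.update(overrides)
    return kwargs

# With diffusers installed, a call might look like this (not executed here,
# and the model repo name below is an assumption):
#   from diffusers import StableVideoDiffusionPipeline
#   pipe = StableVideoDiffusionPipeline.from_pretrained(
#       "stabilityai/stable-video-diffusion-img2vid-xt-1-1")
#   frames = pipe(image, **generation_kwargs()).frames[0]
print(generation_kwargs())
```

Because the conditioning is fixed, most users should be able to leave these defaults untouched; the override mechanism simply preserves the flexibility the release notes mention.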
Stability AI has also made the Stable Video Diffusion models accessible through an API on its developer platform, allowing developers to integrate advanced video generation capabilities directly into their applications.
The API enables the creation of four-second videos at 24 fps in MP4 format, comprising 25 generated frames alongside interpolated frames for a complete visual experience. The API supports various video resolutions and layouts, giving developers versatile tools for content creation.
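The frame counts described above can be checked with simple arithmetic: a four-second clip at 24 fps contains 96 frames, so the 25 generated frames are padded out with interpolated ones. A minimal sketch (the function name is illustrative, not part of the API):

```python
def frame_budget(duration_s=4, fps=24, generated=25):
    """Compute total and interpolated frame counts for a four-second SVD clip."""
    total = duration_s * fps          # frames in the final 24 fps MP4
    interpolated = total - generated  # frames synthesized between generated ones
    return {"total": total, "generated": generated, "interpolated": interpolated}

print(frame_budget())
# → {'total': 96, 'generated': 25, 'interpolated': 71}
```

In other words, roughly three out of every four frames in the delivered video are interpolated rather than generated by the diffusion model itself.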
As Stability AI continues to innovate in the generative AI domain, with consistent model releases marking its journey since its inception in 2019, the landscape of AI-driven video generation is evolving. Despite facing competition from platforms like Runway and Pika, which have also made strides in video customization and enhancement, Stability AI distinguishes itself with its API offerings, enabling broader application development integration.
Innovations such as Runway’s Multi Motion Brush and Pika’s region-specific video modifications showcase the dynamic nature of AI video generation technology. However, the unique API access provided by Stability AI sets a benchmark for developer engagement in creating more immersive and interactive video content.