A Guide to Pyramid Flow on ComfyUI – Local Install For Txt2Vid

Pyramid Flow, an open-source AI video model that Kuaishou helped develop, can now be run in ComfyUI through community custom nodes. The model generates videos from text prompts or from an initial image frame. This guide walks through setting up and using Pyramid Flow within ComfyUI, and covers both its capabilities and its limitations.

Getting Started with Pyramid Flow on ComfyUI

The first step in using Pyramid Flow on ComfyUI is to install the necessary custom nodes. ComfyUI Manager provides an easy way to install these nodes with just a few clicks. Once installed, you’ll need to download the Pyramid Flow model files from Hugging Face and store them in the appropriate folder within ComfyUI.
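
As a rough sketch of the download step, the model files can also be fetched with the `huggingface_hub` Python package instead of by hand. The repository ID and target folder are the ones listed in the resources section at the end of this guide. The helper names `model_dir` and `download_pyramid_flow` are made up for this example, and it assumes `huggingface_hub` is installed in the Python environment you run it from.

```python
from pathlib import Path

# Repository and folder layout as listed in the resources section of this guide.
REPO_ID = "rain1011/pyramid-flow-sd3"
MODEL_SUBDIR = Path("models") / "pyramidflow" / "pyramid-flow-sd3"


def model_dir(comfyui_root: Path) -> Path:
    """Return the folder where the Pyramid Flow weights are expected."""
    return Path(comfyui_root) / MODEL_SUBDIR


def download_pyramid_flow(comfyui_root: Path) -> Path:
    """Download the model snapshot into the ComfyUI models folder."""
    # Imported here so the path helper above works without the package installed.
    from huggingface_hub import snapshot_download

    target = model_dir(comfyui_root)
    target.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=REPO_ID, local_dir=str(target))
    return target
```

Calling `download_pyramid_flow(Path("ComfyUI"))` from the folder that contains your ComfyUI install should place the files under `ComfyUI/models/pyramidflow/pyramid-flow-sd3`. The download is large, so check your free disk space first.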

Two custom node wrappers for Pyramid Flow are available for ComfyUI, so you can choose whichever suits you. The one referenced in this guide is ComfyUI-PyramidFlowWrapper, linked in the resources section. The wrappers expose the model as ComfyUI nodes with preconfigured settings, which simplifies the process of generating videos.

Text-to-Video Generation

One of the most exciting features of Pyramid Flow is its ability to generate videos from text prompts. The process involves encoding the text prompt, passing it to the sampler along with the model data, and then decoding the sampled latent data to produce the final video output.

Example workflows for text-to-video generation are available, making it easy to get started. Keep in mind that the quality of the generated videos varies significantly with the prompt and the settings used, so expect to iterate.
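
Once a workflow runs in the browser, it can also be queued from a script, which helps when trying many prompts. The sketch below assumes a ComfyUI server running locally on the default address and a workflow exported in API format (a JSON object mapping node IDs to `class_type` and `inputs`). The node ID and input name that hold the prompt text depend on your workflow, so the helper takes them as arguments rather than guessing. All function names here are made up for the example.

```python
import copy
import json
import urllib.request

# Default address of a locally running ComfyUI server (an assumption; adjust if yours differs).
COMFYUI_URL = "http://127.0.0.1:8188"


def set_node_input(workflow: dict, node_id: str, name: str, value) -> dict:
    """Return a copy of an API-format workflow with one node input replaced."""
    patched = copy.deepcopy(workflow)
    patched[node_id]["inputs"][name] = value
    return patched


def build_request(workflow: dict, base_url: str = COMFYUI_URL) -> urllib.request.Request:
    """Wrap the workflow in the JSON body that the /prompt endpoint expects."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )


def queue_workflow(workflow: dict, base_url: str = COMFYUI_URL) -> dict:
    """Send the workflow to the server and return its JSON reply."""
    with urllib.request.urlopen(build_request(workflow, base_url)) as response:
        return json.loads(response.read())
```

A typical use is to load the exported JSON with `json.load`, call `set_node_input` with the ID of your text-encode node and a new prompt, and pass the result to `queue_workflow`. The generated video then appears in ComfyUI's output as usual.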

Image-to-Video Generation

In addition to text-to-video generation, Pyramid Flow also supports image-to-video generation. This feature allows you to use an initial image frame as a starting point and generate a video based on that image and a text prompt.

The image-to-video workflow in ComfyUI follows a similar process to the text-to-video workflow, with the addition of an image encoding step. The initial image is encoded, and the resulting latent data is used as the starting point for the sampler.

Fine-tuning and Customization

Pyramid Flow can produce impressive results out of the box, but several settings can be adjusted to improve quality or achieve specific visual effects. The ComfyUI nodes expose key parameters such as temperature, guidance scale, and video guidance scale.

Adjusting these parameters can significantly impact the generated videos, allowing you to strike a balance between stability and motion, or to introduce more creative and dynamic effects.
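
One way to explore these parameters systematically is to test a small grid of values and compare the outputs side by side. The sketch below only builds the list of combinations. The dictionary keys are illustrative names, not necessarily the input names used by the nodes, and the numeric values are placeholders rather than recommended settings.

```python
from itertools import product


def sweep_settings(guidance_scales, video_guidance_scales, temperature=1.0):
    """Build one settings dict per combination of the two guidance values."""
    return [
        {
            "guidance_scale": g,
            "video_guidance_scale": v,
            "temperature": temperature,
        }
        for g, v in product(guidance_scales, video_guidance_scales)
    ]


# Illustrative values only; they are not recommendations from the model authors.
runs = sweep_settings([7.0, 9.0], [4.0, 5.0])
for run in runs:
    print(run)
```

Changing one parameter at a time while keeping the seed and prompt fixed makes it easier to see what each setting actually does to motion and stability.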

Limitations and Challenges

Despite its impressive capabilities, Pyramid Flow is still a relatively new and evolving technology, with certain limitations and challenges. One notable limitation is its performance with human characters and faces, which often exhibit morphing or deformation issues, especially during fast motions.

Additionally, Pyramid Flow requires a significant amount of VRAM (at least 12 GB, and preferably 20 GB or more) to run smoothly, which puts it out of reach of many consumer-grade GPUs.
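
Before installing, it can help to check how your GPU compares with those figures. The sketch below is a minimal check that assumes PyTorch with CUDA support is installed (ComfyUI already depends on PyTorch). The thresholds are the 12 GB and 20 GB figures quoted above, and the function names and tier labels are made up for this example.

```python
def vram_tier(total_gb: float) -> str:
    """Classify a nominal VRAM size against the figures quoted in this guide."""
    if total_gb >= 20:
        return "comfortable"
    if total_gb >= 12:
        return "minimum"
    return "below minimum"


def detect_vram_gb(device_index: int = 0) -> float:
    """Return the nominal VRAM of a CUDA device in GB, or 0.0 if none is available."""
    # Imported here so vram_tier can be used without PyTorch installed.
    import torch

    if not torch.cuda.is_available():
        return 0.0
    props = torch.cuda.get_device_properties(device_index)
    # Reported capacity is usually slightly below the advertised size,
    # so round to the nearest whole GB before comparing.
    return float(round(props.total_memory / 1024**3))
```

Running `vram_tier(detect_vram_gb())` gives a quick indication of whether a card meets the stated minimum. It does not account for VRAM already in use by other applications.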

Showcase of Generated Results

As mentioned in the video, I am sharing some results generated during my testing.

Resources and References

Custom Node: https://github.com/kijai/ComfyUI-PyramidFlowWrapper

Model: https://huggingface.co/rain1011/pyramid-flow-sd3/tree/main (download to: ComfyUI/models/pyramidflow/pyramid-flow-sd3)

Basic txt2vid and img2vid workflows: https://www.patreon.com/posts/113979132?utm_medium=clipboard_copy&utm_source=copyLink&utm_campaign=postshare_creator&utm_content=join_link

Additional info on Patreon: https://www.patreon.com/posts/pyramid-flow-in-113979132?utm_medium=clipboard_copy&utm_source=copyLink&utm_campaign=postshare_creator&utm_content=join_link