
Stable Diffusion: Create a Streamlined Workflow for Animation with IPAdapter & AnimateDiff

In today’s tutorial, we’re venturing into the exciting world of ComfyUI to unveil a seamless animation workflow that combines Stable Diffusion with IPAdapter, Roop Face Swap, and AnimateDiff. This workflow simplifies the process of creating captivating video animation scenes, especially when paired with ChatGPT. I’ll guide you through this speedy method, which eliminates the need for extensive text prompts.

A Duo of Dynamic Workflows

Our journey begins with two critical workflows within ComfyUI. The first is the basic text-to-image workflow, a default offering in ComfyUI. In our demonstration, we’ll harness the power of the SDXL model, more specifically DreamShaper XL. Why SDXL, you ask? Well, it allows us to generate images from plain sentences, a significant departure from Stable Diffusion 1.5, which relies on specific technical prompts. I’ll keep the same settings for lighting, image quality, and high-resolution detail throughout, effectively treating the quality keywords and negative prompt as a template for each scene. This means we only need to modify the first two lines of the text prompt, allowing us to copy and paste content from ChatGPT and create a streamlined workflow for building animation story scenes.
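To make the template idea concrete, here is a minimal sketch in Python (the quality tags, negative prompt, and helper function are placeholder examples, not part of the workflow files) showing how only the scene sentence changes while the rest stays fixed:

```python
# Fixed "template" portions reused for every scene (placeholder wording).
QUALITY_TAGS = "cinematic lighting, high resolution, highly detailed"
NEGATIVE_PROMPT = "blurry, low quality, deformed, watermark"

def build_prompt(scene_description: str) -> tuple[str, str]:
    """Combine a ChatGPT scene sentence with the fixed quality template."""
    positive = f"{scene_description}\n{QUALITY_TAGS}"
    return positive, NEGATIVE_PROMPT

# Example: only this first line changes from scene to scene.
positive, negative = build_prompt("A mermaid swimming through a sunlit coral reef")
print(positive)
print(negative)
```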

 

The Second Workflow – A Designer’s Dream

The second workflow is a creation of my own, thoughtfully incorporating IPAdapter, Roop Face Swap, and AnimateDiff. You can find setup instructions for these ComfyUI custom nodes in the video description. The beauty of this workflow lies in its synergy with the images generated in the first workflow: they are loaded straight into IPAdapter, sparing us the time and effort required to craft detailed text prompts for each animation scene.

Check out our video tutorial on how to use this:

 

Harnessing ChatGPT’s Creative Force

Now, let’s shift our focus to ChatGPT. We’ll rely on it to craft a short story, complete with scenes and narrator speech. Our journey begins with a request for an outline of the story. We’ll then dive into each act, with ChatGPT helping us create scene descriptions and voice-over scripts. Since ChatGPT isn’t particularly effective at producing Stable Diffusion 1.5-style text prompts, we’ll use it primarily to generate scene descriptions, and reserve SDXL to bring those scenes to life, as it effortlessly understands prompts written as sentences. Once all chapters are generated, we’ll copy and paste them into a Word file, ready to be transformed into animations in ComfyUI.
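If you prefer to script this step instead of using the ChatGPT web interface, the same request can be sent through the OpenAI API. Below is a minimal sketch, assuming the `openai` Python package is installed, an API key is set in the environment, and `gpt-4o-mini` is just an example model name:

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask for an outline first, with scene descriptions written as plain
# sentences that SDXL can understand, plus narrator voice-over lines.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name; use whichever model you have access to
    messages=[{
        "role": "user",
        "content": (
            "Write a three-act outline for a short mermaid story. "
            "For each act, give two one-sentence scene descriptions "
            "and a short narrator voice-over line."
        ),
    }],
)
print(response.choices[0].message.content)
```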

Navigating the Dual Workflow

As showcased at the start of this video, we’re working with two workflows in ComfyUI and will switch between them to process each animation scene. During this process, a significant portion of your time will be devoted to copying and pasting text from the Word file into workflow 1, which uses the SDXL model for text-to-image conversion. Here, you have the flexibility to adjust settings such as width, height, and other sampler parameters.
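If the copying and pasting ever becomes tedious, ComfyUI also exposes a small HTTP API that lets you queue workflow 1 from a script. The sketch below is only an illustration: it assumes you have exported the workflow with “Save (API Format)”, that ComfyUI is running locally on its default port 8188, and that nodes “6” and “5” happen to be the positive prompt and latent size nodes in your export (the IDs will differ in yours):

```python
import json
import urllib.request

# Load workflow 1, exported from ComfyUI with "Save (API Format)".
with open("workflow1_api.json", "r", encoding="utf-8") as f:
    workflow = json.load(f)

# Node IDs are assumptions -- open your own export to find the positive
# CLIPTextEncode node and the EmptyLatentImage node.
workflow["6"]["inputs"]["text"] = (
    "A mermaid resting on a moonlit rock, waves crashing around her"
)
workflow["5"]["inputs"]["width"] = 1024
workflow["5"]["inputs"]["height"] = 1024

# Queue the job on the locally running ComfyUI instance (default port 8188).
request = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    print(response.read().decode("utf-8"))
```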

The true magic, however, unfolds in workflow 2. Here, we load the generated images from workflow 1 and feed them into the AnimateDiff nodes. Once the animation settings are initialized with an IPAdapter image prompt, the process moves on to the sampler node, where image processing begins.

You’ll observe that we keep the prompt and negative prompt in the sampler devoid of descriptive language. Instead, we copy and paste image quality setting keywords from the text-to-image workflow, as displayed here. We place our trust in IPAdapter to bring our animations to life based on the loaded image.

The final step takes us to the Roop Face Swap custom node for each image frame. Our workflow culminates as the Video Combine node skillfully combines all image frames into a seamless animation.
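The Video Combine node takes care of this inside ComfyUI, but if you ever want to stitch saved frames together yourself, a plain ffmpeg call does the same job. A minimal sketch, assuming ffmpeg is installed and the frames were saved with a numbered pattern such as frame_00001.png:

```python
import subprocess

# Combine numbered PNG frames into an MP4 (pattern and fps are assumptions).
subprocess.run([
    "ffmpeg",
    "-framerate", "8",              # AnimateDiff clips are often played back around 8 fps
    "-i", "output/frame_%05d.png",  # adjust to match where your frames were saved
    "-c:v", "libx264",
    "-pix_fmt", "yuv420p",          # for broad player compatibility
    "animation.mp4",
], check=True)
```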

Ready to Dive In

To ease your journey, we’re sharing the workflow JSON files in the video description, and they’re also available on my website for experimentation.

Now, it’s time to embark on creating animations. We’ve already generated some scenes using text-to-image and saved them in a folder. You can simply drag and drop an image into the IPAdapter’s Load Image node, like this. Here, you can choose between the IPAdapter models designed specifically for faces and the general IPAdapter model for images.

Within the AnimateDiff nodes group, you have the opportunity to fine-tune your motion by choosing from various motion LoRA models. The green box awaits your text prompt, while the red box is dedicated to the negative prompt. To keep things efficient, we copy and paste the same image quality keywords from the text-to-image workflow.

With a click of the ‘generate’ button, the animation springs to life. This workflow may take some time for each animation, given the number of features it combines, and the generation time depends on your hardware configuration. Typically, it takes me a few minutes per scene. During this interval, you can conveniently switch back to workflow 1 to prepare another story scene; just note that any text-to-image jobs you start will wait in the same queue.

Exploring Diverse Scenarios

For example, in the case of my mermaid short story, I go through the ChatGPT script to make sure it’s suitable for image generation with SDXL. If a sentence proves too complex, slight modifications may be necessary so SDXL can interpret it correctly.

After generating a selection of images, it’s time to explore more IPAdapter-to-animation workflows. I’ve just created a mermaid image, and I’m ready to drag and drop it into the Load Image node. As we delve into the AnimateDiff nodes, you’ll appreciate the versatility of being able to combine multiple motions. In this instance, I’m applying Tilt Up and then Zoom Out. That’s a notable advantage, as some recent updates to other tools restrict you to only one motion at a time; not so with Stable Diffusion, which excels in this area. You can also adjust the animation width and height settings, as well as the batch size, which determines the number of frames. A higher number means a longer generation time; for example, a batch size of 16 frames played back at roughly 8 fps gives a clip of about two seconds.

Conclusion

So, there you have it, a powerful workflow that unlocks the potential of Comfy UI, simplifying the creation of captivating animations. You’re now armed with the tools and knowledge to venture into this realm of animation with confidence. The workflow JSON files are readily available for your experimentation. As you embark on your creative journey, I’m excited to see the masterpieces you’ll bring to life.

Until next time, happy animating!

 

Resources:

Workflow JSON files: https://drive.google.com/file/d/1QquJNpk72BnnR2Wf-tDWFuA7zkl5dooo/view?usp=sharing

Custom Nodes For This Workflow:

Roop: https://github.com/ssitu/ComfyUI_roop

IPAdapter-ComfyUI: https://github.com/laksjdjf/IPAdapter-ComfyUI

AnimateDiff: https://github.com/ArtVentureX/comfyui-animatediff