IPAdapter & ControlNet: How to Change Clothes & Pose with AI

Introduction

IPAdapter and ControlNet are powerful tools for digital artists, fashion designers, and AI enthusiasts, enabling seamless clothing application and pose manipulation for digital characters. This combination offers a streamlined method to apply clothing naturally in any pose, a process that traditionally required tedious manual adjustments.

What are IPAdapter and ControlNet?

IPAdapter: Fits clothing to specific poses or body shapes by analyzing reference images and applying attention masks to handle fabric flow and body contours.

ControlNet: Captures and manipulates poses using deep learning algorithms, accurately interpreting poses from images or videos.

Together, they allow for realistic character creation, whether for virtual avatars, fashion showcases, or eCommerce.
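
To make the pose half of that pairing concrete, here is a minimal sketch of extracting an OpenPose map from a reference photo outside ComfyUI, assuming the `controlnet_aux` package and its `lllyasviel/Annotators` checkpoint; the file names are hypothetical.

```python
# Hedged sketch: turn a reference photo into the stick-figure pose map that
# a ControlNet conditions on.
from PIL import Image
from controlnet_aux import OpenposeDetector

detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
reference = Image.open("pose_reference.png")   # hypothetical reference photo
pose_map = detector(reference)                 # PIL image of the detected pose
pose_map.save("pose_map.png")
```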

Setting Up the Workflow

Here’s a quick guide to set up IPAdapter and ControlNet:

1. Load Base Nodes: Start with the checkpoint loader, prompt nodes, and a seed management node set to randomize for varied generations.

2. ControlNet Setup: Load a reference image to define the pose. Use ControlNet with the OpenPose XL2 model and apply advanced ControlNet adjustments to refine the pose integration.

3. Sanity Check: A sanity check group ensures that the ControlNet Pose is functioning correctly by generating a preliminary image.
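
A rough stand-in for that sanity check outside ComfyUI, sketched with Hugging Face diffusers: an SDXL base checkpoint plus an OpenPose ControlNet generates a preliminary image from the pose map alone. The model IDs (`stabilityai/stable-diffusion-xl-base-1.0`, `thibaud/controlnet-openpose-sdxl-1.0`) are assumptions standing in for whatever checkpoint and OpenPose XL2 model your workflow loads.

```python
# Hedged sketch of the sanity-check step with diffusers instead of ComfyUI nodes.
import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "thibaud/controlnet-openpose-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

pose_map = load_image("pose_map.png")            # pose extracted from the reference image
sanity = pipe(
    prompt="a woman standing in a studio, photorealistic",
    image=pose_map,                              # ControlNet conditioning image
    controlnet_conditioning_scale=0.8,           # analogous to the ControlNet strength setting
    num_inference_steps=30,
    generator=torch.Generator("cuda").manual_seed(0),  # fixed seed; randomize for varied generations
).images[0]
sanity.save("sanity_check.png")
```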

The ControlNet setup can be used in two different situations.

Single Article of Clothing

For a single article of clothing, we use the following process:

  1. Load Image: Feed the clothing reference image into IP Adapter using a standard approach.
  2. Apply ControlNet Pose: Ensure the model follows the pose from the reference image.
  3. Result: Generate the final image, adjusting as needed for improved fidelity.
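
For comparison outside ComfyUI, the same single-garment flow can be sketched with diffusers, continuing the `pipe` and `pose_map` from the sanity-check snippet above. The `h94/IP-Adapter` weights and the 0.8 scale are illustrative assumptions, not the author’s exact setup.

```python
# Hedged sketch: one clothing reference via IP-Adapter, pose held by ControlNet.
pipe.load_ip_adapter(
    "h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin"
)
pipe.set_ip_adapter_scale(0.8)               # how strongly the clothing reference is followed

shirt = load_image("shirt_reference.png")    # hypothetical clothing reference
result = pipe(
    prompt="a woman wearing the shirt, studio photo",
    ip_adapter_image=shirt,                  # clothing reference image
    image=pose_map,                          # pose map from the ControlNet setup
    num_inference_steps=30,
).images[0]
result.save("single_garment.png")
```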

Multiple Articles of Clothing

For two articles of clothing, we use an advanced approach:

  1. Unified Loader: Load both the shirt and pencil skirt images into IP Adapter.
  2. Attention Masks: Create attention masks so IP Adapter knows where each clothing item goes.

The “Fast Groups Muter” node from the rgthree node pack is a powerful feature that allows you to selectively enable or disable different groups within your workflow. This can be incredibly useful for exploring multiple creative directions or optimizing performance.

For example, if you’re working on an outfit design with a one-piece and a two-piece version, you can easily toggle between these options by turning the corresponding group on or off. This gives you the flexibility to quickly compare the designs and experiment with different looks.

Beyond managing clothing, the Fast Groups Muter node can also help optimize your workflow. If you’re working on a complex scene and your GPU is struggling with memory demands, you can turn off the ‘sanity check’ group to potentially free up resources and improve performance, especially on less powerful hardware.

The ability to selectively enable and disable groups is a game-changer. It empowers you to streamline your creative process, explore multiple ideas in parallel, and optimize your workflow to suit your specific needs and hardware constraints.

Understanding Attention Masks

This group shows a technique for efficiently using multiple clothing articles with a character. It utilizes a unified loader that feeds into an IPAdapter Advanced node.

The process is as follows:

1. A shirt and a pencil skirt are fed into the unified loader.

2. The unified loader’s output goes into the IPAdapter Advanced node.

3. For the second image, the IPAdapter output from the previous step is fed directly into the unified loader.

This approach helps save resources by avoiding the need to reload all the models. The key benefit is the ability to easily combine and manage multiple clothing items for a character.
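
As a rough outside-ComfyUI illustration of the same two-garment, attention-masked idea, diffusers exposes IP-Adapter masking through `IPAdapterMaskProcessor`. The sketch below continues the pipeline from the earlier snippets; the argument shapes follow the diffusers masking documentation and may differ between versions, so treat it as a hedged sketch rather than a drop-in recipe.

```python
# Hedged sketch: two clothing references, each limited to its own region by a mask.
from diffusers.image_processor import IPAdapterMaskProcessor

# pipe.load_ip_adapter(...) as in the single-garment sketch, if not already loaded.
shirt_img = load_image("shirt_reference.png")
skirt_img = load_image("skirt_reference.png")
shirt_mask = load_image("shirt_mask.png")     # black-and-white masks; see the
skirt_mask = load_image("skirt_mask.png")     # Segment Anything section for automating these

processor = IPAdapterMaskProcessor()
masks = processor.preprocess([shirt_mask, skirt_mask], height=1024, width=1024)
# One batch entry holding both masks, matching the nested ip_adapter_image list below.
masks = [masks.reshape(1, masks.shape[0], masks.shape[2], masks.shape[3])]

pipe.set_ip_adapter_scale([[0.8, 0.8]])       # one weight per reference image

result = pipe(
    prompt="a woman wearing the shirt and the pencil skirt, studio photo",
    ip_adapter_image=[[shirt_img, skirt_img]],    # both garments through one adapter
    image=pose_map,                               # pose still enforced by ControlNet
    cross_attention_kwargs={"ip_adapter_masks": masks},
    num_inference_steps=30,
).images[0]
result.save("two_garments.png")
```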

Adjusting IP Adapter Weights

In the IPAdapter Advanced node, you’ll notice that we have the option to adjust the weights. By default, these weights are set to 1.0, but adjusting them can significantly impact the final result.

For instance, if we increase the weight for the shirt reference image, we are telling the IPAdapter to pay more attention to the shirt. Conversely, if we decrease the weight for the skirt, we are telling the IPAdapter to pay less attention to the skirt. 

Note: I cover this in my IPAdapter video, which includes a phenomenal toolkit to test image outputs for various weight types. I highly recommend checking it out. Ultimately, it’s about experimenting to achieve the desired results.
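
In the diffusers sketch above, the equivalent knob is the IP-Adapter scale, one value per reference image; the numbers below are purely illustrative.

```python
# Favor the shirt reference over the skirt reference, then re-run the same
# generation call to compare results. Experiment with the values, as suggested above.
pipe.set_ip_adapter_scale([[1.0, 0.5]])   # [shirt weight, skirt weight]
```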

Automating Mask Generation with “Segment Anything” Nodes

Let’s look at the automated masking system provided by this series of nodes called “Segment Anything.” It’s pretty simple: it’s a collection of three nodes. You grab the ‘GroundingDinoSAMSegment (segment anything)’, the ‘GroundingDinoModelLoader (segment anything)’, and the ‘SAMModelLoader (segment anything)’ nodes, and you just wire them up.

  1. Connecting the Nodes:
    1. Connect the SAMModelLoader into the SAM_MODEL input.
    2. Connect the GroundingDinoModelLoader into the GroundingDINO input.
  2. Using the Nodes: Plug in the reference image and specify what you’re looking for. Here, I’ve typed in “shirt,” and it will actually generate both the shirt cutout and a mask of the shirt. In this case, we just want the masks.

You’ll notice that I have a preview image here to check that the mask is working. I have the same setup below for the skirt: we grab the mask output and feed it into the attention mask input. I’ve typed in “skirt,” queued it up, and you’ll see how the mask is automatically generated. This means we don’t have to manually paint in the mask.
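
For readers working outside ComfyUI, here is a hedged approximation of what those nodes wrap, using the Hugging Face ports of GroundingDINO and SAM: detect the garment from a text prompt, then segment it into a mask. The model IDs and thresholds are assumptions, and the post-processing calls may vary with your transformers version.

```python
# Hedged sketch: text-prompted mask generation ("shirt") with GroundingDINO + SAM.
import torch
from PIL import Image
from transformers import (
    AutoProcessor,
    GroundingDinoForObjectDetection,
    SamModel,
    SamProcessor,
)

image = Image.open("reference.png").convert("RGB")   # hypothetical reference image

# 1. Text-prompted detection: find a bounding box around the "shirt".
dino_processor = AutoProcessor.from_pretrained("IDEA-Research/grounding-dino-tiny")
dino = GroundingDinoForObjectDetection.from_pretrained("IDEA-Research/grounding-dino-tiny")
dino_inputs = dino_processor(images=image, text="a shirt.", return_tensors="pt")
with torch.no_grad():
    dino_outputs = dino(**dino_inputs)
detections = dino_processor.post_process_grounded_object_detection(
    dino_outputs,
    dino_inputs.input_ids,
    box_threshold=0.35,
    text_threshold=0.25,
    target_sizes=[image.size[::-1]],
)[0]
box = detections["boxes"][0].tolist()                # highest-confidence box for the prompt

# 2. Segmentation: turn that box into a pixel mask with SAM.
sam_processor = SamProcessor.from_pretrained("facebook/sam-vit-base")
sam = SamModel.from_pretrained("facebook/sam-vit-base")
sam_inputs = sam_processor(image, input_boxes=[[box]], return_tensors="pt")
with torch.no_grad():
    sam_outputs = sam(**sam_inputs)
masks = sam_processor.image_processor.post_process_masks(
    sam_outputs.pred_masks.cpu(),
    sam_inputs["original_sizes"].cpu(),
    sam_inputs["reshaped_input_sizes"].cpu(),
)
shirt_mask = masks[0][0, 0]   # first of SAM's candidate masks; use it as the attention mask
```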

This appears to work well because the reference image used for the sanity check already closely resembles the final image we want. With ControlNet alone, the sanity check comes close to the desired output, but the colors and textures of the shirt and skirt are still incorrect. Utilizing the IPAdapter with these reference images brings us much closer to our goal.

Why Use This Technique?

If you watched the previous YouTube video, you might wonder why this technique is preferred over the earlier one, where multiple images were combined into embeddings and fed through together. In my experiments, that embedding approach did not behave properly with ControlNet, so if ControlNet is in the mix, this attention-mask approach is the better choice. Depending on the use case, it may or may not be suitable for e-commerce; the results vary with the reference image used. As seen here, the quality of the input image matters, as does the pose provided. The simpler the pose, the better the results; the more complex the pose, the more challenging they become.