Free-Lunch Color-Texture Disentanglement for Stylized Image Generation

Abstract

Recent advances in Text-to-Image (T2I) diffusion models have transformed image generation, enabling significant progress in stylized generation using only a few style reference images. However, current diffusion-based methods struggle with fine-grained style customization due to challenges in controlling multiple style attributes, such as color and texture. This paper introduces the first tuning-free approach to achieve free-lunch color-texture disentanglement in stylized T2I generation, addressing the need for independently controlled style elements for the Disentangled Stylized Image Generation (DisIG) problem. Our approach leverages the Image-Prompt Additivity property in the CLIP image embedding space to develop techniques for separating and extracting Color-Texture Embeddings (CTE) from individual color and texture reference images. Through these methods, our Style Attributes Disentanglement approach (SADis) delivers a more precise and customizable solution for stylized image generation. Experiments on images from the WikiArt and StyleDrop datasets demonstrate that, both qualitatively and quantitatively, SADis surpasses state-of-the-art stylization methods in the DisIG task.
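To make the idea of additivity in an embedding space concrete, the following is a minimal sketch, not the SADis extraction procedure. It assumes that a color embedding and a texture embedding are already available as vectors, and it uses random placeholder vectors in place of real CLIP image embeddings. The helper names (`l2_normalize`, `combine_embeddings`), the weights, and the final normalization are all illustrative assumptions that do not come from the paper.

```python
import numpy as np


def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Scale a vector to unit L2 norm."""
    return v / np.linalg.norm(v)


def combine_embeddings(
    color_emb: np.ndarray,
    texture_emb: np.ndarray,
    color_weight: float = 1.0,
    texture_weight: float = 1.0,
) -> np.ndarray:
    """Hypothetical helper: weighted sum of two embeddings, re-normalized.

    This only illustrates additive composition in a shared embedding
    space; the weighting scheme is an assumption, not the paper's method.
    """
    combined = color_weight * color_emb + texture_weight * texture_emb
    return l2_normalize(combined)


# Placeholder vectors standing in for CLIP image embeddings (assumption).
rng = np.random.default_rng(0)
dim = 8
color_emb = l2_normalize(rng.standard_normal(dim))
texture_emb = l2_normalize(rng.standard_normal(dim))

# A single "style" vector that carries contributions from both references.
style_emb = combine_embeddings(color_emb, texture_emb)

# Cosine similarity to each source (all vectors are unit length).
sim_to_color = float(style_emb @ color_emb)
sim_to_texture = float(style_emb @ texture_emb)
```

With equal weights and unit-length inputs, the combined vector is equally similar to both sources. Changing `color_weight` or `texture_weight` shifts it toward one reference, which is the kind of independent control the DisIG problem asks for.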

Method

Framework of SADis

Results

Stylized images generated by SADis. (Top) As shown in the first two rows, SADis enables disentangled control over color and texture attributes in text-to-image diffusion models by using a separate image prompt for each attribute. This gives creators finer color control, including the use of color palettes, as in the last two columns. (Bottom) SADis also supports real-image stylization by incorporating a content image as an additional condition via ControlNet. It further extends to color-only stylized generation and material transfer.

More Results of Color-texture Disentangled Stylization

BibTeX

@misc{qin2025freelunchcolortexturedisentanglementstylized,
  title={Free-Lunch Color-Texture Disentanglement for Stylized Image Generation}, 
  author={Jiang Qin and Senmao Li and Alexandra Gomez-Villa and Shiqi Yang and Yaxing Wang and Kai Wang and Joost van de Weijer},
  year={2025},
  eprint={2503.14275},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2503.14275}, 
}