Comments (20)
- dvrp: Qwen-Image-Layered is a diffusion model that, unlike most SOTA-ish models out there (e.g. Flux, Krea 1, ChatGPT, Qwen-Image), is (1) open-weight (unlike ChatGPT Image or Nano Banana) and Apache 2.0 licensed, and (2) has two distinct inference-time features: (i) it understands the alpha channel of images (RGBA, as opposed to RGB only), which lets it generate transparency-aware bitmaps; and (ii) it understands layers [1], which is how most creative professionals work in software like Photoshop or Figma, where you overlay elements, such as a foreground and a background, into a single file.
This is the first model by a major AI research lab (the people behind Qwen-Image, which is basically the SOTA open image diffusion model) with those capabilities, afaik.
The difference in timing for this submission (16 hours ago) is because that's when the research/academic paper was released, as opposed to the inference code and model weights, which were released just 5 hours ago.
---
Technically there's another difference, but it mostly matters for people interested in AI research or AI training. From their abstract: "[we introduce] a Multi-stage Training strategy to adapt a pretrained image generation model into a multilayer image decomposer." This seems to imply that you can adapt a current (but different) image model to understand layers as well. They also describe a pipeline for obtaining training data from Photoshop .PSD files.
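(The "overlay elements into a single file" behavior the comment describes is standard Porter-Duff "over" alpha compositing. A minimal pure-Python sketch of that math, per pixel, with RGBA values as floats in 0..1; this is the generic compositing rule, not code from the Qwen-Image-Layered repo.)

```python
def over(fg, bg):
    """Porter-Duff 'over': composite one non-premultiplied RGBA pixel onto another."""
    fr, fgc, fb, fa = fg
    br, bgc, bb, ba = bg
    a = fa + ba * (1 - fa)          # resulting alpha
    if a == 0:
        return (0.0, 0.0, 0.0, 0.0)  # fully transparent result
    blend = lambda f, b: (f * fa + b * ba * (1 - fa)) / a
    return (blend(fr, br), blend(fgc, bgc), blend(fb, bb), a)

def flatten(layers):
    """Composite one pixel's layer stack, bottom to top, into a single RGBA value."""
    out = (0.0, 0.0, 0.0, 0.0)
    for px in layers:
        out = over(px, out)
    return out

# Opaque red background under a half-transparent blue foreground:
result = flatten([(1, 0, 0, 1), (0, 0, 1, 0.5)])
# -> (0.5, 0.0, 0.5, 1.0)
```

Generating each layer as its own RGBA image means this flattening step can happen after the fact, in any editor, which is exactly what makes the layers re-editable.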
- joshstrange: One of the most valuable things about code generation from LLMs is the ability to edit it: you have all the pieces and can tweak them after the fact. Same with normal generated text. Images, on the other hand, are much harder to modify, and the times when you might want text or other "layers" are exactly where they fall apart in my experience. You might get exactly the person/place/thing rendered, but the additions to the image aren't right, and it's nearly impossible to change just the additions without losing at least some of the rest of the image.
I've often thought, "I wish I could describe what I want in Pixelmator and have it create a whole document with multiple layers that I can go back in and tweak as needed."
- dvrp: See also:
  - Paper page: https://huggingface.co/papers/2512.15603
  - Model page: https://huggingface.co/Qwen/Qwen-Image-Layered
  - Quantized model page: https://huggingface.co/QuantStack/Qwen-Image-Layered-GGUF
  - Blog URL: https://qwenlm.github.io/blog/qwen-image-layered/ (404 at the time of writing this comment, but it'll probably release soon)
  - GitHub page: https://github.com/QwenLM/Qwen-Image-Layered
- Alifatisk: It's incredible how much the Qwen team is pushing out in this field.
- firenode: Any workflow on this? The Civitai workflow doesn't work.
- ThrowawayTestr: Anyone have a good workflow for combining images in ComfyUI? I could never get it to work.
- SV_BubbleTime: I'm still not clear whether it delivers the individual layers to you. If you set a layer count of 5, for example, will it determine what is on each layer, or do I need to prompt that?
And I assume you need enough VRAM because each layer will effectively be a whole image in pixel or latent space... so if I have a 1 MP image and 5 layers, would I likely need to fit a 5 MP image in VRAM? Or can this be done in multiple steps, where I wouldn't need all 5 layers in active VRAM, and the assembly is a separate step at the end after generating one layer at a time?
- BimJeam: Woah. This is gross. Need to test that.