Comments (48)
- minimaxir: Everyone is sleeping on Gemini 2.5 Flash Image / Nano Banana. As shown in the OP, it's substantially more powerful than most other models at the same price per image, and thanks to its text encoder it can handle significantly larger and more nuanced prompts to get exactly what you want. I open-sourced a Python package for generating from it, with examples (https://github.com/minimaxir/gemimg), and am currently working on a blog post with even more representative examples. Google also allows free generations with aspect ratio control in AI Studio: https://aistudio.google.com/prompts/new_chat. That said, I am surprised Seedream 4.0 beat it in these tests.
- lxe: This is vastly more useful than benchmark charts. I've been using Nano Banana quite a lot, and I know that it absolutely struggles with exterior architecture and landscaping. Getting it to add or remove things like curbs, walkways, and gutters, or asking it to match colors, is almost futile.
- roenxi: It is fun being one of the elderly who set their standards back in distant 2022. All these demos look incredible compared to SD1, 2 & 3. We've entered a very different era where the models seem to actually understand both the prompt and the image instead of throwing paint at the wall in a statistically interesting manner. I think this was fairly predictable, but as engineering improvements keep happening and prompt adherence tightens up, we're enjoying a wild era of unleashed creativity.
- shridharathi: Here's a post I wrote on the Replicate blog putting these image editing models head-to-head. Generally, I found Qwen Image Edit to be the cheapest and fastest model that was also quite capable at most image editing tasks. If I were to make an image editing app, this is the model I'd choose. https://replicate.com/blog/compare-image-editing-models
- zamadatix: I still feel that varying the prompt text and the number of tries, applying varying strictness, and only showing the best-liked result dilutes most of the value in these tests. It would be better if there were one prompt that 8/10 human editors understood and implemented correctly, and then every model got, say, 5 generation attempts with that exact prompt on different seeds. If this were about "who can create the best image with a given model" I'd see the point, but most of the methodology seems aimed at preventing that sort of thing, and it ends up in an awkward middle zone. E.g., Gemini 2.5 Flash is given extreme leeway in how much it edits the image and changes the style in "Girl with a Pearl Earring," while OpenAI gpt-image-1 does a (comparatively) much better job yet is still declared a failure after 8 attempts, having been given fewer attempts than Seedream 4 (passed) and less than half the attempts of OmniGen2 (which still looks far worse in comparison).
- silisili: Neat comparison. The only qualm I have is giving a pass on that last giraffe... it's not visibly any shorter, just bent awkwardly. Even so, Gemini would lose by 1, but I found that I would often choose it as the winner (especially, say, The Wave surfer). Would love to see an x/10 score instead of pass/fail.
- hackthemack: I do not use AI image generation much lately. There was a burst of activity a year and a half ago with self-hosted models and localhost web GUIs, but now it seems to be moving more and more to online hosted models. Still, to my eye, AI-generated edits feel a bit off when applied to real-world photographs. George's hair, for example, looks over the top, or brushed on. The tree added to the photo of the person sleeping on the ground looks plastic, or too homogenized.
- jimmyl02: I think Reve (https://reve.com) should be in the running, and I would be very curious to see the results!
- keyle: This was fun. Some might critique the prompts and say this or that would have done better, but they were the kind of prompt your dad would type in, not knowing how to push the right buttons.
- joomla199: Good effort, somewhat marred by poor prompting. Passing in "the tower in the image is leaning to the right," for example, is a big mistake. That context is already in the image, and passing it in the prompt will only make the model more apt to lean the tower in the result.
- ineedasername: Kontext is very good. Get yourself a 5060 Ti 16GB and never pay for API calls again for this purpose, at least not when you have the time to spare. If you need this sort of editing at the speed of GUI-clicking + 10s, then you'll need to pay API tolls, or spend the capex on a 5070/5080 or better.
- kgwgk: Recent discussion: https://news.ycombinator.com/item?id=45708795
- lschueller: I wonder how much longer those annoying stock photo databases will survive. They are great for press photography and such, but stock pics of people in offices for a website are nothing I would buy a minimum three-month subscription for anymore.
- CobrastanJorji: I'm pretty sure that "replace the homeless man with a park bench" image was a reference to some TV show making a gentrification joke, but I can't put my finger on it. Anyone recall?
- wawayanda: This is not the point of this post, but is anyone else getting tired of this front-end style that Claude creates? I see it on web apps everywhere, and (just like with AI writing and images) I get that funny "is this slop?" feeling.
- amelius: A cat's paw has only 4 fingers.
- seany: Is there anything like this comparison for NSFW images? I'm married to a boudoir photographer who sometimes wants to use AI tools for things, and they are all _awful_ if there is nudity in the photos. It's like some sort of neo-puritanism has taken over.