Comments (31)
- cadamsdotcom: Seems the same as these submissions from 2 years ago:
  - https://news.ycombinator.com/item?id=38013151
  - https://news.ycombinator.com/item?id=37990750
- andy99: A similar tool was posted a few weeks ago, and also apparently two years ago; Glaze is also from UChicago.
  - https://news.ycombinator.com/item?id=46364338
  - https://news.ycombinator.com/item?id=35224219
  We've seen this arms race before and know who wins. It's all snake oil, imo.
- throwfaraway135: I'm very skeptical about such systems, although they note that:
  > You can crop it, resample it, compress it, smooth out pixels, or add noise, and the effects of the poison will remain. You can take screenshots, or even photos of an image displayed on a monitor, and the shade effects remain
  If this becomes prevalent enough, you could create a lightweight classifier to remove "poisonous" images, then use some kind of neural network (probably an autoencoder) to "fix" them. Training such networks won't be too difficult, since you can generate as many positive/negative samples as you want by using this tool (a rough sketch follows this comment).
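A minimal sketch of the classifier idea above, assuming you build paired `clean/` and `poisoned/` folders yourself by running the tool on your own images. The folder layout, architecture, and hyperparameters are illustrative assumptions, not anything the commenter or the Nightshade authors specify.

```python
# Hypothetical detector: distinguish clean images from their poisoned
# counterparts. Assumes data/train/{clean,poisoned}/ was built by running the
# tool on your own images; everything here is illustrative.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# ImageFolder maps each subfolder (clean/, poisoned/) to a class index.
train_set = datasets.ImageFolder("data/train", transform=tfm)
loader = DataLoader(train_set, batch_size=32, shuffle=True)

# Deliberately lightweight: the perturbation is a repeated, image-wide signal,
# so a shallow convolutional model may be enough to pick it up.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 2),  # clean vs. poisoned
)

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    for x, y in loader:
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
```

The same paired data could also train the "fixer" autoencoder the comment mentions, with the poisoned image as input and the clean original as the reconstruction target.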
- torginus: It is kind of unfortunate how people don't actually read the paper but only run with the conclusions, speculating about whether this would or would not work. Here's the paper in question: https://arxiv.org/abs/2310.13828
  My two cents is that in its current implementation the compromised images can be easily detected, and possibly even 'de-poisoned'.
  The attack works by targeting an underrepresented concept (say 1% of images contain dogs, so 'dog' is a good concept to attack). They poison the concept of 'dog' with the concept of 'cat' by blending (in latent space) an archetypal image of 'cat' (always the same one) into every image containing a 'dog'. This works during training, since every poisoned image of a dog contains the same blended-in image of a cat, so this false signal eventually builds up in the model even if the poisoned sample count is low.
  But note: this exploits the lack of data in a domain; it would not prevent the model from generating anime waifus or porn, because the training set for those is huge.
  How to detect poisoned images (a rough sketch follows this comment)?
  1. Take a non-poisoned labeler (these exist, because clean pre-SD datasets and pre-poison diffusion models exist).
  2. Ask your new model and the non-poisoned labeler to check your images. You find that the concept of 'dog' has been poisoned.
  3. Convert all your 'dog' images to latent space and take the average. Most likely the non-poison details will average out, while the poison will accumulate.
  4. You now have a 'signature' of the poison. Check each image's latent against the signature; if the correlation is high, the image is poisoned.
  The poison is easily detectable for the same reason it works: it embeds a very strong signal that gets repeated across the training set.
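A rough sketch of steps 3-4 described above: average the latents of the flagged class to build a poison "signature", then score each image by cosine similarity against it. The Stable Diffusion VAE from diffusers is used here as the latent encoder; the model name, file names, preprocessing, and threshold are assumptions for illustration, not the commenter's or the paper's method.

```python
# Hypothetical latent-signature check for a poisoned concept ('dog').
import torch
from PIL import Image
from torchvision import transforms
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

tfm = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.ToTensor(),
    transforms.Normalize([0.5] * 3, [0.5] * 3),  # map pixels to [-1, 1]
])

@torch.no_grad()
def encode(path: str) -> torch.Tensor:
    """Encode an image to a flattened VAE latent."""
    x = tfm(Image.open(path).convert("RGB")).unsqueeze(0)
    return vae.encode(x).latent_dist.mean.flatten()

dog_images = ["dog_0001.png", "dog_0002.png"]  # the flagged 'dog' subset

# Step 3: per-image details should average out; the shared poison accumulates.
latents = torch.stack([encode(p) for p in dog_images])
signature = latents.mean(dim=0)

# Step 4: flag images whose latents correlate strongly with the signature.
# The 0.9 cutoff is arbitrary; in practice you would calibrate it on images
# known to be clean.
scores = torch.nn.functional.cosine_similarity(latents, signature.unsqueeze(0))
for path, score in zip(dog_images, scores):
    print(f"{path}: {score:.3f}", "POISONED?" if score > 0.9 else "")
```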
- pigpop: I think we are well beyond this mattering. To my knowledge, the era of scraping online sources for training data is over; the focus has been on reinforcement learning and on acquiring access to offline data for at least a year or two. Synthetic data is generated, ranked, and curated to produce the new training sets for improving models. There isn't really any point to collecting human-made images anymore, because the rate of production of anything novel is so low. The future of data collection looks like Midjourney's platform, which integrates tools for giving feedback on generated images as well as tools for editing and composing them so they can be improved manually. This closes the loop, so the platform for generating images is now part of the model training pipeline.
- mensetmanusman: It would be funny if this type of research ends up adding major insight into what it is about human vision systems and mental encodings that makes us different from pixel arrays with various transformations.
- nodja: I've run the first of the sample images through three captioning models: an old ViT-based booru-style tagger, a more recent one, and Qwen 3 Omni. All models successfully identified the visual features of the image, with no false positives at significant thresholds (>0.3 confidence).
  I don't know what Nightshade is supposed to do, but the fact that it doesn't affect the synthetic labeling of data at all leads me to believe image-model trainers will give close to zero consideration to what it does when training new models.
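This kind of check is easy to reproduce with an off-the-shelf vision-language model; the sketch below uses CLIP zero-shot classification from transformers as a stand-in for the taggers mentioned above. The model name, candidate labels, file name, and the 0.3 threshold are assumptions for illustration.

```python
# Hypothetical labeling check: does a shaded image still get recognized?
from transformers import pipeline

classifier = pipeline("zero-shot-image-classification",
                      model="openai/clip-vit-base-patch32")

candidate_labels = ["dog", "cat", "cow", "landscape", "abstract noise"]
results = classifier("shaded_sample.png", candidate_labels=candidate_labels)

# If the true concepts still score above the threshold, the shading did not
# fool this particular labeler.
for r in results:
    if r["score"] > 0.3:
        print(f"{r['label']}: {r['score']:.2f}")
```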
- Tiberium: I think the title should clarify the year (2024), because those tools are not useful in the way artists want them to be.