Press "Enter" to skip to content

Researchers find way to poison images for AI training

The rise of AI generative art tools like DALL-E, Midjourney, and Stable Diffusion has sparked intense debate and controversy. These systems can create photorealistic images and art simply from text prompts by training on vast datasets scraped from the internet. However, this has raised major concerns about copyright infringement, consent, and misuse of artists’ work.

In response, researchers have developed a radical new technology called Nightshade that allows creatives to “poison” their digital art. The goal is to sabotage AI systems that attempt to ingest their content without permission.

Tools like DALL-E 2 and Stable Diffusion are built on neural networks trained on massive datasets of images paired with captions or text descriptions. This allows them to learn the relationship between text concepts and visual features.

For example, if the model sees millions of images labeled “dog” showing fur, four legs, tails, etc., it learns to associate those visual patterns with the word “dog.” It can then generate brand new photorealistic dog images from scratch when given a text prompt like “a cute puppy sitting in the grass.”
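To picture that pairing step, here is a minimal toy sketch in PyTorch. It is hypothetical code, not the actual DALL-E or Stable Diffusion training pipeline: it simply embeds stand-in image features and caption features into a shared space and trains them so that matching pairs score highest, which is the basic mechanism that lets a prompt containing "dog" pull up dog-like visual patterns.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyImageTextModel(nn.Module):
    """Projects image features and caption features into a shared embedding space."""
    def __init__(self, img_dim=2048, txt_dim=512, embed_dim=256):
        super().__init__()
        self.image_proj = nn.Linear(img_dim, embed_dim)   # maps image features
        self.text_proj = nn.Linear(txt_dim, embed_dim)    # maps caption features

    def forward(self, image_feats, text_feats):
        img = F.normalize(self.image_proj(image_feats), dim=-1)
        txt = F.normalize(self.text_proj(text_feats), dim=-1)
        return img @ txt.T   # similarity of every image to every caption

def contrastive_loss(similarity):
    # Each image should be most similar to its own caption (the diagonal).
    targets = torch.arange(similarity.size(0))
    return (F.cross_entropy(similarity, targets) +
            F.cross_entropy(similarity.T, targets)) / 2

# One toy training step on random stand-in features for a batch of
# image/caption pairs (e.g. dog photos captioned "a dog").
model = ToyImageTextModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
image_feats = torch.randn(8, 2048)   # placeholder for extracted image features
text_feats = torch.randn(8, 512)     # placeholder for encoded captions
loss = contrastive_loss(model(image_feats, text_feats))
loss.backward()
optimizer.step()
```

Real generators add an image-synthesis network on top of this kind of text conditioning, but the learned association between captions and visual features is exactly what poisoned training images aim to corrupt.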

The concerns around scraping artists' content

The models become more capable as they train on more data, which has led the tech giants behind them to scrape millions of images off the internet. Many creators are unhappy about their work being used for AI training without consent or compensation.

This poses a dilemma for artists — share their work publicly and risk AI training misuse, or go private and lose exposure? Platforms like Instagram, DeviantArt, and ArtStation have become troves of training data for AI systems.

How Nightshade injects poison into AI models

According to a recent research paper, Nightshade offers a clever solution by attacking and corrupting the AI models themselves. It adds subtle changes to the pixels of digital art that are invisible to humans, but these tweaks scramble the link between an image's visual features and the text concepts the AI relies on during training.

For example, Nightshade could modify a picture of a dog so the AI model mistakes it for a bicycle or a hat instead. If enough "poisoned" images spread through an AI's dataset, the model starts to hallucinate bizarre connections between text and images.
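The pixel-level trick can be thought of as an optimization problem. The sketch below is a simplified, hypothetical illustration of the general idea rather than Nightshade's published algorithm: it nudges an image within a tiny pixel budget (epsilon) so that a feature extractor "sees" a target concept such as "bicycle", while a human still sees the original dog.

```python
import torch
import torch.nn.functional as F

def poison_image(image, target_features, feature_extractor,
                 epsilon=4 / 255, steps=200, lr=0.01):
    """Find a bounded pixel perturbation that pulls the image's features
    toward a target concept (e.g. 'bicycle') while staying invisible."""
    delta = torch.zeros_like(image, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        perturbed = (image + delta).clamp(0, 1)
        feats = feature_extractor(perturbed)
        loss = F.mse_loss(feats, target_features)  # move toward the target concept
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        delta.data.clamp_(-epsilon, epsilon)       # keep the change imperceptible
    return (image + delta).detach().clamp(0, 1)

# Stand-in feature extractor and images; a real attack would target the
# image encoder used in the generative model's training pipeline.
extractor = torch.nn.Sequential(torch.nn.Flatten(),
                                torch.nn.Linear(3 * 64 * 64, 128))
dog_image = torch.rand(1, 3, 64, 64)
bicycle_features = extractor(torch.rand(1, 3, 64, 64)).detach()
poisoned_dog = poison_image(dog_image, bicycle_features, extractor)
```

Bounding the perturbation is what keeps the poisoned copy visually indistinguishable from the original; only the model's internal representation of it shifts.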

Testing shows Nightshade can cause AI models like Stable Diffusion to generate totally surreal and nonsensical art. After 50 poisoned samples, for instance, dog images become creatures with too many limbs and distorted cartoon faces. After ingesting 300 poisoned dog photos, Stable Diffusion even outputs cats when prompted to create a dog.

Nightshade's attack exploits the black-box nature of neural networks. The causes of the corruption are difficult to trace within the vast datasets, so removing the poisoned data is like finding a needle in a haystack.

The attack also spreads between related concepts. So poisoning “fantasy art” images confuses AI on related terms like “dragons” or “castles” too. This makes manually cleansing Nightshade’s impact nearly impossible at scale.

Giving artists a crucial way to fight back

In light of the legal gray areas around AI content generation, Nightshade represents an important tactical option for creatives. It allows them to directly sabotage systems profiting from their work in an automated way. The researchers plan to integrate it into Glaze, their existing app that cloaks artwork so AI models cannot learn to mimic an artist's style.

With Nightshade soon to be open-sourced, we may see multiple versions that can poison AI models en masse. This could force generative platforms to overhaul their data-harvesting approaches and properly credit artists. But AI developers are also racing to find ways to detect and remove such attacks. For now, Nightshade offers creators a vital tool for reclaiming some control in the AI art arms race, though perhaps only until automated systems that can reliably detect poisoned images are developed.

Featured Image Credit: Image by Willo M.; Pexels; Thank you!

Source: ReadWriteWeb