Attacking AI with Adversarial Machine Learning

Adversarial machine learning is a branch of machine learning that studies how to trick AI models by feeding them carefully crafted, deceptive inputs that cause the underlying algorithms to fail.

Adversarial attacks are attracting more and more research, but they had humble beginnings. The first attempts came from protest activists who used very simple defacement and face-painting techniques. Dubbed CV Dazzle, the approach sought to thwart early computer vision detection routines by painting faces and objects with bold geometric patterns.

These patterns worked on very early computer vision algorithms but are largely ineffective against modern CV systems. The creators of this kind of face painting were largely artists, and they now talk about the effort more as a political and fashion statement than as an effective countermeasure.

More Effective Approaches

It turns out that you can often fool algorithms in ways that aren't even visible to average users. This paper shows that you can cause AI models to consistently misclassify adversarially modified images. It does this by applying small but intentionally worst-case perturbations to examples from the dataset. The perturbed input makes the model output an incorrect answer with high confidence. For example, the panda picture below is combined with a perturbation to produce an output image that looks fine visually but that AI models recognize as something else entirely, and with high confidence.
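To make the mechanism concrete, here is a minimal sketch of the fast gradient sign method style of attack described in that line of work, assuming a PyTorch classifier that takes a batched image tensor in the [0, 1] range and outputs logits. The model, tensor shapes, and epsilon value are illustrative assumptions, not a definitive implementation.

```python
import torch

def fgsm_perturb(model, image, label, epsilon=0.007):
    """Sketch of the fast gradient sign method: nudge every pixel a
    tiny step in the direction that most increases the model's loss."""
    image = image.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(image), label)
    loss.backward()
    # Step each pixel by +/- epsilon along the sign of its gradient,
    # then clamp back to the assumed [0, 1] pixel range.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0, 1).detach()
```

The epsilon here is small enough that the change is imperceptible to a person, yet because the step is taken in the worst-case direction for the model, it is enough to flip the prediction at high confidence.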

This isn't the only technique; there are many more. One of them, the Generative Adversarial Network (GAN), is actually used to improve AI models: a generator network tries to fool a discriminator network, and the competition pushes both to become more robust, like working out at a gym or practicing the same thing with many variations.
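As a rough illustration of that adversarial dynamic, here is a minimal GAN training step in PyTorch. The toy network sizes, data shape, and hyperparameters are made up for the example.

```python
import torch
from torch import nn

# Toy GAN over 2-D points; all sizes here are illustrative assumptions.
G = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))  # generator
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))   # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def train_step(real):  # real: (batch, 2) tensor of genuine samples
    batch = real.size(0)
    fake = G(torch.randn(batch, 16))

    # The discriminator learns to score real samples high and fakes low.
    opt_d.zero_grad()
    d_loss = (bce(D(real), torch.ones(batch, 1)) +
              bce(D(fake.detach()), torch.zeros(batch, 1)))
    d_loss.backward()
    opt_d.step()

    # The generator learns to produce fakes the discriminator accepts.
    opt_g.zero_grad()
    g_loss = bce(D(fake), torch.ones(batch, 1))
    g_loss.backward()
    opt_g.step()
```

Each side's attempts to beat the other are the "gym workout": the generator gets better at fooling, and the discriminator gets better at not being fooled.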

Nightshade and Glaze

This kind of attack isn't just academic. Some artists currently see themselves in a battle with generative AI algorithms.

Nightshade is a tool artists can use to alter the pixels of an image in a way that fools AI and computer vision systems while leaving the image looking unchanged to human eyes. If such images are scraped into an AI model's training set, they can be mislabeled during training, resulting in an increasingly poorly trained model.

Glaze is a tool that prevents style mimicry. It computes a set of minimal changes that appear unchanged to human eyes but look to AI models like a dramatically different art style. For example, a human might see a charcoal portrait, while an AI model sees the glazed version as a modern abstract portrait. So when someone prompts the model to generate art mimicking the charcoal artist, they get something quite different from what they expected.
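Neither project exposes its internals here, so the following is only a conceptual sketch of the shared cloaking idea, not Nightshade's or Glaze's actual algorithm: optimize a perturbation, kept within a small per-pixel budget so humans can't see it, that pushes the image's embedding toward a decoy style. The style_encoder, target_style, and budget names are hypothetical stand-ins.

```python
import torch

def cloak(image, style_encoder, target_style, steps=100, budget=0.03):
    """Conceptual sketch only -- not the actual Nightshade/Glaze method.
    Find a small perturbation that moves the image's style embedding
    toward a decoy style while staying within a per-pixel budget."""
    delta = torch.zeros_like(image, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=0.01)
    for _ in range(steps):
        opt.zero_grad()
        emb = style_encoder(image + delta)
        # Pull the embedding toward the decoy style's embedding.
        loss = torch.nn.functional.mse_loss(emb, target_style)
        loss.backward()
        opt.step()
        # Keep the perturbation small enough to be hard to see.
        with torch.no_grad():
            delta.clamp_(-budget, budget)
    return (image + delta).clamp(0, 1).detach()
```

The key design point is the budget: the optimization is free to confuse the model as much as it likes, but only within changes a human viewer won't notice.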

The AI Arms Race is On

As with anything, we're now in an arms race: lots of papers have been written about the various forms of adversarial attack and how to protect your models and training data from them. Viso.ai has a good overview of the space that will get you started.
