With all the recent hubbub about ChatGPT and similar publicly available Artificial Intelligence apps being used to make everyone’s life “better” (South Park even did an episode in which the kids use it to craft poetic responses to their girlfriends’ texts, compose scholarly papers for school assignments, and so on), there is a cautionary tale in the excerpts below, taken from a ZDNET article entitled “These Experts Are Racing To Protect AI From Hackers. Time is Running Out.”
Fooling AI even if you can’t fool humans
Concerns about attacks on AI are far from new, but there is now a growing understanding of how deep-learning algorithms can be tricked by making slight, practically imperceptible changes to their inputs, leading to a misclassification of what the algorithm is examining.
“Think of the AI system as a box that takes an input and then outputs some decision or some information,” says Desmond Higham, professor of numerical analysis at University of Edinburgh’s School of Mathematics. “The aim of the attack is to make a small change to the input, which causes a big change to the output.”
For example, you might take an image that a human would recognize as a cat, make changes to the pixels that make up the image, and confuse the AI image-classification tool into thinking it’s a dog.
This misclassification isn’t an accidental error; it happens because a human has deliberately tampered with the image to fool the algorithm, a tactic known as an adversarial attack.
“This isn’t just a random perturbation; this imperceptible change wasn’t chosen at random. It’s been chosen incredibly carefully, in a way that causes the worst possible outcome,” warns Higham.
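To make that idea concrete, here is a minimal sketch of how such a carefully chosen perturbation might be computed using the fast gradient sign method, one well-known technique for this kind of attack. It assumes a pretrained PyTorch image classifier; the model choice, the example image, and the epsilon value are illustrative placeholders, not details from the article.

```python
# Minimal fast-gradient-sign-method (FGSM) sketch, assuming PyTorch/torchvision.
import torch
import torch.nn.functional as F
from torchvision import models

# Any differentiable classifier would do; a standard pretrained ResNet is used here.
model = models.resnet18(weights="IMAGENET1K_V1").eval()

def fgsm_attack(image, true_label, epsilon=0.01):
    """Return an adversarial copy of `image` (a 1x3xHxW float tensor in [0, 1])."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), true_label)
    loss.backward()
    # Nudge every pixel a tiny step in exactly the direction that
    # increases the model's loss; the change is small but not random.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0, 1).detach()

# Hypothetical usage: `cat_image` is a preprocessed photo the model labels
# correctly; the returned tensor may come back classified as something else.
# adv_image = fgsm_attack(cat_image, torch.tensor([281]))  # 281 = ImageNet "tabby cat"
```

The key step is `image.grad.sign()`: each pixel is pushed in precisely the direction that hurts the classifier most, which is what Higham means by a change “chosen incredibly carefully.”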
“There are lots of pixels there that you can play around with. So, if you think about it that way, it’s not so surprising that these systems can’t be stable in every possible direction.”

If there’s still a person involved, then errors will be noticed, but as automation begins to take more control, there might not be anyone double-checking the work of the AI to make sure a panda really is a panda.
“You can do an adversarial attack that the human would immediately recognize as being a change. But if there is no human in the loop, then all that matters is whether the automated system is fooled,” explains Higham.
Worse still, these aren’t just theoretical examples: a few years back, some researchers showed how they could create 3D adversarial objects that could fool a neural network into thinking a turtle was a rifle.
Perhaps a bigger threat is data poisoning, where the training data used to create the AI is manipulated by attackers to change the decisions that the AI ultimately makes.
“Data poisoning can be one of the most powerful threats and something that we should care a lot more about. At present, it doesn’t require a sophisticated adversary to pull it off. If you can poison these models, and then they’re used widely downstream, you multiply the impact — and poisoning is very hard to detect and deal with once it’s in the model,” says Slater.
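As a rough illustration of the mechanism (not a reconstruction of any real incident), the sketch below poisons a training set by flipping a fraction of its labels before an ordinary scikit-learn classifier is fit; the synthetic dataset and the 30% flip rate are arbitrary choices made for the example.

```python
# Label-flipping data-poisoning sketch, assuming NumPy and scikit-learn.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def flip_labels(labels, fraction, rng):
    """Flip the labels of a random `fraction` of the training points."""
    poisoned = labels.copy()
    n_flip = int(fraction * len(labels))
    idx = rng.choice(len(labels), size=n_flip, replace=False)
    poisoned[idx] = 1 - poisoned[idx]  # binary labels: 0 <-> 1
    return poisoned

rng = np.random.default_rng(0)
clean_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
poisoned_model = LogisticRegression(max_iter=1000).fit(
    X_train, flip_labels(y_train, fraction=0.3, rng=rng)
)

print("clean accuracy:   ", clean_model.score(X_test, y_test))
print("poisoned accuracy:", poisoned_model.score(X_test, y_test))
```

Even this crude attack can measurably drag down test accuracy, and a real attacker would aim for something far more targeted and far harder to spot, which is what makes downstream poisoning so difficult to detect once it is baked into a widely used model.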
One infamous example of this kind of corruption is Microsoft’s artificial intelligence bot, Tay. Microsoft sent Tay out onto Twitter to interact with and learn from humans, so it could pick up how to use natural language and speak the way people do. But in just a matter of hours, people had corrupted Tay into saying offensive things, and Microsoft took it down.

An example of an AI system confused by an adversarial T-shirt, identifying a person as a bird. Image: Intel Corporation