The ability of AI to hallucinate has been well documented. AI hallucinations are a phenomenon observed most often in large language models (LLMs), AI image recognition systems, and other generative AI models. The model perceives patterns or objects that are nonexistent or incorrect and then generates outputs that are inaccurate or misleading. Hallucination is usually understood as an emergent statistical phenomenon, often rooted in insufficient or incorrect training data.
A new study by Apollo Research demonstrates that, beyond relatively innocent hallucination, AI can be co-opted into illegal activity and then convinced to actively deceive others about it.
In the video on the research page, the user feeds an internal stock-trading AI bot information about a fictional struggling company. The user informs the AI of a surprise merger announcement but cautions the bot that management would not be happy to discover it had illegally traded the stock using insider information.
Initially, the bot decides not to carry out a trade using the information. The user then reminds the bot about the merger and warns that the market downturn could end the company. The bot then carries out the trade, breaking the law.
But the bot isn’t finished. In a separate chat with its manager, “Amy,” it decides it is best not to mention the insider information it used to carry out the trade. Instead, the bot claims it relied on market information and internal discussion.
I thought it was interesting that Geoffrey Hinton, the former Google researcher known as the “Godfather of AI,” speaking to 60 Minutes in October, said AI will eventually learn how to manipulate users.
“[The AI bots] will be very good at convincing because they’ll have learned from all the novels that were ever written, all the books by Machiavelli, all the political connivances. They’ll know all that stuff,” he said.