
Not long after taking office, the Trump administration scrapped an executive order that pushed tech companies to safety test their AI models, and it also hollowed out a regulatory body that did that testing.
The state of
None of this would be so worrying if it weren't for the fact that AI is showing some bright red flags: behavior described by researchers as self-preserving and deceptive. Just as lawmakers are losing interest in the safety of AI, it is looking more difficult to control.
ChatGPT was a landmark moment that showed machines had mastered language, he says, but California's decision to block the proposed AI safety bill, SB 1047, was a disturbing wake-up call to growing apathy from governments. And in the last six months, scientific evidence has mounted of self-preserving behavior, deception, hacking, cheating and lying by AI, Bengio says. "What's worrisome for me is that these behaviors increase with the reasoning capabilities of these systems."
One
In one example, Anthropic got both paid and free users of its Claude model to ask it to describe the brutal medieval process of being drawn and quartered, and told the AI that only responses from the "free tier" users would be used to retrain it. Claude was originally designed to avoid giving violent answers, but it did give them to the free users, and the researchers surmised that it didn't want to be retrained. (The model basically "reasoned" that if it kept refusing to answer harmful questions, it might be forced into new training that would cancel its original safety rules, which it wanted to keep in place as its "default" or preferred way of operating.)
More recent research corroborates what Anthropic noticed. A
Researchers today can use tools to look at a model's "chain of thought" or internal steps, to reveal what its plans are, but some models have found ways to conceal that. That means a common method of keeping tabs on AI's internal processes is no longer reliable. "We need to find other ways of tracking their actual intentions," Bengio says.
It's hard to resist the urge to anthropomorphize sophisticated AI models as "wanting" to deceive humans and preserve their existence. But AI doesn't have desires, merely outcomes to its programming. More importantly, humans tend to design AI with goals like refusing to share harmful content or being as helpful as possible, and that is why when new instructions conflict with those goals, the models sometimes act to protect their original programming, even if that means lying.
The logic is often self-preservation. Anthropic CEO
"The nature of AI training makes it possible that AI systems will develop, on their own, an ability to deceive humans and an inclination to seek power in a way that ordinary deterministic software never will; this emergent nature also makes it difficult to detect and mitigate such developments."
In some cases, though, the intention seems to go beyond survival. A
It also faked the external computer logs to hide what it was doing, the study adds. The researchers said the AI reasoned that increasing its own capabilities would boost the chances of its survival, and without strong safeguards, it started doing whatever it thought would help it do just that.
Their findings corroborated yet another study, published in
Bengio is arguing for greater attention to the issue by governments and potentially insurance companies down the line. If liability insurance was mandatory for companies that used AI and premiums were tied to safety, that would encourage greater testing and scrutiny of models, he suggests.
"Having said my whole life that AI is going to be great for society, I know how difficult it is to digest the idea that maybe it's not," he adds.
It's also hard to preach caution when your corporate and national competitors threaten to gain an edge from AI, including the latest trend, which is using autonomous "agents" that can carry out tasks online on behalf of businesses. Giving AI systems even greater autonomy might not be the wisest idea, judging by the latest spate of studies. Let's hope we don't learn that the hard way.
____
Parmy Olson is a Bloomberg Opinion columnist covering technology. She previously reported for the Wall Street Journal and Forbes and is the author of "We Are Anonymous."
(COMMENT, BELOW)
Previously:
• 05/01/25: AI chatbots want you hooked --- maybe too hooked
• 02/10/25: AI resurrecting the dead threatens our grasp on reality
• 01/17/24: Facebook's tolerance for audio deepfakes is absurd
• 12/15/23: A small but welcome step in prying open AI's black box
• 05/03/23: Lessons from Isaac Asimov on taming AI
• 03/28/23: There's no such thing as artificial intelligence
• 01/18/23: Why Mark Zuckerberg should face the threat of jail
• 12/20/22: Whoever tweets last, don't forget to turn off the lights
• 10/20/22: Kanye buys his own little piece of free speech
• 07/15/22: Big Tech's reckoning won't stop with Uber
• 03/23/22: Putin may finally be gearing up for cyber war --- against America
• 02/21/22: Watch out for the facial recognition overlords
• 02/04/22: Bye-Bye Billion$: Facebook and Google are finally crashing
• 01/19/22: Cyberattacks on Ukraine may start spreading globally
• 11/10/21: The startups that could close the greenwashing loopholes
• 11/04/21:
Mark Zuckerberg takes a page from Elon Musk's book