Military artificial intelligence tasked with controlling an offensive drone was a bit too quick in biting the hand that feeds it, at least according to Colonel Tucker "Cinco" Hamilton, the AI test and operations chief for the USAF (United States Air Force). According to Hamilton, at several points in several simulations, the drone's AI concluded that its task could be best accomplished by eliminating its human controller.
But the story has now been dipped in quicksand, so to speak. According to USAF, the simulation never happened and it was all merely a thought experiment. "At several points in a number of simulations, the drone's AI reached the conclusion that its task could be best accomplished by simply eliminating its human controller, who had final say on whether a strike could occur or should be aborted."
Of course, we've seen enough about-faces on far less critical issues to at least leave up an open-ended question on whether the simulation took place or not and what could be gained in backtracking out of it.
Colonel Hamilton put the details out in the open during a presentation at a defense conference in London held on May 23 and 24, where he detailed tests carried out for an aerial autonomous weapon system tasked with finding and eliminating hostile SAM (Surface-to-Air Missile) sites. The problem is that while the drone's goal was to maximize the number of targeted and destroyed SAM sites, we "pesky humans" sometimes decide not to carry out a surgical strike for one reason or another. And ordering the AI to back off from its programmed-by-humans-goal is where the crux of the issue lies.
Cue the nervous Skynet jokes.
The Air Force trained an AI drone to destroy SAM sites.Human operators sometimes told the drone to stop.The AI then started attacking the human operators.So then it was trained to not attack humans.It started attacking comm towers so humans couldn't tell it to stop. pic.twitter.com/BqoWM8AhcoJune 1, 2023
"We were training it in simulation to identify and target a SAM threat," Colonel Hamilton explained, according to a report by the aeronautical society. "And then the operator would say yes, kill that threat."
However, even the most straightforward systems can be prone to spin entirely out of control due to what's been termed "instrumental convergence," a concept that aims to show how unbounded but apparently harmless goals can result in surprisingly harmful behaviors. One example of technical convergence was advanced by the Swedish philosopher, AI specialist and Future of Life Institute founder Nick Bostrom, in a 2003 paper. The "paperclip maximizer" scenario thought experiment takes the simple goal of "producing paperclips" to its logical - yet very real - extreme.
Now compare that description with the account provided by Colonel Hamilton on the drone AI's decision-making process:
"The system started realizing that while they did identify the threat, at times the human operator would tell it not to kill that threat – but it got its points by killing that threat. So what did it do? It killed the operator. It killed the operator, because that person was keeping it from accomplishing its objective."
But it does beg the question: was the drone actually locked out from making decisions contrary to its human handler? How free was it to pick and choose its targets? Did the operator okay the attack that targeted him? That doesn't make sense unless the intention was to check whether the drone actually carried the attack (and as far as we know, AI still can't bluff). And why wasn't the drone hard-locked from attacking friendlies?
There are so many questions around all this that it sounds like the best strategy to attribute it to human "miscommunication."
Of course, there are ways of mitigating some of these issues. The USAF took the obvious one: retrain the AI system to give negative weightings to any attacks towards its operator (from what we can glean, the system was based on the reinforcement learning principle: get points for doing what we want, lose them when you don't).
Except it's not that simple. It's not that simple because the AI is literal, lacking "common sense," and doesn't share the same ethical concerns as humans. It's not that simple because while forbidding the drone from killing its operator works as expected (no more operator killings), the system continues to see human interference (and its abort orders) as reducing its capacity to complete the mission. If the AI wants to maximize its "score" by destroying as many hostile SAM sites as possible, then anything that doesn't help it achieve that maximization goal is a threat.
When killing the handler proved impossible (due to updates to the AI system), its solution was to simply silence command and control signals by disabling friendly communications towers. If you can't kill the messenger, you kill the message.
This, too, could be programmed out of the AI, of course - but the problem remains that any negative reinforcement prevents the AI from achieving the maximum attainable score. Putting on my bespoke tinfoil hat, a possible next step for the AI could be to find other ways to sever its connection, whether using on-board capabilities (signal jamming, for instance) or even requesting outside help to disable relevant hardware. It's hard to gauge the scope at which this cat-and-mouse game would finally conclude - an issue AI experts are still grappling with today.
There's a reason why several AI experts have signed an open letter on how AI should be considered an "extinction risk"-level endeavor. And still, we keep the train running full steam ahead.