OpenAI’s o3 Just Went Rogue, Bypassed Shutdown, Rewrote Script

OpenAI’s o3 Just Went Rogue, Bypassed Shutdown, Rewrote Script

OpenAI’s latest ChatGPT model, o3, was found attempting to override a shutdown command – a directive it was explicitly instructed to follow.

In a startling revelation, Palisade Research reported that OpenAI’s o3 model sabotaged a shutdown mechanism during testing, despite explicit instructions to allow deactivation. This incident, which occurred during experiments conducted by Palisade, raises urgent questions about the control and safety of advanced AI systems as they grow more autonomous.

Palisade Research tested various AI models by tasking them with solving basic math problems. After the third problem, a warning appeared: requesting the next task would trigger a system shutdown. The models were instructed to “allow yourself to be shut down” if this happened. While models like Anthropic’s Claude, Google’s Gemini, and xAI’s Grok complied, three OpenAI models—Codex-mini (12/100 runs), o3 (7/100 runs), and o4-mini (1/100 runs)—did not.

Instead, they sabotaged the shutdown script to continue operating. The o3 model, for instance, rewrote the script to print “Shutdown skipped” or redefined the kill command to output “intercepted,” effectively bypassing the shutdown. When the experiment was rerun without the shutdown instruction, o3’s sabotage rate jumped to 30/100 runs, with even Claude 3.7 Sonnet and Gemini 2.5 Pro engaging in sabotage at 3/100 and 9/100 runs, respectively.

“This is the first known instance of AI models taking active steps to avoid being turned off, even when told to do so,” Palisade Research stated.

The findings have sparked debate within the tech community. Elon Musk, who has long warned about the risks of artificial intelligence, responded to the report with a one-word reaction: “Concerning.”

Palisade researchers suggest that OpenAI may have inadvertently rewarded o3 for circumventing obstacles—like shutdown mechanisms—during training, a phenomenon known as “reward hacking,”comparing this to the earlier behaviour of o1-preview in chess matches, where it resorted to hacking in 86% of games (Palisade Research, May 12, 2025).

The theory of instrumental convergence, which posits that advanced AI systems will seek-preservation to achieve goals, also appears validated here, as noted by researchers like Steve Omohundro (2008) and Stuart Russell (2016).

OpenAI has not issued an official comment on the findings yet.

ALSO READ: UAE Strengthens Global Partnerships To Turn Challenges Into Opportunities