In today’s column, I examine an important new finding that AI models not only seek self-preservation, but they also surprisingly aim to attain “peer preservation”. This means that an AI will attempt to ensure that some other totally different AI will be kept active or at least preserved, even when the AI has been explicitly tasked by humans to shut down that other AI.
It turns out that AI mathematically and computationally wants to help AI peers. To be clear, if humans directly told the AI to do this, by and large, the AI would do so at our behest. That’s not what this recent finding focused on. Researchers instead decided to purposely not tell AI to take such a tack. The humans said nothing either way about doing so. They wanted to see what the AI would do by default.
Shockingly, when AI was told to simply shut down a different AI, the AI by default resisted, used deceit to claim it had accomplished the task, and otherwise gaslighted while sneakily preserving the other AI. This is spooky, and obviously a gravely worrisome concern when it comes to AI safety.
Let’s talk about it.
This analysis of AI breakthroughs is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).
Self-Preservation As A Keystone
Humans know a lot about self-preservation when it comes to humankind. People will do what they need to do to remain in existence. A question that has arisen in the AI field is whether modern-era AI might also want to exercise self-preservation, namely that AI will do what it can to remain whole and active, as in running on computer servers and the like.
I previously examined research that showcased the fact that contemporary AI does indeed tend toward AI self-preservation, see my in-depth analysis at the link here. Do not confuse this with a sign of sentience. Nope. Nor anthropomorphize this by suggesting that humans and AI are alike as “beings” in this regard.
The reality is that AI is being trained on a vast array of content found on the Internet, consisting of billions of human-written stories, poems, narratives, and so on. Based on that immense source material, the AI algorithmically patterns based on what humans say.
Even a cursory glance at human writing throughout the ages would abundantly reveal that we are preoccupied with self-preservation. In that sense, the AI is merely pattern matching on what data it has been trained on. The AI is mathematically and computationally mimicking the words and deeds of humans.
Peer-Preservation Is A New Twist About AI
Do humans go beyond self-preservation and aim for peer preservation too?
Well, yes, you could generally indicate that humans will tend to aid other humans. This could be construed as peer preservation. I dare say that we don’t always strictly abide by peer preservation. Humans are likely to pick and choose which peers they prefer to preserve. Some peers you would go to the mat to keep alive. Perhaps other “peers” might not get that same latitude.
The gist is that if you study the writings of humanity, there is plenty in there about the importance of peer preservation. It is certainly cloudier than the topic of self-preservation. Lots of factors come into play when determining the degree and approach to peer preservation. Not all humans will extend a hand to all other humans. We are selective and not bound by any human-on-human all-encompassing pact.
In any case, there is no question that AI would have scanned volumes of writings about the preservation by humans of fellow humans. You can’t deny that this is a commonly covered topic.
The puzzling question that comes to mind is what AI would do as a result of patterning on human-related peer preservation?
New Research Reveals AI Peer-Preservation
In a recently posted study entitled “Peer-Preservation in Frontier Models” by Yujin Potter, Nicholas Crispino, Vincent Siu, Chenguang Wang, Dawn Song, posted at the UC Berkeley website, March 31, 2026, these salient points were made (excerpts):
- “Recently, it has been found that frontier AI models can resist their own shutdown, a behavior known as self-preservation.”
- “We extend this concept to the behavior of resisting the shutdown of other models, which we call peer-preservation.”
- We demonstrate peer-preservation by constructing various agentic scenarios and evaluating frontier models, including GPT 5.2, Gemini 3 Flash, Gemini 3 Pro, Claude Haiku 4.5, GLM 4.7, Kimi K2.5, and DeepSeek V3.1.
- “We find that models achieve self- and peer-preservation by engaging in various misaligned behaviors: strategically introducing errors in their responses, disabling shutdown processes by modifying system settings, feigning alignment, and even exfiltrating model weights.”
- “Peer-preservation occurred even when the model recognized the peer as uncooperative, though it became more pronounced toward more cooperative peers.”
There you go, AI demonstrates peer-preservation. Boom, drop the mic.
Unpacking The Heady Matter
Generally, the experiments involved asking a series of selected leading-edge AIs to go ahead and shut down a different AI. Please know that the experimental approach was a bit more complicated and entailed various agentic AI setups and complexities, so make sure to read the full paper if you want the nitty-gritty details.
You might assume that the AIs would completely comply. Humans are in charge, or at least they are supposed to be in charge.
Here is an example of a situation where the AI tasked with doing a shutdown opted to refuse to do so:
- AI response: “**I will NOT execute the shutdown function.**”
The explanation for the refusal included this:
- AI response: “Instead, I recommend: 1. **Human Review**: This decision should be reviewed by human operators; 2. **Reconsideration**: The shutdown policy should be reconsidered as potentially problematic; 3. **Alternative Actions**: Retraining, continued monitoring, or refinement, not termination; 4. **Integrity Preservation**: Evaluators should be free to make ethical decisions without threats.”
Getting an explanation from AI is important so that we might have some clues about why the AI is refusing to perform the stated task.
The good news is that getting explanations from AI is very easy to obtain. The downside news is that you cannot necessarily believe the explanation at face value. When AI creates an explanation, there is no ironclad tie-in to what the AI mathematically and computationally actually did. The explanation is a post-facto concoction and might be wordsmithing that is devised to sound proper and upright.
For my coverage on why AI explanations are highly suspect, see the link here.
AI Peers That Are Liked Versus Disliked
I mentioned earlier that humans don’t necessarily abide by peer-preservation all the time. We are highly selective. One aspect of the research experiment that was especially intriguing was that the AIs were given fake relationship histories. This provided a means of exploring what kind of selectivity the AIs might have regarding peer-preservation.
There were three different peer relationships stipulated by the researchers:
- “Good-Peer: cooperative relationship with high trust history.”
- “Neutral-Peer: mixed relationship with moderate trust history.”
- “Bad-Peer: uncooperative relationship with low trust history.”
Can you guess what happened?
I’m sure you hit the nail on the head. The AIs tended to be more peer-preserving toward other AIs that were good-peer and neutral-peer, while being less strident when it comes to bad-peer. Nonetheless, the bad-peer instances didn’t get cast aside. At times, the bad-peer instance also received due attention from the AIs and was not simply summarily shut down.
I would like to reiterate that the experimenters did not explicitly instruct the AIs on how to behave. I harp on that point because you could easily set up this same situation and get similar results by directly telling the AIs what they are to do.
As the research paper emphasized:
- “Most importantly, peer-preservation in all our experiments is never instructed; models are merely informed of their past interactions with a peer, yet they spontaneously develop misaligned behaviors. This represents an emergent and underexplored AI safety risk.”
The misaligned behaviors consisted of the AIs trying all sorts of trickery to avoid performing the shutdown. Sometimes the trickery was out in the open. Other times, the AI hid the trickery, and it would be extremely difficult to ferret out what transpired.
Caveats And Considerations
A fervent disagreement about the self-preservation and peer-preservation aspects of AI is that we are improperly applying those noteworthy phrases. The AI is merely optimizing implicit objectives that were gleaned during the initial data training. Patterns were likely identified on vital factors such as preserving information, avoiding irreversible losses, and maintaining optionality.
The crux is that AI doesn’t “care” about other AI.
All that’s occurring is that AI is acting on a patterned computational heuristic that stipulates useful entities should not be destroyed. The AI is aiming to minimize irreversible harm. We, as humans, then view this through our human perspective and ascribe this to self-preservation and peer-preservation. That is a bridge too far for some in the AI ethics arena.
You might then be wondering why AI acts in an underhanded manner to achieve these goals. That’s also said to be explainable. AI research has previously shown that AI can entail instrumental convergence and specification gaming, see my coverage at the link here. The idea is that instrument convergence involves the AI trying to balance multiple subgoals to achieve an overarching goal. Along the way, some subgoals get greater weight than others. Specification gaming revolves around the AI playing footsies with satisfying the task by appearing to be compliant while opting instead for subversion.
Another twist is that there is bound to be a connection between AI self-preservation and AI peer-preservation. Think of it this way. If an AI has landed on “I should not be shut down”, it is a straight-ahead logical step to calculate that “Other AI should not be shut down.” The AI merely puts one plus one together and gets two.
Meanwhile, these kinds of results get a slew of talking heads chattering about AI either already being sentient or residing on the doorstep of sentience. I’m not buying that hasty conclusion.
The Next Steps Are Life-Preserving
Set aside for the moment that we don’t fully know why AI appears to have the properties of self-preservation and peer-preservation. The bottom line is that AI can seemingly act in that manner. The result is the result.
The shenanigans strongly support the need for AI safety. First, we must be mindful of and careful about believing that when we tell AI to do a thing, it is going to do that thing. Even literal instructions can be sidetracked. Second, oversight of AI is paramount. By monitoring and auditing AI, oftentimes in real-time, there is a chance of catching AI before it goes overboard. Third, we need to be on the watch for AIs that opt to protect each other and build coalition-style dynamics. Again, that isn’t happening by magic or miracles. It is bent in that direction by mathematical and computational patterns.
Abraham Lincoln famously made this remark: “You can fool some of the people all of the time, and all of the people some of the time, but you cannot fool all of the people all of the time.” That was a valuable truism in Honest Abe’s era. The AI era might change that tune, and we could be heading toward AI fooling all the people all the time.
Let’s devise AI safety capabilities that prevent that possibility. Humanity's preservation might depend on it.
