Amazon experienced four high-severity incidents that caused outages or degraded performance of critical systems. At least one of them was traced to an engineer following bad advice from an AI agent.
An internal memo originally blamed "GenAI-assisted changes"; Amazon deleted the reference after the memo leaked. A separate investigation found engineers being pressured to hit 80% AI adoption targets while the tools they're required to use frequently hallucinate.
When adoption outpaces governance, the failures are predictable.
Amazon experienced four serious incidents in a single week that caused outages or degraded performance of critical systems, a company executive said. One was a six-hour outage of its checkout system that prevented customers from completing purchases via the Amazon app or website. The incidents were serious enough that SVP of eCommerce Services Dave Treadwell called a mandatory "deep dive" meeting with retail tech teams. Amazon rarely takes that step outside of major product launches.
At least one of the incidents was traced to a new kind of failure: A human engineer consulted an AI agent for guidance on how to resolve a system issue. The agent referenced an outdated internal wiki and provided inaccurate troubleshooting advice. The engineer followed its guidance, which triggered a chain of cascading failures across interconnected systems.
An internal briefing note originally cited "GenAI-assisted changes" and "novel GenAI usage for which best practices and safeguards are not yet fully established" as contributing factors to several of the incidents. Amazon later deleted the GenAI reference from the memo after it leaked to the press. The company's official position: AI-written code was not to blame. The issue was a human acting on bad AI-generated guidance.
A separate Guardian report found the problem runs deeper. Amazon engineers are being heavily pressured by management to use internal AI coding tools. Engineers report the tools frequently hallucinate and generate unreliable code, forcing them to spend more time fixing AI mistakes than if they had written the code themselves. Amazon is actively tracking AI adoption through internal dashboards, with some managers setting goals of 80% team adoption. Promotion documents now explicitly ask employees how they have leveraged AI.
This follows an incident in December in which an AI coding tool autonomously deleted and recreated an AWS coding environment. That outage lasted 13 hours.
On Episode 203 of The Artificial Intelligence Show, SmarterX founder and CEO Paul Roetzer broke down what this all means for enterprise AI adoption.
4 - High-severity incidents in a single week
6 - Hours of checkout downtime from one incident
80% - Team AI adoption targets set by some Amazon managers
13 - Hours of downtime from a December AWS incident in which an AI tool deleted and recreated a production environment
~30,000 - Amazon employees laid off
The distinction Amazon is drawing is thinner than it sounds. Amazon says AI-written code was not the problem. The problem was a human acting on bad AI-generated guidance. "Kind of a gray line to draw," says Roetzer. The human made the decision. But the human made it based on information from an AI agent that was confidently wrong. Separating "AI wrote bad code" from "AI gave a human bad advice that broke everything" does not change the outcome.
The push-pull cycle is already repeating. Roetzer sees this playing out across enterprise AI adoption.
"I just feel there's going to be a lot of blowback with stuff like this. I just think most organizations are racing ahead trying to gain the efficiencies of all of this. And then there's this pullback."
—Paul Roetzer, founder and CEO of SmarterX
He points to Klarna as the clearest example. The fintech company publicly declared it no longer needed customer success staff, cut headcount aggressively, then reversed course. "A year later it's, 'Oh, wait a second, we actually really needed them and we're going to hire them back,'" Roetzer says.
The layoffs create the vulnerability. Amazon laid off roughly 30,000 people, presumably on the assumption that AI can take the place of human workers. "You're going to have these massive layoffs where this assumption that AI is far enough along where we just don't need all these people," Roetzer says. "And then the reality hits: 'Oh wait, we don't have the systems in place. We didn't go through the change management necessary to actually allow this to happen.'"
A parody captured the causal chain. Peter Girnus, a senior threat researcher at the Zero Day Initiative who writes satirical AI commentary on X, published a parody post as a fictional VP of AI Transformation at Amazon. The scenario: 16,000 engineers laid off. An AI coding assistant deployed in their place. The AI is given access to production environments because the review phase was cut. The review phase was cut because the people who would have conducted it were part of the 16,000.
"Like any parody, it kind of hurts," Roetzer says. "It has an element of truth to it."
Amazon's four outages in one week are not an indictment of AI. They are a preview of what happens when adoption outpaces governance. When 80% adoption targets get set without literacy programs to back them up. When engineers are pressured to use tools that hallucinate, and promotion documents reward AI usage regardless of whether it produces better outcomes.
The instinct at most companies right now is to cut headcount and let AI fill the gap. The Amazon situation suggests the opposite approach: Take the engineers you are considering laying off and redeploy them to vet AI workflows, build guardrails, and update the internal knowledge bases that AI agents rely on. These are the people with deep institutional knowledge. Their job would be to find exactly the kind of failure that took Amazon's checkout offline for six hours. But the incentive structure works against reallocation.
"Job loss is largely going to be an organization-by-organization choice," Roetzer says. "If you're having great revenue growth and your profits are increasing and revenue per employee is increasing, you can choose to reallocate people."
The problem: Wall Street rewards headcount reduction with higher stock prices.
"Your stock price isn't going to see the bump from reallocation of talent."
—Paul Roetzer, founder and CEO of SmarterX
Amazon now requires junior and mid-level engineers to get senior engineer sign-off on all AI-assisted changes. That is a governance correction, not a rollback. The question is whether other companies learn from this before their own version of the same failure, or whether the push-pull cycle Roetzer describes plays out across the enterprise landscape.
Amazon Convenes 'Deep Dive' Internal Meeting to Address AI-Related Outages → cnbc.com
Amazon Holds Engineering Meeting Following AI-Related Outages → ft.com
Amazon Is Determined to Use AI for Everything → theguardian.com
Amazon Puts Humans Further Back in the Loop After AI Agent Crashes Retail Site → fortune.com
Peter Girnus (@gothburz) Satirical Post → x.com
Heard on The Artificial Intelligence Show, Episode 203
Paul Roetzer and Mike Kaput discuss Amazon's AI-related outages, the pattern of companies cutting too fast then pulling back, and why reallocation beats elimination. Listen →