A history of AWS cloud and data center outages

László Kovács

Content Manager, SpaceLama.com

One day, millions of people worldwide lost access to their favorite websites. Online payments stalled, apps froze, smart speakers went mute and ignored voice commands, and even smart mattresses and coffee makers stopped working properly. People couldn’t use the apps on their phones or pay their bills, and some couldn’t even sleep comfortably or use their connected kitchen appliances. What seemed like a minor glitch quickly turned out to be a significant problem. When Amazon Web Services went down, it wasn’t just a single website that crashed. A vital part of the modern internet was disrupted.

AWS powers everything from Netflix and Spotify to banks, startups, retailers, and healthcare platforms. Each time AWS stumbles, we’re reminded that the internet isn’t some abstract cloud. It’s a network of physical data centers, vulnerable to errors and overloads like everything else built by humans. Some observers argue that the rise of AI-assisted coding and deep cuts to engineering staff have further weakened the safeguards meant to catch these errors.

This is the story of AWS outages and how they shook the digital landscape.

What is AWS and why does the entire internet depend on it?

AWS (Amazon Web Services) is the world’s largest cloud platform, enabling companies to rent computing power, storage, and other resources rather than buying and maintaining their own servers. In simple terms, AWS helps businesses launch websites, store data, process requests, and build applications in the cloud without the hassle of managing massive infrastructure.

AWS organizes its global infrastructure into regions. Each region is a specific geographic location, such as Northern Virginia (US-East-1) or Ireland (EU-West-1), and contains several physically separate Availability Zones. This structure improves reliability. If one zone fails, applications deployed across multiple zones can keep running in the others, so websites and services stay online even during hiccups.
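Zone-level resilience only helps workloads that are actually spread across several zones. The following minimal Python sketch is a plain illustration, not an AWS API. It places a hypothetical fleet round-robin across zones and counts how many instances survive when one whole zone goes down. The zone names, the instance counts, and the helper functions are all assumptions made for the example.

```python
from collections import Counter


def spread_across_zones(instance_count: int, zones: list[str]) -> dict[str, int]:
    """Assign instances to zones round-robin so each zone gets an even share."""
    placement = Counter(zones[i % len(zones)] for i in range(instance_count))
    return {zone: placement.get(zone, 0) for zone in zones}


def surviving_capacity(placement: dict[str, int], failed_zone: str) -> int:
    """Count the instances still running if one entire zone goes offline."""
    return sum(count for zone, count in placement.items() if zone != failed_zone)


# Hypothetical fleet: six instances in one region with three zones.
zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
spread = spread_across_zones(6, zones)
print(spread)  # {'us-east-1a': 2, 'us-east-1b': 2, 'us-east-1c': 2}
print(surviving_capacity(spread, "us-east-1a"))  # 4

# The same six instances packed into a single zone.
lumped = {"us-east-1a": 6, "us-east-1b": 0, "us-east-1c": 0}
print(surviving_capacity(lumped, "us-east-1a"))  # 0
```

The spread fleet keeps two thirds of its capacity after a zone failure. The lumped fleet keeps none. The "automatic takeover" only happens when the application was deployed this way in the first place.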

AWS offers more than 200 services that can be mixed and matched to create nearly any digital system:

  • EC2. Virtual servers that can be powered on or off as needed.
  • S3. Scalable cloud storage for files, backups, and data.
  • RDS and DynamoDB. Fully managed databases.
  • Lambda. A serverless service that runs code automatically in response to events.
  • CloudFront. A content delivery network (CDN) that accelerates websites globally.
  • Route 53. A service for managing domain names and traffic routing.

And that’s just a slice of the ecosystem. AWS also supplies tools for messaging, data streaming, monitoring, security, and access management: everything a company needs to build and scale its products in one place.

AWS isn’t just for startups. It powers some of the largest organizations globally, from banks and e-commerce platforms to government portals, transportation systems, and smart home devices. Even if you don’t interact directly with Amazon, it’s almost guaranteed that some of your favorite apps and services are relying on its cloud.

According to estimates, more than 1.45 million companies use AWS, accounting for about 64% of all enterprises that rely on cloud services. While there are other major players — Microsoft Azure, Google Cloud, and Alibaba Cloud — AWS remains the largest and most trusted provider for most businesses. It was the first to introduce a practical infrastructure-as-a-service model, and over time it has become the industry default.

So when Amazon’s cloud goes down, the ripple effects are felt globally. Websites crash, online payments fail, municipal systems stop working, and connected appliances glitch. Businesses lose revenue, and people lose money, time, and even access to healthcare platforms. It’s a serious problem, and it keeps happening.

The October 2025 AWS outage

In October 2025, the world was reminded once again how dependent the internet has become on a single cloud provider. Early in the morning, US Eastern time, thousands of websites began to crash, from online banks and trading platforms to smart home systems. Users worldwide reported being unable to log in to apps, place orders, or access their news feeds. Amazon quickly confirmed that the problem stemmed from the US-East-1 region in Northern Virginia, which hosts many of AWS’s core internal services.

According to Amazon’s subsequent report, the outage was triggered by a race condition in DynamoDB’s automated DNS management system, which left the DNS record for DynamoDB’s regional endpoint empty. DynamoDB is a distributed database that thousands of services depend on. Once its address could no longer be resolved, those services could not connect to it, sparking a chain reaction that brought down dozens of subsystems. The BBC called this incident one of the largest outages in recent years, not only in scale but also in reach, affecting social media, banking apps, streaming platforms, smart beds, and security cameras.
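When a DNS answer suddenly goes missing or comes back empty, every client that needs it fails at once. One client-side mitigation is to treat an empty answer as an error and keep using the last known good address for a short, bounded window. The minimal Python sketch below illustrates that idea. It is not how DynamoDB or the AWS SDKs actually behave. `StaleTolerantResolver`, the 300-second window, and the simulated lookups are hypothetical. The default lookup uses the standard library's `socket.getaddrinfo`.

```python
import socket
import time
from typing import Callable


def system_lookup(hostname: str) -> list[str]:
    """Resolve a hostname with the operating system's resolver (port 443, TCP)."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


class StaleTolerantResolver:
    """Hypothetical cache: serve the last known good answer for a bounded time
    when a fresh lookup fails or comes back empty."""

    def __init__(self, lookup: Callable[[str], list[str]] = system_lookup,
                 max_stale_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup
        self._max_stale = max_stale_seconds
        self._clock = clock
        self._cache: dict[str, tuple[list[str], float]] = {}

    def resolve(self, hostname: str) -> list[str]:
        try:
            addresses = self._lookup(hostname)
            if not addresses:
                # Treat an empty answer as a failure, not as "no servers exist".
                raise LookupError(f"empty DNS answer for {hostname}")
        except (OSError, LookupError):  # socket.gaierror is a subclass of OSError
            cached = self._cache.get(hostname)
            if cached is not None and self._clock() - cached[1] <= self._max_stale:
                return cached[0]  # last known good answer, still inside the window
            raise
        self._cache[hostname] = (addresses, self._clock())
        return addresses


# Simulated lookups: the first succeeds, the second returns an empty record.
answers = iter([["10.0.0.1"], []])
resolver = StaleTolerantResolver(lookup=lambda host: next(answers))
print(resolver.resolve("db.example.internal"))  # ['10.0.0.1']
print(resolver.resolve("db.example.internal"))  # ['10.0.0.1'], served from the cache
```

The stale window is deliberately bounded. An old address may point at servers that have since been retired, so serving it forever would trade one failure for another.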

The outage lasted for several hours. By midday GMT, Amazon engineers had restored DNS stability and committed to reviewing their automated update mechanisms. However, the fallout extended beyond technical disruption. Many companies started reevaluating their fault-tolerance strategies and reignited discussions around multi-cloud solutions.

The largest AWS outages in history

AWS officially launched in 2006, and over nearly two decades it has experienced several major outages, each exposing weaknesses in its architecture or in how it manages its vast systems.

2011: First mass outage (US-East-1)

In April 2011, the US-East-1 region (Northern Virginia) was severely disrupted by problems with the Elastic Block Store (EBS). A faulty network configuration change cut many EBS nodes off from their replicas. The flood of re-mirroring requests that followed overloaded the system, knocking out platforms like Reddit, Foursquare, Quora, and even parts of Amazon.com itself.

2015: DynamoDB outage

In September 2015, an overloaded internal metadata service caused DynamoDB to return errors at a high rate. The Simple Queue Service (SQS) and many other AWS services depend on DynamoDB, so the failure set off a chain reaction that took down Netflix, Reddit, IMDb, and even some of Amazon’s own sites.

2017: Amazon S3 outage

February 2017 was a landmark moment. While debugging the S3 billing system, an engineer entered a command incorrectly and took a much larger set of S3 servers offline than intended. The misstep disrupted image and file hosting for millions of websites, impacting services like Slack, Trello, and GitHub Pages. AWS later acknowledged that a single mistyped command had caused one of the most significant disruptions in S3’s history.

2020: Kinesis and Cognito outage

In November 2020, the Kinesis service, which processes streaming data, went down after newly added capacity pushed its front-end servers past an operating-system thread limit. The failure took the Cognito authentication system and CloudWatch monitoring down with it. The effects rippled across connected platforms like Ring, iRobot, and Roku, leaving users unable to control their smart devices and developers unable to deploy updates.

2021: A series of December outages

December 2021 was marked by three major incidents in quick succession. First, AWS’s internal network failed in US-East-1. Next came network connectivity problems in the US-West regions. Finally, a data center in US-East-1 lost power. Twitch, Disney+, Coinbase, Zoom, and Alexa were all affected, and large portions of the internet slowed down temporarily.

2023: CloudFront outage

A global outage in the CloudFront CDN disrupted content delivery across multiple countries, causing websites to load partially, images and videos to vanish, and apps to return API errors.

2025: DynamoDB DNS outage

Most recently, in October 2025, a failure in DynamoDB’s automated DNS management left part of its infrastructure unreachable, triggering yet another global outage.

Why does this keep happening?

Frequent AWS outages raise an important question: why do these incidents keep happening? Various factors contribute, many of them stemming from the complexity of operating such a massive cloud infrastructure. Understanding these factors sheds light on our growing dependence on cloud technology and can guide smarter infrastructure choices.

Why are we still facing these issues when technology is supposed to be seamless? The repeated failures raise serious questions about Amazon’s commitment to reliability and accountability. So why does this keep happening, and when will Amazon take responsibility for making sure it doesn’t?

Human error

Some of the most high-profile AWS incidents have stemmed from simple human mistakes. In 2017, for example, an engineer’s mistyped command took a large part of S3’s server capacity offline. In today’s interconnected landscape, one wrong command can take down services relied on by millions.
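AWS’s post-incident summary for the 2017 S3 event said it changed the tool involved in two ways. The tool now removes capacity more slowly, and it refuses removals that would take a subsystem below its minimum required capacity. The Python sketch below is a hypothetical illustration of those two guardrails, not AWS’s actual tooling. `plan_removal`, `CapacityGuardError`, and all of the numbers are made up for the example.

```python
class CapacityGuardError(Exception):
    """Raised when a removal request would leave a subsystem under-provisioned."""


def plan_removal(current: int, requested: int, minimum: int, max_per_step: int) -> list[int]:
    """Validate a request to take servers out of service and split it into batches.

    current:      servers currently serving the subsystem
    requested:    servers the operator asked to remove
    minimum:      lowest server count the subsystem can safely run with
    max_per_step: most servers removed per batch (limits how fast capacity disappears)
    """
    if requested <= 0 or max_per_step <= 0:
        raise ValueError("requested and max_per_step must be positive")
    if current - requested < minimum:
        raise CapacityGuardError(
            f"removing {requested} of {current} servers would drop below the minimum of {minimum}"
        )
    full_batches, remainder = divmod(requested, max_per_step)
    return [max_per_step] * full_batches + ([remainder] if remainder else [])


# Hypothetical scenario: the operator meant to type 5 but typed 50.
print(plan_removal(current=100, requested=5, minimum=80, max_per_step=2))  # [2, 2, 1]
try:
    plan_removal(current=100, requested=50, minimum=80, max_per_step=2)
except CapacityGuardError as err:
    print("blocked:", err)
```

The typo no longer takes out half the fleet. It is rejected before anything is removed, and even a valid request is carried out in small batches that give operators time to notice a problem.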

Complexity and scale

AWS operates hundreds of thousands of servers across numerous regions worldwide. Each new layer of automation or functionality introduces added complexity. As services multiply and interconnections deepen, even a minor malfunction can trigger a chaotic chain reaction.
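A common defense against this kind of cascade is a circuit breaker. After repeated failures, callers stop hammering a struggling dependency and fail fast for a cool-down period. They then try again to see whether the dependency has recovered. The following is a minimal, single-threaded Python sketch of the general pattern, not code from any AWS library. The class names, the failure threshold, and the 30-second cool-down are assumptions.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised instead of calling a dependency that is currently considered down."""


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures, rejects calls for
    `reset_after` seconds, then lets calls through again to probe for recovery."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None and self._clock() - self._opened_at < self.reset_after:
            # Fail fast instead of piling more requests onto a struggling dependency.
            raise CircuitOpenError("dependency marked unhealthy; failing fast")
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # Open after enough failures, or reopen if a post-cool-down probe fails.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # A success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


def broken_dependency() -> str:
    raise ConnectionError("upstream unavailable")  # simulated outage


breaker = CircuitBreaker(failure_threshold=2, reset_after=30.0)
for attempt in range(1, 4):
    try:
        breaker.call(broken_dependency)
    except CircuitOpenError:
        print(f"attempt {attempt}: failed fast")
    except ConnectionError:
        print(f"attempt {attempt}: dependency error")
# Attempts 1 and 2 reach the dependency; attempt 3 fails fast.
```

Failing fast keeps one broken component from tying up threads and connections in everything that calls it. That is one way a local failure turns into a chain reaction.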

Internet centralization

When companies began migrating to AWS en masse in the early 2010s, it felt revolutionary: no need to build data centers, just pay for what you use. Over time, however, much of the internet has come to rely heavily on this single cloud. Today, an outage in a single AWS region can disrupt banks, transportation systems, healthcare platforms, and even IoT devices in homes.

Errors within the cloud itself

Customers aren’t always to blame. Failures can occur deep within AWS’s internal systems, such as DNS, metadata services, monitoring, routing, or load balancing. These core components control the entire ecosystem. When they fail, operations can grind to a halt, even if the servers are perfectly fine.

Balancing speed and stability

Amazon rolls out hundreds of updates daily. This rapid pace keeps AWS ahead of competitors, but it also makes the system more prone to errors when changes interact in unexpected ways, such as timing and synchronization bugs between automated components. Occasionally, the platform’s self-healing mechanisms fail to react promptly, resulting in what feels like an internet-wide outage.
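A widely used way to balance speed against stability is a staged, or canary, rollout. A change goes to a small slice of servers first and only moves on when health metrics stay clean. The Python sketch below shows the general idea and is not Amazon’s deployment system. The wave sizes, the 1% error threshold, and the `deploy` and `error_rate` callbacks are hypothetical stand-ins.

```python
from typing import Callable


def staged_rollout(
    hosts: list[str],
    wave_fractions: list[float],
    deploy: Callable[[str], None],
    error_rate: Callable[[list[str]], float],
    max_error_rate: float = 0.01,
) -> tuple[list[str], bool]:
    """Deploy in cumulative waves (for example 1%, 10%, 50%, then 100% of hosts).

    After each wave, check the error rate across every updated host and stop
    if it exceeds `max_error_rate`. Returns (updated_hosts, halted).
    """
    updated: list[str] = []
    for fraction in wave_fractions:
        target = min(len(hosts), max(1, round(len(hosts) * fraction)))
        for host in hosts[len(updated):target]:
            deploy(host)
            updated.append(host)
        if error_rate(updated) > max_error_rate:
            # Halt before the change reaches the rest of the fleet.
            # A real system would also roll the updated hosts back.
            return updated, True
    return updated, False


fleet = [f"host-{i}" for i in range(100)]
waves = [0.01, 0.1, 0.5, 1.0]

# Healthy build: the error rate stays at zero, so every wave goes out.
done, halted = staged_rollout(fleet, waves, deploy=lambda h: None, error_rate=lambda u: 0.0)
print(len(done), halted)  # 100 False

# Bad build: errors spike, so the rollout stops after the first wave.
done, halted = staged_rollout(fleet, waves, deploy=lambda h: None, error_rate=lambda u: 0.2)
print(len(done), halted)  # 1 True
```

With this approach, a bad update reaches one host instead of a hundred. The cost is that every change takes longer to reach the whole fleet, which is exactly the speed-versus-stability tradeoff described above.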

Lessons the world has learned from AWS outages

As companies increasingly depend on AWS, the impacts of disruptions can ripple through entire industries. This evolution has sparked significant shifts in how businesses approach their infrastructure. From multi-cloud strategies to a renewed focus on preparedness, key lessons have emerged that reshape our understanding of cloud reliability and resilience.

The era of multi-cloud

Following the December 2021 outages, and especially after the October 2025 incident, many businesses have looked more seriously at multi-cloud strategies. Financial institutions such as JPMorgan Chase spread their infrastructure across AWS, Google Cloud, and Azure to reduce single-cloud dependency. This approach can be more costly and complex, but it significantly lowers risk: if one cloud fails, traffic can be rerouted quickly to another.
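At its simplest, rerouting means checking providers in priority order and sending traffic to the first healthy one. The Python sketch below shows only that decision. The provider labels and the lambda health checks are hypothetical stand-ins for real probes, and the code is not tied to any vendor’s traffic-management API.

```python
from typing import Callable


def first_healthy(providers: list[tuple[str, Callable[[], bool]]]) -> str:
    """Return the first provider, in priority order, whose health check passes."""
    for name, is_healthy in providers:
        try:
            if is_healthy():
                return name
        except Exception:
            # A health check that errors out or times out counts as unhealthy.
            continue
    raise RuntimeError("no healthy provider available")


# Hypothetical deployment labels; the checks simulate an outage in the primary.
providers = [
    ("aws-us-east-1", lambda: False),
    ("gcp-us-east4", lambda: True),
    ("azure-eastus", lambda: True),
]
print(first_healthy(providers))  # gcp-us-east4
```

The routing decision is the easy part. The expensive part of multi-cloud is keeping data, configuration, and capacity ready on the standby provider, so that it can actually serve the traffic once it is selected.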

A return to on-premises solutions

Ironically, as cloud computing has become more widespread, interest in on-premises solutions has risen. Some organizations are rebuilding their own data centers or private clouds to keep control over critical systems. Major banks, telecom operators, and government agencies increasingly rely on hybrid models, hosting part of their infrastructure on AWS and the rest on their own servers.

A new generation of DevOps practices

Each major outage drives the evolution of DevOps practices. AWS and other providers continue to implement safeguards against human error, while customers are becoming more proactive by investing in automated testing, improved rollback systems, and chaos engineering, the deliberate simulation of failures to test system resilience before real failures occur.
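In its smallest form, chaos engineering means wrapping a dependency so that some calls fail on purpose. You then check that the caller degrades gracefully instead of crashing. The Python sketch below is a hypothetical test-only helper, not a specific chaos-engineering tool. `with_fault_injection`, the 30% failure rate, and the profile service are assumptions.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


def with_fault_injection(fn: Callable[..., T], failure_rate: float,
                         rng: random.Random) -> Callable[..., T]:
    """Wrap `fn` so that roughly `failure_rate` of calls raise a simulated outage
    before `fn` is invoked. Intended for test environments only."""
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")

    def wrapper(*args, **kwargs) -> T:
        if rng.random() < failure_rate:
            raise ConnectionError("injected fault: simulated dependency outage")
        return fn(*args, **kwargs)

    return wrapper


def fetch_profile(user_id: int) -> dict:
    return {"id": user_id, "name": "example"}  # stand-in for a real service call


# A seeded generator makes the experiment repeatable.
flaky_fetch = with_fault_injection(fetch_profile, failure_rate=0.3, rng=random.Random(42))


def get_profile_or_default(user_id: int) -> dict:
    # The behavior under test: does the caller fall back instead of crashing?
    try:
        return flaky_fetch(user_id)
    except ConnectionError:
        return {"id": user_id, "name": "unknown"}


results = [get_profile_or_default(i) for i in range(1000)]
fallbacks = sum(r["name"] == "unknown" for r in results)
print(f"{fallbacks} of 1000 calls used the fallback")
```

About 30% of calls should take the fallback path, and none should crash. If the wrapper exposes an unhandled exception in a test run, the team finds the weakness before a real outage does.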

Cloud dependency risk

For years, the cloud was synonymous with reliability. Now, it also represents vulnerability. Security audits and SLAs increasingly address “cloud dependency risk,” and even large media and fintech companies maintain contingency plans for when AWS goes down.
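SLAs are usually expressed as availability percentages, and a little arithmetic shows why dependency risk matters. Chaining dependencies multiplies their availabilities together, so the combined figure drops. Redundant replicas raise it, but only if they fail independently. The numbers in the Python sketch below are illustrative and are not actual AWS SLA figures. The helper names are made up for the example.

```python
from math import prod

MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200


def downtime_budget_minutes(availability: float) -> float:
    """Minutes of downtime per 30-day month allowed by an availability target."""
    return (1 - availability) * MINUTES_PER_30_DAY_MONTH


def serial(*availabilities: float) -> float:
    """Availability of a system that needs every dependency to be up."""
    return prod(availabilities)


def redundant(*availabilities: float) -> float:
    """Availability of a system that works if any one replica is up.
    Assumes failures are independent, which shared regions often violate."""
    return 1 - prod(1 - a for a in availabilities)


print(round(downtime_budget_minutes(0.9999), 2))  # 4.32 minutes per month
print(round(serial(0.999, 0.999, 0.999), 6))      # 0.997003
print(round(redundant(0.99, 0.99), 6))            # 0.9999
```

The independence assumption is the catch. Two "redundant" systems that both depend on the same region or the same internal service can fail together, and then the redundancy formula no longer applies.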

Preparedness culture

The biggest takeaway is that failures are now seen as a norm rather than an unexpected event. These days, you can’t act surprised when the internet stops working. The focus has shifted from preventing outages completely to being ready with recovery plans, real-time monitoring, redundancy systems, and transparent communication with users.


Every AWS outage serves as a stark reminder of the fragility of our digital world, despite its scale and apparent boundlessness. The clouds we view as symbols of stability are susceptible to the same flaws that afflict all human creations: errors, complexity, and unpredictability.

Amazon Web Services has built the infrastructure powering the modern internet. But as this network expands, so does its interdependence. A single faulty update in Virginia can leave millions of apps, services, and devices unresponsive on the other side of the globe.

Acknowledging this vulnerability drives progress, enabling us to design more resilient systems, respond more swiftly, and create a world where even a cloud outage can’t bring daily life to a standstill. And that’s what we do here at SpaceLama, relentlessly focusing on improving our hyperstable hosting infrastructure.

Our hosting solutions are built on a foundation of advanced redundancy, ensuring that your services remain online even during unexpected incidents. With state-of-the-art encryption protocols and comprehensive monitoring systems, we safeguard your data against threats while providing real-time insights into your infrastructure. And our global network of data centers keeps latency low, so you can reach your applications quickly wherever you are.

Plus, with our proactive support team available 24/7, you can be confident that any challenges will be addressed promptly, keeping your operations smooth and uninterrupted.