Amazon is the world’s largest cloud services provider, and on Tuesday, 28 February, several Amazon Web Services offerings went down for five hours. Amazon has now revealed the cause of the outage, and the culprit appears to be a typo.
The company has issued an apology for the outage, saying the cause was a command that was entered incorrectly. Restoring service required a full restart of the affected systems, which took much longer than anticipated because Amazon Web Services has grown so quickly. The command was supposed to remove a small number of servers, but the typo took many more offline.
“At 9:37 AM PST, an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process,” explained Amazon. “Unfortunately, one of the inputs to the command was entered incorrectly, and a larger set of servers was removed than intended. The servers that were inadvertently removed supported two other S3 subsystems.”
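Amazon's statement does not show the command or the tool behind it, so the following is only a minimal Python sketch of the kind of guard that can catch a mistyped input before it removes too much capacity. The function name `plan_removal` and the parameters `min_capacity` and `max_batch` are hypothetical and are not taken from Amazon's tooling.

```python
def plan_removal(fleet_size: int, requested: int,
                 min_capacity: int, max_batch: int) -> int:
    """Validate a server-removal request before anything is executed.

    All names and limits here are illustrative assumptions, not Amazon's.
    Returns the number of servers approved for removal, or raises ValueError.
    """
    if requested <= 0:
        raise ValueError("requested must be a positive number of servers")

    # Cap how many servers a single command may remove, so a mistyped
    # input cannot take out a large part of the fleet in one step.
    if requested > max_batch:
        raise ValueError(
            f"refusing to remove {requested} servers; limit per command is {max_batch}"
        )

    # Never let the fleet drop below the capacity the subsystem needs.
    remaining = fleet_size - requested
    if remaining < min_capacity:
        raise ValueError(
            f"removal would leave {remaining} servers; minimum is {min_capacity}"
        )

    return requested
```

With a fleet of 100 servers, a floor of 80 and a per-command limit of 10, a request to remove 5 servers is approved, while a mistyped request for 50 is rejected before any server is touched.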
The company plans to change its systems so that a small mistake like this cannot cause a large-scale outage in the future. After explaining the cause, Amazon apologized to its customers:
“We want to apologize for the impact this event caused for our customers. While we are proud of our long track record of availability with Amazon S3, we know how critical this service is to our customers, their applications and end users, and their businesses. We will do everything we can to learn from this event and use it to improve our availability even further.”