Cascade failure. If you’re in IT, that’s a particularly frightening term. In the case of last week’s Seattle data center fire, the term is especially appropriate, since it was literally a cascade of water that wrecked everything and knocked a number of businesses and online services offline. Here’s a look at this disaster and a way its downstream damage could have been prevented.
Background: Fisher Plaza, a Major Hosting Facility
Fisher Plaza is “a self-styled carrier hotel in Seattle, and home to multiple datacenter and colocation providers.” [source] A partial list of organizations hosted there includes: payment service provider Authorize.net (which itself has 238,000 merchant customers), Port of Seattle email system, Swedish Hospital’s internal IT systems, Pacific Science Center website, geocaching.com website, major TV and radio station KOMO, online Facebook game Bejeweled Blitz and dozens of other businesses [source].
The Problem: Fire Leads to Cascade Failure
Early on Friday morning, July 3, 2009, Fisher Plaza’s main generator/transfer switch failed. The failure caused an overload, which in turn started a fire. The fire triggered the fire suppression system and brought firefighters to the scene, and both shot water into the generator room. The generators stopped, and we deduce that power from the grid was shut off too. The UPS and the cooling system also failed. Temperatures in the facility rose high enough to wreck some servers and destroy data [source].
Think about the downstream effects. As many as 238,000 merchants potentially had their transactions interrupted or lost because Authorize.net’s servers were forced offline. One can only hope they had their own functioning backup plans. A hospital’s IT system became unavailable; I have no information on what impact this had on patient care. And apparently KOMO had to transmit from a mobile unit in its parking lot [source]. It is not hard to imagine the impact on these and other organizations.
The Solution: Fire- and Flood-Proof Hosting
No, ZeroNines does not wrap servers in asbestos. There is no way to know what bizarre little accident will happen next, so preventing every one of them is unrealistic. Some of these accidents will trigger chain reactions that become major IT disasters.
What we do is prevent a catastrophe in one place from knocking out a business everywhere. In this case, if any of the clients or tenants at Fisher Plaza had been using our technology, their data, transactions, apps, and other assets would all have been processing simultaneously and in perfect replication in other data centers hundreds or thousands of miles away.
This is not a cutover scenario. Processing would not have “switched” from Seattle to elsewhere. It simply would have stopped in Seattle and continued in real time in San Jose, or Denver, or Singapore, or wherever else they placed their data centers. There would be no loss of business continuity. Their businesses would not have gone down, and the real disaster – lost connectivity, productivity, and revenue – would not have taken place.
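To make the difference between cutover and simultaneous processing concrete, here is a minimal Python sketch. It is an illustration only and not ZeroNines’ actual implementation: the `Site` class, the `submit` function, and the in-memory ledgers are hypothetical names invented for this example, and real systems must also handle ordering, latency, and resynchronization, which are ignored here. Every transaction is sent to every site, so when one site goes dark nothing has to be switched or promoted.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A hypothetical data center that keeps its own copy of every transaction."""
    name: str
    online: bool = True
    ledger: list = field(default_factory=list)

    def process(self, txn: str) -> bool:
        # An offline site simply does nothing; no other site waits on it.
        if not self.online:
            return False
        self.ledger.append(txn)
        return True


def submit(txn: str, sites: list) -> bool:
    """Send the transaction to every site; succeed if at least one processed it."""
    results = [site.process(txn) for site in sites]
    return any(results)


sites = [Site("Seattle"), Site("San Jose"), Site("Denver")]
submit("order-1", sites)

# Simulate the Seattle facility going dark; nothing is switched or promoted.
sites[0].online = False
accepted = submit("order-2", sites)

print(accepted)         # True
print(sites[0].ledger)  # ['order-1']
print(sites[1].ledger)  # ['order-1', 'order-2']
```

In this toy model the second order is still accepted because San Jose and Denver were already processing every transaction alongside Seattle; there is no failover step that could itself fail.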
Visit the ZeroNines website to find out more about how our disaster-proof architecture protects businesses of any description from downtime.
Alan Gin – Founder & CEO, ZeroNines