January 12, 2009

Hurricane Charley Couldn’t Stop the Email

In this first Disaster Litany posting, I look at a sequence of events that is near and dear to ZeroNines. Our own real-world experience with a hurricane, power outages, and an email system shows how our downtime-preventing technology works and serves as a pattern for the solutions we suggest for other disasters.

Background: MyFailSafe™ Email System

ZeroNines offers Always Available technology that can virtually eliminate downtime among networked applications, data, and other assets. To test our technology, we created the MyFailSafe Email Service and launched it on our Always Available network in July of 2004. The goal was to test Always Available in the real world by running MyFailSafe like any other email service: with real customers and real traffic, and subject to the same threats as any other network or email system.

The Problem: A Hurricane

All readers who remember Hurricane Charley please raise your hands… For those of you who don’t, Charley hit Florida on August 13, 2004. According to Wikipedia, it killed about thirty people and caused $15 billion in damage. Widespread flooding, wind damage, power outages, and other problems crippled much of the state for several days. I don’t have statistics on downtime among private business networks or service providers, but it’s a safe bet that it was serious.

Charley hit about a month after we launched MyFailSafe. It caused electrical grid fluctuations that drained the battery backup systems of the Orlando local exchange carrier (LEC), isolating the Orlando node of the ZeroNines Always Available infrastructure. Our own battery system prevailed and still had a 75% charge when commercial power was reliably restored, but the site could not communicate for 16 hours because of the LEC outage.

The Solution: Hurricane-Proof Architecture

During those 16 hours, while our Orlando node was effectively offline, the MyFailSafe email service did not experience any downtime at all. Any user whose power was still on and whose desk was not under water experienced true 100% uptime throughout, whether they were in Florida, Colorado, Canada, Asia, or anywhere else.

How? Our Always Available deployment has additional nodes and data centers in Colorado and California. All applications, transactions, data exchanges, and other network activities run equally and simultaneously on these secure application servers, which are geographically separated by hundreds of miles. In IT parlance, all servers are hot, and all instances of all applications are active. There is no server hierarchy, and consequently no single point of failure. When the Orlando node fell silent, all MyFailSafe processing continued uninterrupted on the others. There was no need for failover or recovery: the other nodes were far from the storm, they never went down, and continuity was maintained.
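The "all servers hot, no hierarchy" idea can be illustrated with a short Python sketch. It is not ZeroNines' implementation, which this post does not describe; it assumes a simplified model in which every request is sent to all nodes in parallel and counts as handled if at least one node accepts it. The `Node` class, the `dispatch_to_all` function, and the node names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


class Node:
    """Hypothetical application node; `online` simulates whether it is reachable."""

    def __init__(self, name, online=True):
        self.name = name
        self.online = online
        self.mailbox = []  # messages this node has accepted

    def handle(self, message):
        if not self.online:
            raise ConnectionError(f"{self.name} unreachable")
        self.mailbox.append(message)
        return self.name


def dispatch_to_all(nodes, message):
    """Send the message to every node at once (assumed active-active model).

    No node is primary, so there is no failover step: the request succeeds
    as long as at least one node accepts it.
    """
    if not nodes:
        raise ValueError("no nodes configured")
    acks, failures = [], []
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        futures = {pool.submit(node.handle, message): node for node in nodes}
        for future, node in futures.items():
            try:
                acks.append(future.result())
            except ConnectionError:
                failures.append(node.name)
    if not acks:
        raise RuntimeError("all nodes unreachable")
    return acks, failures


if __name__ == "__main__":
    # Orlando is cut off, as in the Charley scenario; the others are reachable.
    nodes = [Node("Orlando", online=False), Node("Colorado"), Node("California")]
    acks, failures = dispatch_to_all(nodes, "hello")
    print(acks)      # ['Colorado', 'California']
    print(failures)  # ['Orlando']
```

With the Orlando node marked offline, the message is still accepted by Colorado and California, and nothing has to be promoted or switched over because no node was ever primary. The sketch leaves out what a real deployment would also need, such as resynchronizing a node's data once it reconnects.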

Since activation on July 15, 2004, the MyFailSafe network has never experienced any downtime for any reason, including this and other hurricanes, two migrations from server colocation providers to clouds, a data center move, and an email worm attack that interrupted email service from AOL and other major providers. These potential disasters, which forced our servers offline, had no power to bring our applications down. All applications and information retained 100% availability throughout.

Contact ZeroNines to find out more about how our disaster-proof architecture protects businesses of any description from downtime.

Alan Gin – Founder & CEO, ZeroNines

January 3, 2009

A Litany of Disasters: Downtime Events and How to Avoid Them

“Aviation in itself is not inherently dangerous. But to an even greater degree than the sea, it is terribly unforgiving of any carelessness, incapacity, or neglect.”
-- Anonymous

Years ago, I saw those words on a poster of a World War One aircraft stuck about 20 feet off the ground in the limbs of a tree. If we were to update this and adapt it to the business user’s desktop, it would lose its poetic charm but strike home with a whole new audience:

“Networked assets in themselves are not inherently dangerous. But to an even greater degree than stuff on your hard drive, they are terribly unforgiving of any carelessness, incapacity, or neglect.”

The warning is clear: Disaster may be only inches away, particularly for the unprepared. It’s a lot harder to recover after some accident knocks out a hundred or a thousand users than it is to reboot your own machine.

In this blog, we will look at some actual disasters that have struck organizations when their networks took a hit from storms, fires, attacks, and far more mundane threats like human error and equipment failure.

For a business, there may be little correlation between the physical effects of a disaster and its financial impact. Imagine a business dependent upon a distant data center in the U.S. Tornado Belt. One good storm could leave their personnel and property untouched, yet destroy their ability to do business by wiping out their data, applications, and transactions. Elsewhere, an earthquake could cause deplorable loss of life and property damage, yet leave a business relatively unharmed if its networked computing capabilities remain intact. And an otherwise strong corporation could suffer irreparable damage from something as quiet as a software failure or equipment malfunction, which to the outside world does not qualify as a “disaster” at all.

I’ll be describing some instances where ZeroNines’ solutions for networks, virtualized environments, and clouds could have prevented disastrous downtime, and helped avoid unwanted headlines and losses to productivity, reputation, and revenue. Our approach does not use any kind of failover or cutover, since those occur after the downtime event and are not true disaster prevention. After all, it’s far better to avoid the downtime in the first place than to try to recover from it afterward.

Next week: How MyFailSafe really did provide fail-safe email during Hurricane Charley.


Alan Gin – ZeroNines, Founder & CEO