“How can it be that a single piece of network hardware brings down a business critical part of a network? It used to be that all financial institutions ensured that they remained available at all times regardless of cost. Is it possible that the current crop of engineers don’t make this a must have feature of their designs?” (source)
These tough questions were posted by a user on the PayPal™ blog after a network equipment failure took down the popular payment service for a full hour worldwide on August 3, 2009 (source). Today I’ll take a quick look at this outage and offer a solution for preventing similar disasters.
Background: PayPal and its Impact
PayPal is one of those breakthrough solutions that has enabled e-commerce to take off as it has. Its low fraud rate and ease of use may make it the prototype for apps that could replace credit card accounts as the preferred means of making online payments. By way of illustrating PayPal’s importance, here are some stats from the company’s media site:
- PayPal's net Total Payment Volume for 2008, the total value of transactions, was $60 billion.
- PayPal has 73 million active registered accounts (184 million total accounts).
- PayPal supports payments in 19 currencies.
- PayPal's revenues now represent 32% of eBay Inc. companywide revenues.
The Problem: Network Equipment Failure
According to an August 3 announcement by PayPal’s SVP Technology…
“At around 10:30 am PT Monday, a network hardware failure resulted in a service interruption for all PayPal users worldwide. Everyone in our organization focused immediately on identifying the issue and getting PayPal up and running again. We accomplished that in about an hour. By approximately 3 pm PT, full service was restored across our platform.” (source)
At the rate PayPal transacts, one hour of downtime means about $7,000,000 in lost or delayed transactions (source). Some comments from that and other blogs nicely illustrate the downstream effects:
"We have been down for the better part of the day. We are still down. I am a very unhappy customer. This failure has cost me thousands of dollars." (source)
"I am beginning to question my use of the product considering there does not appear to be a high availability solution in place." (source)
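As a rough sanity check on the $7,000,000-per-hour figure cited above, the arithmetic can be sketched from the 2008 Total Payment Volume listed earlier. This is only a back-of-the-envelope estimate: it assumes a 365-day year and that payment volume is spread evenly across every hour, which real traffic is not. The function name `hourly_volume` is just an illustrative helper.

```python
# Figure from the post: 2008 net Total Payment Volume of $60 billion.
ANNUAL_TPV_USD = 60_000_000_000

# Assumption: a 365-day year with volume spread evenly over all hours.
HOURS_PER_YEAR = 365 * 24


def hourly_volume(annual_volume_usd: float, hours_per_year: int = HOURS_PER_YEAR) -> float:
    """Return the average transaction value handled per hour."""
    return annual_volume_usd / hours_per_year


per_hour = hourly_volume(ANNUAL_TPV_USD)
print(f"Average volume per hour: ${per_hour:,.0f}")
```

Under those assumptions the average works out to a little under $7 million per hour, in line with the figure quoted above.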
The Solution: Remove the Single Point of Failure
ZeroNines’ Always Available™ technology is the high availability solution this PayPal customer was wishing for. It could have prevented last week’s downtime event because it processes network transactions synchronously on multiple data centers, clouds, or virtualized environments through multiple network paths. There is no hierarchy and no single point of failure. In case of an equipment failure, power outage, application crash, storm, or other catastrophe in one area, processing would simply continue via other network nodes, switches, and data centers. The business disaster does not occur because users never lose access to the apps, data, and services they need.
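To make the idea of "no hierarchy and no single point of failure" concrete, here is a minimal sketch of sending one transaction to several processing sites at the same time and carrying on as long as at least one of them handles it. This is not ZeroNines' implementation; the `Node` class, the site names, and the `process_everywhere` function are all hypothetical and exist only to illustrate the general pattern.

```python
from concurrent.futures import ThreadPoolExecutor


class Node:
    """Hypothetical processing site (data center, cloud, or virtual environment)."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.ledger = []  # transactions this site has applied

    def process(self, txn):
        if not self.healthy:
            # Stand-in for a hardware failure, power outage, or crash.
            raise ConnectionError(f"{self.name} is unreachable")
        self.ledger.append(txn)
        return self.name


def process_everywhere(nodes, txn):
    """Send txn to every node at once; succeed if at least one node applies it."""

    def attempt(node):
        try:
            return node.process(txn)
        except ConnectionError:
            return None  # a failed site does not block the others

    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        results = list(pool.map(attempt, nodes))

    acked = [name for name in results if name is not None]
    if not acked:
        raise RuntimeError("all nodes failed")
    return acked


# Illustrative site names; one site is simulated as down.
nodes = [Node("dc-west"), Node("dc-east", healthy=False), Node("cloud-1")]
acked = process_everywhere(nodes, {"id": 1, "amount_usd": 25.00})
print("Processed by:", acked)
```

In this sketch no site is a primary, so the failure of `dc-east` leaves the transaction recorded on the other two sites and the caller never sees an outage.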
Always Available would also have enabled automatic update of the errant server once it was brought back online, preventing PayPal from having to use “everyone in their organization” just to get things going again. Of course, the IT department still needs to replace the failed equipment, but that can be done in isolation as the rest of the business carries on as usual.
Visit the ZeroNines website to find out more about how our disaster-proof architecture protects businesses of any description from downtime.
Alan Gin – Founder & CEO, ZeroNines