May 24, 2010

TD AMERITRADE Outage and How Failover Fails Finance

Online brokerage TD AMERITRADE was offline for 80 minutes on Thursday, May 20, 2010 [source]. Because of the outage, some of their clients could not log in to their accounts to place trades during the powerful market downdraft that occurred that day [source]. Outages among financial firms have gotten a lot of coverage in the last couple of years, no doubt because of the universally amped-up sensitivity to any kind of news with the word “financial” attached to it. Here’s a brief look at this outage, and some commentary on outages in general among financial companies.

Background: About TD AMERITRADE

Online discount broker TD AMERITRADE has millions of U.S. customers (Wikipedia reports over six million), and many more internationally. The company has grown rapidly through acquisition and was the 746th-largest U.S. firm in 2008 [source]. It acquired thinkorswim Group, Inc., another popular online brokerage, in January 2009. Lots of average Americans use TD AMERITRADE to generate income and manage retirement accounts. I use them myself and really like their system, but did not notice the outage because I was doing other things at the time.

The Problem: An Outage of Some Kind

At about 11:40 AM Eastern time, clients found that they could not log on to the TD AMERITRADE retail website. The outage ended at about 1:00 PM. No disruption was reported on their mobile site or at their subsidiary thinkorswim [source]. Clients already logged in experienced no trouble, prompting one writer to speculate that it was a web authorization issue of some kind [source]. If TD AMERITRADE has made a formal announcement of the cause, a half hour of Googling on my part failed to find it.

Was This a Failed Failover?

Posted on the TD AMERITRADE site [source] is the “TD AMERITRADE Business Continuity Plan Statement” [source]. One of the statements in this brief public document reads “Disruption of service at any of our service centers will result in calls, orders and electronic communications being re-routed to an alternative service center located in a different region of the country with a separate power grid and transportation system.”

Let me state clearly that I am entering the realm of speculation here. The statement quoted above implies that TD AMERITRADE is relying on a business continuity plan based on failover architecture. Failover or cutover has been the de facto choice for business continuity, and until recently it has been the only real game in town. But it is by nature unreliable, because the backup site must detect the failure and take over cleanly at exactly the moment things are going wrong, and even the best systems are subject to downtime. If their backup plan is indeed based on failover, then failover obviously failed them.

The Cost: As Always, it’s the Intangibles

As in so many outages of this kind, the real costs are difficult to estimate. Easiest to ponder are the lost commissions from trades that could not occur during an extremely busy trading day. Less tangible are the effects on reputation and customer satisfaction. No one wants a broker that is unavailable when they need it most. One customer claimed to have lost about $2,000 from being unable to log in [source]. TD AMERITRADE stock fell about 3.7% that day, but this may not mean much because markets overall were down about 3%.

According to a May 2007 article from Financial Services Technology, a study from the Meta Group revealed that “the cost per hour for downtime – ranging from simple network outages to major emergencies – in the financial services sector is, on average, $1.4 million” [source].
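To put that average in perspective, here is a rough back-of-the-envelope calculation that scales the $1.4 million hourly figure to an 80-minute outage. This is purely illustrative: the sector-wide average is not TD AMERITRADE's actual cost, and the function and variable names are my own. It comes out to a little under $1.9 million.

```python
def downtime_cost(outage_minutes: float, cost_per_hour: float) -> float:
    """Scale an hourly downtime cost to an outage of the given length."""
    return outage_minutes / 60.0 * cost_per_hour


# Outage length as reported above: about 11:40 AM to 1:00 PM Eastern.
OUTAGE_MINUTES = 80
# Meta Group sector-wide average in USD per hour. This is an assumption
# when applied to any single firm, not a reported TD AMERITRADE figure.
SECTOR_AVG_PER_HOUR = 1_400_000

estimate = downtime_cost(OUTAGE_MINUTES, SECTOR_AVG_PER_HOUR)
print(f"Illustrative estimate: ${estimate:,.0f}")
```

An average like this says nothing about the intangibles, and the true figure for any one firm on an unusually busy trading day could be well above or below it.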

An Ugly Thought: Downtime among High Frequency Traders

For many, the cost will be far higher. Some banks, hedge funds, and other high-power financial firms engaged in High Frequency Trading (HFT) make billions of trades a day over ultra-high speed connections [source]. Many trades live for only a few seconds. Enormous transactions are conceived and executed in half a second, with computers evaluating the latest news and acting on it well before human traders even know what the news is. HFT is having a significant effect on markets; there is evidence that the history-making “Flash Crash” of May 6, 2010 was caused and then largely corrected by High Frequency Trading [source]. What would happen if one of these HFT systems were down for an hour and a half? Or even just a minute? Whatever your stance on the ethics of HFT, I think it is fair to say that those engaged in it need to avoid downtime at all costs.

Failover Can’t Handle It

Even a successful failover event may cause some glitches and lost trades among the average retail trading populace. But if a High Frequency Trading system experiences such a glitch, billions of dollars could be lost in the blink of an eye. The trades themselves may fail, and by the time the system comes back up the conditions that made those trades possible are a thing of the past. And that’s for a successful failover. A failed failover can leave businesses out of the race for minutes, hours, and even days.

The Alternative: Active/Active Architecture

High-profile financial systems clearly need something better than failover. The typical outage is caused by server hardware or software failures, problems during upgrades or maintenance, and sometimes more dramatic stuff like fires and floods. The best protection in these cases is to eliminate failover entirely, and switch to an “active/active” or “hot/hot” architecture that eliminates the chance of a failed cutover and the resultant downtime. Always Available™ business continuity architecture from ZeroNines is one such system. Always Available processes all network transactions continually, simultaneously, and equally in multiple locations on multiple servers, all of which are hot and all of which are active. Always Available can offer virtually 100% uptime, because instead of relying on failover, Always Available simply continues running the same apps and data at two or three additional locations, with no interruption to the user. So if a web server or database goes down somewhere, the other nodes of the system continue processing without missing a beat. Visit the ZeroNines website to find out more.
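For readers who think in code, here is a toy sketch of the general difference between the two approaches. It is not the Always Available implementation or anyone's production design; the `Node` class, the function names, and the `standby_ready` flag are all hypothetical and exist only to illustrate the idea that active/active sends every transaction to every node, while failover depends on a cutover that may or may not work.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical processing site that records the transactions it handles."""
    name: str
    up: bool = True
    ledger: list = field(default_factory=list)

    def process(self, txn: str) -> bool:
        if not self.up:
            return False
        self.ledger.append(txn)
        return True


def submit_active_active(nodes, txn):
    """Send txn to every node; succeed if at least one node processed it."""
    results = [node.process(txn) for node in nodes]
    return any(results)


def submit_failover(primary, standby, txn, standby_ready):
    """Try the primary; use the standby only if the cutover actually worked."""
    if primary.process(txn):
        return True
    return standby_ready and standby.process(txn)


# Active/active: three hot nodes, one of which fails between orders.
nodes = [Node("east"), Node("central"), Node("west")]
submit_active_active(nodes, "order-1")
nodes[0].up = False
ok = submit_active_active(nodes, "order-2")

# Failover: the primary is down and the cutover did not complete (assumed).
primary, standby = Node("primary", up=False), Node("standby")
failed = submit_failover(primary, standby, "order-3", standby_ready=False)

print("active/active accepted order-2:", ok)
print("failover accepted order-3:", failed)
```

In this sketch the surviving nodes carry on with no switching step at all, which is the whole point. A real system would also have to bring the recovered node back in sync with the others, which this toy example leaves out.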

Alan Gin – Founder & CEO, ZeroNines