June 30, 2009

Uptime and the Cloud Crowd at CSIA

A few days ago, Jake Smith of Intel and I presented at the Colorado Software Industry Association (CSIA) monthly meeting in Denver. We talked about cloud computing and the elements that will determine its rate of adoption: the needs of businesses, their expectations of cloud performance, and the real-world limitations of the cloud that are currently stalling its adoption. The biggest issue is reliability, and I introduced ZeroNines’ technology as a potential solution. It was a great crowd, and their hunger for a reliable cloud was obvious.

Businesses need their applications and data to be available all the time. So far, clouds and cloud providers have not succeeded in proving that they can actually offer that. The industry needs to overcome the cloud’s downtime problems before serious business can be done on it. I believe the Big Three (Amazon, Azure, and Google) will refocus their efforts on providing highly available cloud infrastructures and market this capability accordingly.

The Cause is Academic

Of course, every network is subject to threats and failures that can cause downtime, and there’s no getting away from that. It doesn’t take an earthquake to knock vital networked apps offline; some recent high-profile cloud provider outages have shown that all it takes is a failed OS upgrade. New and unexpected problems crop up every day. But for the business relying on the cloud, the cause of an outage is really only academic. Service should simply continue because the business needs it to.

The scary thing is that the current disaster recovery paradigm (failover) is insufficient for protecting businesses when these things happen. It can’t be relied upon to prevent downtime, or even to deliver a speedy recovery. In addition, poorly architected virtualized environments increase the risk of catastrophic failure, most notably through server consolidation, which is a core technology of the cloud.

The Solution is Continuity

At the CSIA meeting, we introduced the crowd to our Always Available™ technology, which maintains cloud continuity by synchronizing and protecting multiple private, public or hybrid clouds. It can mix cloud computing and physical hosting across data centers hundreds or thousands of miles apart. The distance prevents any single regional disaster from damaging more than one data center. There is no server hierarchy, so all transactions run simultaneously and equally on all cloud and server nodes. Best of all, the nodes update each other constantly in real time, so if one goes down, the others simply continue processing with no interruption to service.
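To make the "no server hierarchy" idea concrete, here is a minimal Python sketch of the general pattern. It is only an illustration under my own assumptions, not the Always Available implementation: the `Replica` class, the `submit` function, and the in-memory `ledger` are hypothetical names invented for this example.

```python
class Replica:
    """Hypothetical stand-in for one cloud or data center node."""

    def __init__(self, name):
        self.name = name
        self.up = True      # flips to False when the node is offline
        self.ledger = {}    # toy stand-in for application state

    def apply(self, key, value):
        if not self.up:
            raise ConnectionError(f"{self.name} is unreachable")
        self.ledger[key] = value


def submit(replicas, key, value):
    """Send the same transaction to every replica as an equal peer.

    There is no primary: any replica that is up applies the transaction,
    and an unreachable replica is skipped instead of stalling the others.
    """
    acknowledged = []
    for replica in replicas:
        try:
            replica.apply(key, value)
            acknowledged.append(replica.name)
        except ConnectionError:
            continue  # the surviving replicas keep processing
    if not acknowledged:
        raise RuntimeError("no replica available to process the transaction")
    return acknowledged
```

The sketch leaves out the hard parts, such as bringing a recovered node back in sync and ordering concurrent transactions. It only shows why the loss of one node need not interrupt service when every node is an equal peer.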

To protect against an outage during an upgrade, I would propose the following solution: isolate one cloud or network node in an Always Available configuration and do your upgrade there, while the other nodes manage the clients’ transactions. Test the upgrade, then roll it out gradually to the other nodes. If things start to go haywire, isolate the misbehaving node, solve your problems, and start the rollout again. There would be no need to risk the entire service on an untested upgrade.
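The upgrade procedure above can be sketched in a few lines of Python. This is a hedged illustration of a one-node-at-a-time rollout, not ZeroNines code: `Node`, `rolling_upgrade`, and the `health_check` callback are hypothetical names, and the "upgrade" is reduced to changing a version string.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical cloud or network node."""
    name: str
    version: str
    in_service: bool = True


def rolling_upgrade(nodes: List[Node], new_version: str,
                    health_check: Callable[[Node], bool]) -> List[str]:
    """Upgrade one isolated node at a time while the others keep serving."""
    log = []
    for node in nodes:
        # Never isolate the last node that is still handling transactions.
        others_serving = [n for n in nodes if n is not node and n.in_service]
        if not others_serving:
            log.append(f"{node.name}: skipped, no other node in service")
            break

        node.in_service = False          # isolate the node
        node.version = new_version       # apply the upgrade in isolation

        if health_check(node):           # test before rejoining
            node.in_service = True
            log.append(f"{node.name}: upgraded to {new_version}")
        else:
            # Leave the misbehaving node isolated and halt the rollout;
            # the remaining nodes continue on the old version.
            log.append(f"{node.name}: failed check, isolated, rollout halted")
            break
    return log
```

If a check fails, the function stops and leaves the faulty node out of service, so an operator can fix the problem and start the rollout again, as described above. At no point is the whole service running on an untested upgrade.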

Always Available works for cloud customers as well as service providers. It is provider- and platform-agnostic, so you can mix and match all you need to.

Visit the ZeroNines website to find out more about how our disaster-proof architecture protects businesses of any description from downtime.

Alan Gin – Founder & CEO, ZeroNines