Using the Cloud – Big Name outages hint at Cloud Cuckoo Land

Date: 2nd July 2010
Author: Deri Jones

We’re seeing a growing number of clients using Cloud technologies, whether public or private, to support their online delivery. Web monitoring and testing of user experience is a bigger challenge on the cloud, but it is also even more important there: a wider range of technical problems can occur, and it is harder to infer actual user experience from traditional server monitoring data.

Just look at the Hall of Shame of big names caught out online with serious outages recently:

  • Intuit’s online accounting sites QuickBooks, Quicken, and TurboTax were out for hours
  • NetSuite went offline for 30 minutes and was slow for some time after that, affecting companies worldwide who run their business from its CRM portal
  • Sage USA was unstable for around 24 hours at the start of June

I’m not sure the customers out there would agree with Steve Jones of Explore Consulting (a NetSuite advocate) that “occasional minor outages are just part of the reality of cloud computing today.”

Whilst there has been increased discussion of SLAs to use with Cloud suppliers, even the best SLA is only paper unless it is backed up by a meaningful monitoring programme.

An increasing number of our clients, in London and further afield, are using our in-depth User Journey monitoring services firstly to ensure that the money-making routes through their site are performing, and secondly to build these metrics into SLA terms.
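As a rough illustration of turning Journey results into figures that could sit in an SLA, here is a minimal Python sketch. The names JourneySample and sla_report, and the 8-second target, are assumptions made up for this example; they are not taken from any particular monitoring product, and a real SLA would set its own thresholds.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class JourneySample:
    """One run of a multi-page User Journey (hypothetical record layout)."""
    ok: bool        # did the whole journey complete?
    seconds: float  # total time taken for the journey


def sla_report(samples: Sequence[JourneySample],
               max_seconds: float = 8.0) -> Dict[str, float]:
    """Summarise journey runs as two percentages.

    max_seconds is an illustrative target, not a recommended value.
    """
    if not samples:
        raise ValueError("no journey samples to report on")
    total = len(samples)
    # Journeys that completed at all.
    completed = sum(1 for s in samples if s.ok)
    # Journeys that completed AND met the speed target.
    on_target = sum(1 for s in samples if s.ok and s.seconds <= max_seconds)
    return {
        "availability_pct": 100.0 * completed / total,
        "on_target_pct": 100.0 * on_target / total,
    }
```

Reporting "completed" and "completed within target" separately keeps slow-but-working periods visible, rather than hiding them inside a single uptime figure.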

Monitoring the cloud is not trivial, but it pays off either way. If you have hard evidence, 24/7, of the impact on user experience of all your technology, whether public cloud or private cloud, confidence in your cloud suppliers will grow. Conversely, tricky negotiations with under-performing cloud suppliers will be based on hard evidence, not just anecdotal stories of problems. Building the right highly dynamic Journeys for your site is therefore vital to having that hard evidence.

Sage VP Paul Johnson wrote:

“We have been experiencing instability and outages with some of our online systems over the past week”.

Ouch. However, it’s highly likely that Sage users had experienced much shorter, sporadic stability problems in the months up to June. With better monitoring, maybe the Sage cloud team would have spotted those early warning signs and been able to get remedial measures in place before things got out of hand.

Web monitoring is not rocket science. It’s about thinking through how your clients use your site and the multi-page User Journeys that they follow, designing in the dynamic selections, and then building a 24/7 framework that can exercise those Journeys.

There can be difficult subtleties. Suppose a Journey that is intended to finish with an ‘Add to Basket’ step finds on a later page that the product is out of stock and cannot be added to the basket. What should the User Journey script do? Maybe start again from scratch, or maybe go back to the search page and choose a different product from those offered? How many times should it try?
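One possible answer, sketched in Python: go back to the search results, pick the next product, and give up after a fixed number of tries. Everything here is hypothetical (the OutOfStock exception, the search and add_to_basket callables, and the default of three attempts); a real Journey script would drive an actual browser or HTTP client and choose limits to suit the site.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


class OutOfStock(Exception):
    """Raised by a journey step when the chosen product cannot be added."""


@dataclass
class JourneyResult:
    succeeded: bool
    attempts: int
    product: Optional[str] = None


def run_basket_journey(
    search: Callable[[], Iterable[str]],
    add_to_basket: Callable[[str], None],
    max_attempts: int = 3,  # illustrative limit, not a recommendation
) -> JourneyResult:
    """Try products from the search page until one can be added to the basket."""
    attempts = 0
    for product in search():
        if attempts >= max_attempts:
            break
        attempts += 1
        try:
            add_to_basket(product)
        except OutOfStock:
            continue  # back to the search results, choose a different product
        return JourneyResult(True, attempts, product)
    # Ran out of products or attempts: report a failed journey.
    return JourneyResult(False, attempts)


# Stand-in site for demonstration: two products out of stock, one available.
stock = {"kettle": 0, "toaster": 0, "teapot": 4}


def fake_search() -> Iterable[str]:
    return list(stock)


def fake_add(product: str) -> None:
    if stock[product] == 0:
        raise OutOfStock(product)


result = run_basket_journey(fake_search, fake_add)
```

Returning the number of attempts alongside the outcome means that a run of out-of-stock products shows up in the monitoring data instead of being silently retried away.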

With the right web monitoring tools or supplier, all those options are available, depending on the design of the company website. With the wrong tools, however, there is no way to handle such real-world cases.

Thinking through such subtleties of a meaningful user journey monitoring programme may take a little time, but it will pay dividends when you have the web monitoring SLA on the table to negotiate with your internal tech teams and various cloud suppliers. And an annual website load test to check capacity handling would not go amiss in the budget.