Date: 14th January 2011
An interesting mention this week on the BBC Technology website: users of mobile phones running Windows Phone 7 have found their monthly Internet allowance used up without them visiting any web sites.
The Register in their inimitable way titled it: Windows 7 Phone glitch spews phantom data.
The same kind of root cause was behind a website performance problem we encountered at a retailer this month.
This retailer had been running a range of web site monitoring for some time. But their site had been getting subjectively slower and slower since about November, according to reports from the Call Centre and from hands-on experience too.
Because there was no sign of the symptoms on the existing User Journeys being monitored on their site, confidence was initially not dented by the subjective comments.
But finally they resolved to do a deeper investigation, and our team got involved. The first step was to review what Journeys were being monitored: and straight away it was recognised that functionality had been added to the site since the Journeys were last reviewed – and so there were key activities not being monitored.
We scripted up those Journeys. They were fairly typical website retailing user journeys, nothing too out of the ordinary: choosing products at random after a choice of categories at random, and handling the occasional out-of-stock cases without throwing false alarms.
Straight away, we found that the graphs on the new Journeys were worse than the older ones. The slowdown each day, versus the fast performance overnight, was much more noticeable.
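As a rough illustration of that kind of scripted Journey, here is a minimal Python sketch. It is not the monitoring script we actually used: the names `run_journey`, `catalogue` and `in_stock` are hypothetical, and the site is stood in for by a plain dictionary and a callback so that the logic can be read on its own.

```python
import random


def run_journey(catalogue, in_stock, rng, max_attempts=5):
    """Simulate one retail user journey (hypothetical sketch).

    catalogue: dict mapping category name -> list of product names.
    in_stock:  callable taking a product name, returning True or False.
    rng:       a random.Random instance, so runs can be reproduced.
    """
    steps = []
    for _ in range(max_attempts):
        # Choose a category at random, then a product within it at random.
        category = rng.choice(sorted(catalogue))
        product = rng.choice(catalogue[category])
        available = in_stock(product)
        steps.append((category, product, available))
        if available:
            return {"status": "ok", "product": product, "steps": steps}
        # Out of stock is normal retail behaviour, not a site fault:
        # pick again instead of raising an alarm.
    # Every attempt hit an out-of-stock product; report it as its own
    # outcome so it can be told apart from a real error.
    return {"status": "no_stock_found", "product": None, "steps": steps}
```

A call such as `run_journey({"hats": ["cap"]}, lambda p: True, random.Random(0))` returns a result with status `"ok"`. In a real monitor the dictionary lookups would be page requests, with timings recorded for each step.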
And drilling into the Journeys step by step, component by component, the root cause became apparent.
The retailer had been actively working on the look and feel over time and, in order to get the look they wanted, had ended up with a lot of unique CSS style sheets: hundreds! That had taken some design time to get right, but it was not a wrong concept as such; each page’s design only required a couple.
The error had been made in the handling of the style-sheets.
Firstly, because there had been so many generated and evolved over the last 3 months, the designers had sensibly used a kind of file-naming convention to keep track of the versions of each.
But they’d been a little untidy, and left in references to old versions, even after they’d moved to the newer ones.
So the problem pages on the Journey would try to pull down a bunch of CSS files, none of which existed anymore.
Even that was not so bad: the site just threw 404s for them all. But the standard 404 page for the site was being returned multiple times per page, once per missing CSS file, and that standard 404 page was a whopping 80 KB. So there was around 500 KB of baggage on pages of that type.
The browser has to process that lot each time, and because it was a style sheet it had asked for, it held up the page rendering until the whole lot was done.
Even that is not the end of the world; some sites do have 500 KB style sheets by design. That is not so bad if the browser gets it just once in a session.
But remember the idea of having so many unique styles? That meant that nearly every time you went to that kind of page, you got another 500 KB of 404s.
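The arithmetic here is easy to check on any page. The Python sketch below lists the style sheets a page references and adds up the bytes returned for the ones that come back as 404. It is an illustration only, not the tooling we used: `StylesheetLinks`, `wasted_bytes` and the `fetch` callback are hypothetical names, and `fetch` is assumed to return a `(status, size_in_bytes)` pair for a URL.

```python
from html.parser import HTMLParser


class StylesheetLinks(HTMLParser):
    """Collect the href of every <link rel="stylesheet"> in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if tag == "link" and "stylesheet" in rel and href:
            self.hrefs.append(href)


def wasted_bytes(html, fetch):
    """Return (missing_hrefs, total_bytes) for style sheets answering 404.

    fetch is a caller-supplied function: href -> (status, size_in_bytes).
    """
    parser = StylesheetLinks()
    parser.feed(html)
    missing = []
    total = 0
    for href in parser.hrefs:
        status, size = fetch(href)
        if status == 404:
            # The body of the error page still has to be downloaded.
            missing.append(href)
            total += size
    return missing, total
```

With an 80 KB error page, about six dead references on a page would add up to 480 KB, which is in line with the figure of around 500 KB above.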
But there was one more factor that made the impact as bad as it had become as the number of phantom CSS files being called grew. In order to streamline and simplify their task of constantly updating styling information, the web designers had not put the style sheets in a static file directory, but into a dynamically generated part of the site.
That had overcome the nuisance commonly experienced by CSS designers: you make a style change, and it doesn’t show up for those users who already have a previous version of that style sheet cached in their browser.
So the result was that the poor web farm server hardware itself was doing more work per style-sheet than otherwise.
Combined, that meant a little bit of extra CPU, a little bit of extra download bandwidth, and a little bit of extra browser render delay.
Put it into a one-line keynote: unbeknown to them, this retailer’s technology was spewing phantom content at users, and killing user experience in a gradual spiral of death.
The fix of course was conceptually simple, and actually only took a few days to roll out.
So the motto is: if you don’t have your user journeys doing the real multi-page routes that your visitors are doing every day, it’s easy to be lulled into a false sense of security about your site, confident that it’s performing well when it really isn’t.
Next step for this team: we’ll help them stand back and do a website load test in a couple of months’ time, to compare user experience with the clever bespoke CSS enabled and disabled, so that they can have some hard metrics as to the cost in speed of delivery of that super-flexible styling.