How do major websites balance the need to reduce server load and deliver
pages quickly with the desire to present lots of interesting content?
Below, we consider how three popular UK sites - Google, the BBC and the
Inland Revenue (did we really say 'popular'?) - achieve this and the trade-offs
they make.
Content
Google is one of the fastest sites out there. Visually it's very uncluttered,
having only 3 page elements. Most of its packets carry images, and only
12 KBytes (17 packets) are required to deliver the homepage to a broadband
user. Given that most of Google's remaining pages are dynamically generated
for specific search criteria, there is probably some clever proprietary
caching going on at the database level to keep it so fast.
In contrast, the BBC and the Inland Revenue have much more content, yet
they are still very responsive.
The BBC homepage is 93 KBytes (205 packets) and has 35 page elements.
Interestingly, the site does not use persistent connections, so a new
connection, with its own handshake, is set up for every object. This costs
three additional packets per object, hence the high packet count. The BBC
could also reduce its 36 KBytes of HTML by almost 10% merely by removing
white space.
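The white-space saving is easy to estimate. The Python sketch below collapses every run of white space to a single space, which browsers render identically outside <pre>, <textarea> and similar elements; it assumes the page contains none of those, and the function names are our own, chosen for illustration.

```python
import re

def collapse_whitespace(html: str) -> str:
    """Collapse each run of white space to one space (a naive minifier).

    Assumes no <pre>, <textarea> or inline script where white space matters.
    """
    return re.sub(r"\s+", " ", html).strip()

def saving_percent(html: str) -> float:
    """Percentage of UTF-8 bytes removed by collapse_whitespace."""
    before = len(html.encode("utf-8"))
    after = len(collapse_whitespace(html).encode("utf-8"))
    return 100.0 * (before - after) / before

page = "<html>\n  <body>\n    <p>Hello,   world</p>\n  </body>\n</html>\n"
print(collapse_whitespace(page))  # <html> <body> <p>Hello, world</p> </body> </html>
print(f"{saving_percent(page):.0f}% smaller")
```

Real pages vary, so the only way to confirm a figure like the BBC's 10% is to run such a function over the actual HTML.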
With typical homepage sizes somewhere between 80 KBytes and 130 KBytes,
the Inland Revenue homepage at 102 KBytes (130 packets) is about average.
It has 39 page elements and takes 20 seconds for a modem user to download.
Eight of those elements are delivered from other central government servers
to make up the "HM Government" top frame.
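As a rough check on the 20-second figure, the sketch below converts page size into transfer time. It ignores latency and per-packet overhead, and the 42 kbit/s effective rate is our assumption about what a 56k modem typically achieves in practice, not a measured value.

```python
def download_seconds(kbytes: float, kbit_per_s: float) -> float:
    """Transfer time for a page, ignoring latency and protocol overhead.

    1 KByte = 1024 bytes; line rates use 1 kbit = 1000 bits, the usual
    convention for quoting modem speeds.
    """
    return kbytes * 1024 * 8 / (kbit_per_s * 1000)

# Inland Revenue homepage size from the text: 102 KBytes.
print(round(download_seconds(102, 56), 1))  # 14.9 at the nominal 56 kbit/s
print(round(download_seconds(102, 42), 1))  # 19.9 at an assumed 42 kbit/s
```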
Compression
Often the number of packets has nearly as much effect on delivery speed
as the total data payload - due to the header overhead each packet carries
and the latency of sending a packet and waiting for an acknowledgement.
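A simplified model shows why packet count matters. Each packet carries fixed headers, and during TCP slow start the sender may only send a limited window of packets before waiting for acknowledgements, so extra packets can cost whole round trips. The constants below (1460-byte segments, 40 bytes of headers, an initial window of 2 doubling each round trip, no losses) are assumptions for illustration, not measurements of these sites.

```python
import math

# Assumed values for illustration, not measured on these sites.
MSS = 1460          # payload bytes per full-size packet
HEADER_BYTES = 40   # TCP + IPv4 headers with no options
INITIAL_WINDOW = 2  # segments sent before waiting for the first ACK

def segments_needed(payload_bytes: int) -> int:
    return math.ceil(payload_bytes / MSS)

def header_overhead(payload_bytes: int) -> int:
    return segments_needed(payload_bytes) * HEADER_BYTES

def slow_start_round_trips(payload_bytes: int) -> int:
    """Round trips to send the payload on one new connection, assuming the
    window doubles every round trip and nothing is lost (simplified slow start)."""
    remaining = segments_needed(payload_bytes)
    window, rounds = INITIAL_WINDOW, 0
    while remaining > 0:
        remaining -= window
        window *= 2
        rounds += 1
    return rounds

# Google's homepage HTML as quoted in this article: compressed vs uncompressed.
for size in (1350, 3730):
    print(size, "bytes:", segments_needed(size), "segments,",
          slow_start_round_trips(size), "round trips,",
          header_overhead(size), "header bytes")
```

Under these assumptions the compressed page fits in a single packet and a single round trip, while the uncompressed one needs three packets and two round trips.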
Google squeezes its homepage HTML into just 1350 Bytes by delivering it
compressed; uncompressed, the page is 3730 Bytes, so compression saves
about 64%. Looking at a unique page of search results, Google continues to
compress the HTML, with an overall saving of nearly 50%. This efficient use
of compression is unexpected: most of Google's pages are dynamically
generated, so each one must be compressed on the fly, which potentially
adds a large CPU load.
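Measuring the saving for a given page takes only a few lines with Python's standard gzip module. The sketch below is illustrative: the sample page is made up, and compression level 6 is our choice of a middle setting that trades some size for less CPU time, which matters when every page is compressed on the fly.

```python
import gzip

def compression_saving(html: str, level: int = 6) -> tuple[int, int, float]:
    """Return (raw bytes, gzipped bytes, percentage saved) for an HTML page."""
    raw = html.encode("utf-8")
    packed = gzip.compress(raw, compresslevel=level)
    return len(raw), len(packed), 100.0 * (len(raw) - len(packed)) / len(raw)

# A made-up results page; real HTML is less repetitive and compresses less.
sample = "<tr><td><a href=\"/r\">result</a></td></tr>\n" * 50
raw, packed, saved = compression_saving(sample)
print(raw, packed, f"{saved:.0f}%")
```

Google's own figures (3730 Bytes down to 1350 on the homepage, nearly 50% on results pages) are a better guide to what real HTML achieves than this highly repetitive sample.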
Neither the BBC nor the Inland Revenue uses compression. For the BBC,
compressing the 36 KBytes of HTML would bring significant benefits.
Compression would also benefit the Inland Revenue site, as it contains
much static content, which is very easy to compress because it can be
compressed once in advance rather than on every request. Even compression
on the fly would in principle be easier to perform, as the site is not as
busy as the others.
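For static content, one common pattern is to compress each file once and reuse the result, sending it only to browsers that advertise gzip support in their Accept-Encoding header. The sketch below is a hypothetical handler of our own, not either site's setup; its header parsing is deliberately simplified (it honours q=0 but ignores the "*" wildcard, and a malformed q-value raises ValueError).

```python
import gzip
from functools import lru_cache

@lru_cache(maxsize=None)
def gzipped(body: bytes) -> bytes:
    """Compress each distinct static body once; repeats come from the cache."""
    return gzip.compress(body)

def accepts_gzip(accept_encoding: str) -> bool:
    """Simplified Accept-Encoding check: honours q=0 but ignores '*'."""
    for item in accept_encoding.split(","):
        name, _, params = item.strip().partition(";")
        if name.strip().lower() in ("gzip", "x-gzip"):
            params = params.strip()
            q = float(params[2:]) if params.startswith("q=") else 1.0
            return q > 0
    return False

def respond(body: bytes, accept_encoding: str) -> tuple[dict, bytes]:
    # Vary tells intermediate caches that the body depends on Accept-Encoding.
    headers = {"Content-Type": "text/html", "Vary": "Accept-Encoding"}
    if accepts_gzip(accept_encoding):
        headers["Content-Encoding"] = "gzip"
        body = gzipped(body)
    headers["Content-Length"] = str(len(body))
    return headers, body
```

In practice a site would more likely compress its files at publication time and let the web server pick the right version, but the negotiation logic is the same.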
Caching
Google makes sensible use of caching. All images have HTTP headers with
the 'Expires:' field set to a date far into the future - 2038. As a result,
once an image has been downloaded, the browser can treat it as fresh for
a long time and need not request it again. As Google's homepage is only 2
packets of text once the images are in the local cache, the saving on
repeat visits is about 90% of the homepage.
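A far-future Expires header takes little code to produce. We don't know the exact date Google sends; the sketch assumes 19 January 2038 03:14:07 GMT, the last second representable in a signed 32-bit Unix time, and the function names are hypothetical. Expires is the HTTP/1.0 mechanism; HTTP/1.1 caches also understand Cache-Control: max-age, which takes precedence when both are present.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Assumed expiry: the last second representable in a signed 32-bit Unix time.
FAR_FUTURE = datetime(2038, 1, 19, 3, 14, 7, tzinfo=timezone.utc)

def image_cache_headers() -> dict:
    """Headers telling browsers an image stays fresh until FAR_FUTURE."""
    return {"Expires": format_datetime(FAR_FUTURE, usegmt=True)}

def is_fresh(expires: datetime, now: datetime) -> bool:
    """A cached copy can be reused without contacting the server until it expires."""
    return now < expires

print(image_cache_headers())  # {'Expires': 'Tue, 19 Jan 2038 03:14:07 GMT'}
```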
Because so much of its content is news, the BBC site is largely
dynamically driven, which makes caching more difficult. The Inland
Revenue does not use caching; however, as most of the images on the page
are likely to remain unchanged for months, the site would benefit from it.
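Even without far-future expiry dates, images that rarely change can be revalidated cheaply: the server sends a Last-Modified date, the browser echoes it back in If-Modified-Since, and an unchanged image is answered with a bodiless 304 Not Modified. The sketch below is a hypothetical handler of our own showing only that decision; the image body itself is omitted.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional

def conditional_response(last_modified: datetime,
                         if_modified_since: Optional[str]) -> tuple[int, dict]:
    """Return (status, headers) for a request for an image that rarely changes.

    Expects an aware datetime; HTTP dates have one-second resolution.
    """
    last_modified = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):  # unparseable date: ignore the header
            since = None
        if since is not None:
            if since.tzinfo is None:  # "-0000" dates parse as naive; treat as UTC
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304, headers  # Not Modified: the browser reuses its copy
    return 200, headers  # send the full image
```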
Style Sheets
Google does not use cascading style sheets (used to define page layout
separately from the page content). Given the simplicity of the site, the
performance gain that style sheets usually bring may be small, but they
might be worth considering in future.
The BBC and the Inland Revenue do implement style sheets. The BBC, however,
still has some formatting tags embedded in the HTML that belong in the style
sheet. This is perhaps a legacy of the days when style sheets were not so
well supported. Moving this formatting into the style sheet can only help
the overall presentation and will also reduce the page size. For example,
<font size="1"> occurs 105 times; defining this once in the style sheet
would save a further 1.5 KBytes.
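Counting such tags in a page is straightforward. The sketch below, with hypothetical names of our own, reports how many <font size="1"> tags appear and how many bytes they occupy together with their closing </font> tags, assuming every one is closed. The net saving is smaller than this figure, because whatever replaces the tags (a class attribute or a rule on a parent element, plus the rule itself in the style sheet) costs some bytes back.

```python
import re

FONT_SIZE_1 = re.compile(r'<font\s+size="1"\s*>', re.IGNORECASE)

def font_tag_overhead(html: str) -> tuple[int, int]:
    """Count <font size="1"> tags and the bytes they and their closing
    </font> tags occupy, assuming every such tag is closed."""
    tags = FONT_SIZE_1.findall(html)
    overhead = sum(len(t) for t in tags) + len(tags) * len("</font>")
    return len(tags), overhead

sample = '<td><font size="1">News</font></td><td><font size="1">Sport</font></td>'
print(font_tag_overhead(sample))  # (2, 44)
```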
Favicon
Google provides a favicon (an .ico image) of 1406 Bytes (2 packets).
Neither the BBC nor the Inland Revenue offers a favicon.ico. When the
browser requests one, the server sends back an error page that the user
never sees. For the BBC this wasted error page is nearly 3 KBytes of
data (3 packets) and for the Inland Revenue 6700 Bytes (5 packets).
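A site without an icon could avoid that waste by treating /favicon.ico as a special case. The sketch below is a hypothetical request handler of our own, not either site's server: it returns the icon if one exists, an empty 404 that sends only headers if not, and keeps the full error page for genuine missing pages that a person might see.

```python
from typing import Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str], bytes]

def respond(path: str, pages: Dict[str, bytes], icon: Optional[bytes],
            error_page: bytes) -> Response:
    """Hypothetical request handler returning (status, headers, body)."""
    if path == "/favicon.ico":
        if icon is not None:
            return 200, {"Content-Type": "image/x-icon"}, icon
        # No icon: an empty 404 sends only headers, not a full error page.
        return 404, {"Content-Length": "0"}, b""
    if path in pages:
        return 200, {"Content-Type": "text/html"}, pages[path]
    # Genuine missing pages still get the human-readable error page.
    return 404, {"Content-Type": "text/html"}, error_page
```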
GIFs as text headings
Although the Inland Revenue site looks to be largely text with only a
few images, the welcome heading and over 10 of the sub-headings are in
fact images. Neither of the other two sites uses images for pure text headings.
To reduce the number of page elements and packets, the Inland Revenue
could replace these with text, coloured or formatted appropriately.
This would also make the page more accessible for visually impaired users.
Web Farms
Although it is not obvious from the outside, there are likely to be many
web servers in the BBC web farm. The BBC site explicitly calls on a second
host, newsimg.bbc.co.uk, for some of the images in the page. This may help
to take some load off the main servers, and is perhaps used so that the
large images can be served from satellite servers closer to the user.
Google takes a similar approach for its UK site, using 'content
distribution' from akamai.net.
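Pointing images at a separate host can be done when the page is generated. The sketch below rewrites root-relative <img src="/..."> URLs to use newsimg.bbc.co.uk, the host named above; the assumption that its path layout mirrors the main site is ours, the function name is hypothetical, and the pattern handles only double-quoted src attributes.

```python
import re

# Host taken from the article; the path layout behind it is an assumption.
IMAGE_HOST = "http://newsimg.bbc.co.uk"

# Root-relative src ("/...") on an <img> tag, but not protocol-relative ("//...").
IMG_SRC = re.compile(r'(<img\b[^>]*?\ssrc=")(/(?!/)[^"]*)(")', re.IGNORECASE)

def offload_images(html: str, host: str = IMAGE_HOST) -> str:
    """Rewrite root-relative image URLs so a separate image host serves them."""
    return IMG_SRC.sub(lambda m: m.group(1) + host + m.group(2) + m.group(3), html)

print(offload_images('<img src="/images/logo.gif" alt="BBC">'))
# <img src="http://newsimg.bbc.co.uk/images/logo.gif" alt="BBC">
```

Browsers of the period also limited parallel connections per host name, so a second host can let more images download at once, though whether that was part of the BBC's motivation is not known.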
Frames
Frames can make a site less accessible and more difficult to use. Moreover,
they are not search engine friendly. Neither Google nor the BBC use frames.
The Inland Revenue does, but only to insert the obligatory "HM Government"
header.
Metadata
The Inland Revenue site has 1500 Bytes of additional metadata. As mentioned
on page 3, this is to comply with the requirement for metadata on government
websites, an extra payload the other sites don't have to contend with.
The above website comparison has been carried out independently of
the sites discussed and is intended simply as a vehicle to explore the
implementation of various web technologies. We welcome any feedback from
the websites involved.