How to use headers to maximise caching

Caching both speeds up your site and reduces the bandwidth it needs, but its full benefits are not always realised. One area that many web designers overlook, or do not fully understand, is the importance of cache headers.

Requests from visitors to your site may pass through a cache on their own machine, as part of their web browser, or somewhere en route, such as a proxy cache at their ISP. Because elements of your website are cached during one visit, later visits need to download less. To decide whether a cached element is still fresh, the cache checks the HTTP header information that the web server sent with that element. As a result, the user perceives faster delivery (lower latency), and less content has to be requested from your website, reducing the overall traffic it must handle.

So what should you cache, and primarily what should you consider regarding headers?

What to cache? Start with elements that are both large and popular. These might be images on your home page or bulky PDF documents that are meant to remain unchanged, such as annual reports or datasheets.

Next, look at other page elements such as .html pages, .js JavaScript files and .css style sheets. These are often 20 KB to 40 KB in size, which makes them ideal candidates for caching on popular pages.
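The "large and popular" rule of thumb can be turned into a simple ranking. This is a minimal sketch, assuming you already have each file's size and its daily request count (for example from your server logs) and that every uncached request transfers the whole file. The Asset class, the rank_for_caching function and the figures used are hypothetical, for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    # Hypothetical record: path on the site, size in bytes, requests per day.
    path: str
    size_bytes: int
    hits_per_day: int


def rank_for_caching(assets):
    """Sort assets by the bytes per day that caching could save, largest first.

    Rough heuristic only: assumes each request would otherwise transfer
    the whole file, so potential saving = size * hits.
    """
    return sorted(assets, key=lambda a: a.size_bytes * a.hits_per_day, reverse=True)


# Made-up example figures, not measurements.
assets = [
    Asset("/index.html", 30_000, 5_000),
    Asset("/reports/annual.pdf", 2_000_000, 50),
    Asset("/img/hero.jpg", 150_000, 5_000),
    Asset("/css/site.css", 25_000, 5_000),
]

for a in rank_for_caching(assets):
    print(a.path, a.size_bytes * a.hits_per_day)
```

With these made-up numbers the home-page image comes first, and a small but heavily requested style sheet outranks a large but rarely downloaded PDF, which is why popularity matters as much as size.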

Header fields? Caches take note of several. The most widely used is the 'Expires' field, which simply gives a date and time after which the cached copy should be treated as stale and refreshed. In the absence of any explicit caching information in the header, some browsers fall back on the 'Last-Modified' header field.
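An Expires value must be an HTTP date in GMT. This is a minimal sketch using Python's standard library; the expires_header function and the seven-day lifetime are illustrative assumptions, not a recommended policy (HTTP/1.1 did advise against Expires dates more than one year ahead).

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_header(lifetime, now=None):
    """Return an Expires header value `lifetime` after `now` (defaults to the current UTC time)."""
    now = now or datetime.now(timezone.utc)
    # usegmt=True produces the "GMT" suffix HTTP expects; it requires a UTC-aware datetime.
    return format_datetime(now + lifetime, usegmt=True)


# Hypothetical usage: a fixed "now" so the output is predictable.
print(expires_header(timedelta(days=7), datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)))
# → Mon, 08 Jan 2024 12:00:00 GMT
```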

Depending on the browser, the 'freshness' of an element is then estimated from how long ago its 'Last-Modified' date was: the longer a file has gone unchanged, the longer it is assumed to stay fresh. Hence it is important not to re-publish unchanged files, for example when uploading a whole site by FTP. Re-uploading gives each file a newer Last-Modified date in its HTTP header, so caches treat it as recently changed and keep it for a shorter time, and a validation request based on the old date fetches the whole file again.
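The heuristic can be sketched as a fixed fraction of the time between Last-Modified and the moment the response was served. The 10% fraction is an assumption: the HTTP caching specification cites it as a typical setting, but each browser or proxy chooses its own rule. The heuristic_lifetime function is hypothetical.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime

HEURISTIC_FRACTION = 0.10  # assumed; a commonly cited typical value, not universal


def heuristic_lifetime(last_modified, date):
    """Estimate how long a response stays fresh when no explicit expiry is given.

    last_modified and date are HTTP date strings from the response headers.
    """
    lm = parsedate_to_datetime(last_modified)
    served = parsedate_to_datetime(date)
    # Age of the file when it was served, scaled by the heuristic fraction;
    # clamped at zero in case Last-Modified is later than Date.
    return max((served - lm) * HEURISTIC_FRACTION, timedelta(0))


served = "Thu, 11 Jan 2024 00:00:00 GMT"
print(heuristic_lifetime("Mon, 01 Jan 2024 00:00:00 GMT", served))  # unchanged for 10 days
print(heuristic_lifetime("Wed, 10 Jan 2024 00:00:00 GMT", served))  # same file re-uploaded a day earlier
```

Under this assumed rule, the untouched file is considered fresh for a day, while the identical file given a new Last-Modified date by a re-upload is fresh for only 2 hours 24 minutes.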

HTTP/1.1 added the Cache-Control header alongside these fields. Its directives let web publishers define how caches should handle pages: what may be stored, when copies must be revalidated or reloaded, and how expiry is calculated. For example, the max-age directive, given in seconds, suits elements updated at regular time intervals, and when both are present it takes precedence over Expires.
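How a cache might combine max-age and Expires can be sketched as follows. This is a simplified illustration, assuming response headers arrive as a plain dict with canonical capitalisation; it ignores other directives such as no-store and s-maxage and does not handle quoted commas. The parse_cache_control and freshness_lifetime functions are hypothetical.

```python
from email.utils import parsedate_to_datetime


def parse_cache_control(value):
    """Split a Cache-Control header into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def freshness_lifetime(headers):
    """Return the freshness lifetime in seconds, or None if none is declared.

    max-age wins over Expires; Expires is measured relative to the Date header.
    """
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    if cc.get("max-age") is not None:
        return int(cc["max-age"])
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return max(int((expires - date).total_seconds()), 0)
    return None


# Hypothetical response: max-age=3600 overrides an Expires date already in the past.
print(freshness_lifetime({
    "Cache-Control": "public, max-age=3600",
    "Date": "Mon, 01 Jan 2024 00:00:00 GMT",
    "Expires": "Sun, 31 Dec 2023 00:00:00 GMT",
}))
```

Returning None when nothing is declared leaves room for the Last-Modified heuristic to take over.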

Freshness and validation are the two key concepts. A fresh object is available instantly from the cache. If an object is stale, the cache asks your web server to validate it, that is, to say whether the cached copy is still good. If the object has not changed, the server confirms this with a short 304 Not Modified response, so the entire object does not have to be transferred again.
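On the server side, validation comes down to comparing the request's conditional headers with the current version of the file. This is a minimal sketch, assuming the server knows each file's ETag and Last-Modified values; the conditional_status function is hypothetical, and it follows the HTTP rule that If-None-Match is checked in preference to If-Modified-Since, using weak ETag comparison.

```python
from email.utils import parsedate_to_datetime


def _opaque(tag):
    # Weak comparison: ignore a leading W/ marker.
    return tag[2:] if tag.startswith("W/") else tag


def conditional_status(request_headers, etag, last_modified):
    """Return 304 if the client's cached copy is still valid, otherwise 200."""
    inm = request_headers.get("If-None-Match")
    if inm is not None:
        # If-None-Match takes precedence; If-Modified-Since is then ignored.
        tags = [_opaque(t.strip()) for t in inm.split(",")]
        return 304 if "*" in tags or _opaque(etag) in tags else 200
    ims = request_headers.get("If-Modified-Since")
    if ims is not None:
        try:
            # Unchanged since the date the client holds: its copy is still good.
            if parsedate_to_datetime(last_modified) <= parsedate_to_datetime(ims):
                return 304
        except (TypeError, ValueError):
            pass  # unparseable or incomparable date: treat as unconditional
    return 200


etag, lm = '"abc123"', "Mon, 01 Jan 2024 00:00:00 GMT"
print(conditional_status({"If-Modified-Since": "Tue, 02 Jan 2024 00:00:00 GMT"}, etag, lm))
```

A 304 carries no body, which is where the bandwidth saving comes from; a 200 sends the full file again.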