SEO - how to use spiders to maximise page indexing

 

 

 
 

Is there one thing that web designers can do to help their Marketing Department with Search Engine Optimisation (SEO)? Key phrases, perhaps?

No. The first and most important job in getting your site well listed by Google et al is to make it easy to index. Search engine robots, or spiders, have a lot of work to do: the Internet is a big place, and they need to crunch pages as quickly as they can. So the search engine firms have designed stripped-down spiders that are 'coded for speed' and essentially very simple. They're not as clever as the web browser software we use.

Traditionally, they're not clever enough to handle technologies such as framed sites (perhaps one reason why that technology is being used less and less). The spiders also don't like complexity, so they mostly ignore Java and JavaScript instructions. Any content that can only be reached via scripts is quite possibly not being picked up by the spiders.

Spiders probably won't bother to look at the whole content of your page if the HTML is large, or if it starts with tens of kilobytes of non-content such as JavaScript menus. And the spiders don't look much in your style sheets. These are of course intended to hold format, not content, so spiders ignoring them is OK, unless your site happens to call images or content via links in the style sheet.

A particularly problematic technology for indexing is Flash. Spiders just don't dig into the Flash to find the content there, so unless you have the same content linked elsewhere in plain HTML, that content is invisible. For example, one large organisation has a site with around 1000 pages in Flash, and only 13 are visible from Google.


Flash also breaks a key concept of the web: that one URL links to one piece of content. A Flash movie that has perhaps whole sections of your content embedded has just the single URL. A user can't bookmark just one page within the Flash. They can't email a colleague a link to a specific part of the Flash content, and the search engines, even if they did grab the contents, would show it as a single-URL page.

Some spiders will limit the time they spend in one site, so if you have thousands of products for sale and each page takes too long to process, the spiders will give up before they find them all.

Finally, the way that you name pages or link to them can have an effect. If your site allows the same content to be called by different URLs, the spiders see them as separate pages and will waste time covering them both. Or worse, some ecommerce shopping systems use complex URL structures in which there is no unique map between a URL and a certain page's content, for example those that embed a session ID or user ID in the URL. Every time the robot visits the same page, it sees it with a different URL!
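To see how badly a site suffers from this, you can collapse a list of its URLs down to one form per page and count what is left. The following is only a minimal sketch: the parameter names in SESSION_PARAMS are assumptions for illustration, and you would need to substitute whatever your own shopping system actually puts in its URLs.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed names of parameters that identify a visitor rather than content.
# These are illustrative only; check what your own system uses.
SESSION_PARAMS = {"sessionid", "sid", "userid"}


def canonical_url(url: str) -> str:
    """Return the URL with visitor-specific query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    kept.sort()  # the same parameters in a different order are the same page
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


def count_distinct_pages(urls) -> int:
    """Count how many different pages a list of URLs really points at."""
    return len({canonical_url(u) for u in urls})
```

If a crawl of your own site yields many more URLs than count_distinct_pages reports, a spider is probably wasting its visit on duplicates in much the same way.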

STEP 1
Strip out as much of the page layout code and JavaScript as you can, and put it in separate .CSS and .JS files.
Help the spider find the unique page content: keep the content as near the top of the HTML page body as possible – any generic menus on the left or right column should be later in the raw HTML than the page content.
Don't be like the High Street retailer, who has a whopping 60k of scripting at the top of every single page just to drive a clever menu system - and who finds it hard to get all their thousands of product pages indexed by Google.
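A rough way to find out how much of a page is inline scripting and styling is to add it up. The sketch below uses Python's standard html.parser module; the class and function names are made up for this example, and it counts characters rather than bytes on the wire, so treat the result as an indication only.

```python
from html.parser import HTMLParser


class InlineWeight(HTMLParser):
    """Adds up the characters found inside inline <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self._inside = None  # name of the script/style tag we are currently in
        self.inline_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._inside = tag

    def handle_endtag(self, tag):
        if tag == self._inside:
            self._inside = None

    def handle_data(self, data):
        if self._inside:
            self.inline_chars += len(data)


def inline_share(html: str) -> float:
    """Fraction of the raw HTML taken up by inline script and style content."""
    if not html:
        return 0.0
    parser = InlineWeight()
    parser.feed(html)
    parser.close()
    return parser.inline_chars / len(html)
```

A page that pulls its menu in from an external .JS file scores close to zero here, while a page with a large inline menu script will show a sizeable share that could be moved out.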

STEP 2
Ensure that key sections of your website have easy-to-find URLs:
www.shop.com/products/shoes
is better than
www.shop.com/products.asp?category_id=72&sessionID=98ty54df72ms


If you're stuck with one of the shopping systems that are infamously bad for URL naming, then it's vital to include extra pages that act as your site's Table of Contents. This is rather like a SiteMap, but with many pages, perhaps one per product grouping, each with a mini paragraph per product matched to a link to the actual product page; or ultimately a mini-page per product. If your content management system is unable to add such a table, there are more expensive 3rd party routes.
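If you can export a product list from your system, a Table of Contents page of this kind is simple to generate. The sketch below is one possible approach, not a feature of any particular shopping system: the function name and the (name, description, URL) layout of each product are assumptions made for the example.

```python
from html import escape


def contents_page(group: str, products: list[tuple[str, str, str]]) -> str:
    """Build a plain-HTML contents page for one product grouping.

    Each product is a (name, short description, URL) tuple.
    """
    items = "\n".join(
        f'  <li><a href="{escape(url, quote=True)}">{escape(name)}</a>: '
        f"{escape(blurb)}</li>"
        for name, blurb, url in products
    )
    return f"<h1>{escape(group)}</h1>\n<ul>\n{items}\n</ul>\n"
```

Calling contents_page("Shoes", [("Brogue", "Leather lace-up", "/products/shoes/brogue")]) gives a heading and a one-item list in plain HTML, with an ordinary link that a spider can follow to the product page.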

STEP 3
Ensure that where you have JavaScript in the site, the links and functions still work without it. Like putting the page content higher in the raw HTML as above, this is something you'd also be doing to make your site more Accessible, to comply with the Disability Discrimination Act; so the changes you make there will also help with indexing.
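One quick check is to look for links that give a spider nothing to follow once scripting is ignored. The sketch below flags anchors whose href is empty, is just "#", or starts with "javascript:", plus anchors that have an onclick handler but no href at all. The names are invented for this example, and these rules are only a rough heuristic: a link can pass this test and still depend on scripts in other ways.

```python
from html.parser import HTMLParser


class ScriptOnlyLinks(HTMLParser):
    """Collects the href of every <a> that appears to work only through script."""

    def __init__(self):
        super().__init__()
        self.script_only = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        if "href" not in attributes:
            # Named anchors without href are harmless; only flag scripted ones.
            if "onclick" in attributes:
                self.script_only.append("")
            return
        href = (attributes["href"] or "").strip()
        if href in ("", "#") or href.lower().startswith("javascript:"):
            self.script_only.append(href)


def find_script_only_links(html: str) -> list[str]:
    """Return the href values of links a script-ignoring spider cannot follow."""
    parser = ScriptOnlyLinks()
    parser.feed(html)
    parser.close()
    return parser.script_only
```

An empty result for a page is encouraging but not proof; it is still worth browsing the site with scripting switched off.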

NOW, CHECK YOUR PROGRESS
How do you know how many of your pages Google et al have successfully indexed?
In Google, type <>site:www.yourcompany.com where <> is a space. This will show how many pages Google has found at your site (it's not always a reliable number, so check over a period of time).

With your pages well indexed, only now is it time to try to climb into the Top 10 listings in the search engines for the key phrases you want to target.