Each search engine has a different method for crawling and indexing the world wide web and a host of different algorithms for ranking pages in search results.

The algorithms are commercial secrets – to prevent them being stolen by competitors or gamed by SEO specialists – so everything we ‘know’ about them is based on the collective observations of the SEO community, what the companies say publicly and the patent applications that they’ve made.

Since it still commands about 90% of all the searches taking place in the UK, and since the other major search engines work in a similar way, I’ll focus on Google.

How Google indexes web pages

Google uses automated programs (often called robots or spiders) to crawl the web and send back what they find.

These programs are like headless web browsers – they can read HTML, CSS, JavaScript etc., follow hyperlinks, record page load times and gather a host of other data. The page content and this other information are then sent to one of Google’s many data centres, where they are stored.

Provided all is in order, the page is now said to be ‘indexed’ and is ready to be called upon to provide a search result.
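To make the crawl-then-index idea concrete, here is a minimal sketch in Python. It is not how Google's crawler works internally (that isn't public). It simply follows links breadth-first and records which words appear on which pages. The `PAGES` dictionary, the `example.test` URLs and the names `PageParser` and `crawl` are all made up for illustration, and the "web" lives in memory so nothing is fetched over the network.

```python
from collections import defaultdict, deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web": URL -> HTML. A real crawler would fetch over HTTP.
PAGES = {
    "http://example.test/": '<html><body><h1>Home</h1><a href="/about">About us</a></body></html>',
    "http://example.test/about": '<html><body><p>About search engines</p><a href="/">Home</a></body></html>',
}


class PageParser(HTMLParser):
    """Collects hyperlinks and visible words from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_data(self, data):
        self.words.extend(w.lower() for w in data.split())


def crawl(start_url, pages):
    """Breadth-first crawl; returns an inverted index: word -> set of URLs."""
    index = defaultdict(set)
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:  # link points outside our toy web
            continue
        parser = PageParser(url)
        parser.feed(html)
        for word in parser.words:
            index[word].add(url)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


index = crawl("http://example.test/", PAGES)
print(sorted(index["home"]))
```

The resulting structure is usually called an inverted index: given a word, it tells you which pages contain it, which is the lookup a search engine needs at query time.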


The search algorithm

When someone actually enters a search query, how does Google decide which results are relevant and in which order to present them?

This is the job of the algorithm. Exactly how the algorithm works is unknown, but Google says it takes more than 200 factors into account to determine how relevant a particular page is to a particular search query.

From Google’s public statements and the work of SEOs over the past 10+ years, we know that these factors fall into two broad categories: on-page factors and off-page factors.

We also know that the most important of these factors include the text content of the page, the way elements are marked up on the page (both on-page factors) and the number and relevancy of links pointing to the page (an off-page factor).
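As a rough illustration of how an on-page factor and an off-page factor might be combined into a single ranking, here is a toy scorer in Python. To be clear, this is not Google's algorithm. The `Page` class, the `rank` function, the example pages and the `link_weight` value are all assumptions made up for this sketch, and a real engine weighs far more signals than these two.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str           # on-page signal: the page's text content
    inbound_links: int  # off-page signal: hypothetical count of links to the page


def on_page_score(page, query):
    """Fraction of the page's words that match a query term."""
    words = page.text.lower().split()
    terms = set(query.lower().split())
    if not words:
        return 0.0
    return sum(w in terms for w in words) / len(words)


def rank(pages, query, link_weight=0.01):
    """Return URLs sorted best first by text score plus weighted link count.

    Pages with no on-page match are dropped, however many links they have.
    The weight is arbitrary and chosen only for this example.
    """
    scored = []
    for page in pages:
        text_score = on_page_score(page, query)
        if text_score > 0:
            scored.append((text_score + link_weight * page.inbound_links, page.url))
    return [url for _, url in sorted(scored, reverse=True)]


pages = [
    Page("a.test", "cheap flights to rome", 5),
    Page("b.test", "flights booking", 1),
    Page("c.test", "guide to cheap flights and hotels", 40),
    Page("d.test", "gardening tips", 100),
]
print(rank(pages, "cheap flights"))
```

In this made-up data, `c.test` has a weaker text match than `a.test` but many more inbound links, so it ranks first, while `d.test` is excluded despite its links because its content is irrelevant to the query.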

These ‘ranking factors’ are subject to continual change and speculation, but if you want to know more, there’s a nifty periodic table of ranking factors over at Search Engine Land, based on the prevalent wisdom of the time.

Algorithm updates

Google attained its huge popularity by presenting users with the most relevant results to their search queries. To fend off competition from Yahoo!, Bing and other search engines – and to maximise the profitability of its search advertising – Google is continuously changing the way that search results are ranked and displayed.

According to Google’s Search Quality team, the company rolls out more than 500 algorithm changes each year. The most famous of the recent algorithm updates are:

Caffeine – a big update from June 2010 which radically increased the speed and freshness of Google’s search results

Panda – an update first released in February 2011 that was designed to tackle websites with low-quality content

Penguin – first released in April 2012, this update was designed to penalise sites with spammy characteristics, e.g. many low-quality links pointing to them or keyword-stuffed content

Hummingbird – first appeared in August 2013. Like Caffeine, it fundamentally changed the way search results are delivered, probably moving Google closer to a semantic search engine

Each time an update is released, and the way search results appear changes, there are winners and losers.

The impact on SEO

The practice of search engine optimisation has changed along with these algorithm updates.

Techniques that were common in years gone by – like keyword stuffing or secretly paying other websites to provide links – have been rendered ineffective by Google, which increasingly demands high-quality content and a natural profile of links pointing to a site.
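To show what keyword stuffing looks like in numbers, here is a small Python sketch that measures how much of a text is taken up by its most repeated word. This is only an illustration: the function names and the 30% threshold are assumptions of mine, and nothing here reflects how Google actually detects stuffed content.

```python
from collections import Counter


def keyword_density(text):
    """Map each word to its share of all words in the text."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    words = [w for w in words if w]
    total = len(words)
    return {word: count / total for word, count in Counter(words).items()}


def looks_stuffed(text, threshold=0.3):
    """True if any single word exceeds the (arbitrary) density threshold."""
    return any(share > threshold for share in keyword_density(text).values())


stuffed = "cheap flights cheap flights book cheap flights cheap"
natural = "Compare prices on flights to Rome and book with confidence."

print(looks_stuffed(stuffed))
print(looks_stuffed(natural))
```

In the stuffed example the word "cheap" makes up half of the text, whereas in the natural sentence no word appears more than once.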

Search engine optimisation is now much more closely aligned with content marketing or PR – well-devised content and newsworthy stories are not only more likely to meet search engines’ desire for quality and freshness but are also more likely to attract natural links and citations from other websites.
