A Student’s Guide to Search Engines, Algorithms and Ranking

April 6, 2012

When a user enters a search query, it is the job of the search engine to determine which web pages are the most relevant to that query and in what order they should be displayed.

To many people, search engines are synonymous with Google, so much so that its name has become a verb, as in ‘let’s Google it!’ Google is widely regarded as one of the most advanced and fastest search engines in the world; its complex algorithms enable it to produce search results very quickly and to place them in order of relevance to the user.

Google has its own method of determining this order, the best-known part of which it calls PageRank.

Producing a page of search results involves three different processes: crawling web pages, indexing them and displaying them on a results page in order of relevance.



Crawling

Google’s crawler is known as ‘Googlebot’. It is simply a data-retrieval program with a limited level of functionality. It is tasked with collecting specific types of information from each web page, which it takes from meta data, HTML tags and textual content, and it discovers further pages by following links.

There are some things that Googlebot is unable to do reliably:
• Read web pages designed with Flash or frames, or content laid out in tables
• Use buttons, drop-down menus or search functions
• Follow JavaScript navigation
• See images or read text placed on an image

Google’s algorithms determine which pages are to be crawled and how often. Googlebot also notes new hyperlinks and bad ones, such as those that lead to dead ends or to ‘bad territory’, e.g. spamming sites or link farms.
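The kind of information a crawler collects from a single page can be sketched in a few lines of Python. This is only an illustration using the standard library, not Googlebot’s actual code: the names `PageExtractor` and `extract_page`, and the sample page, are invented for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageExtractor(HTMLParser):
    """Collects the title, meta tags, visible text and outgoing links of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.meta = {}
        self.links = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag == "a" and attrs.get("href"):
            # Resolve relative links against the address of the page itself.
            self.links.append(urljoin(self.base_url, attrs["href"]))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title += text
        elif self._skip_depth == 0:
            self.text_parts.append(text)


def extract_page(html, base_url):
    """Return the data a simple crawler might record for one page."""
    parser = PageExtractor(base_url)
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title,
        "meta": parser.meta,
        "text": " ".join(parser.text_parts),
        "links": parser.links,
    }


# A made-up page used purely for demonstration.
SAMPLE_HTML = """<html><head><title>Tea Shop</title>
<meta name="description" content="Loose-leaf tea">
<script>var hidden = 1;</script></head>
<body><h1>Welcome</h1><p>We sell tea.</p>
<a href="/green">Green tea</a> <a href="http://example.org/x">Other</a>
</body></html>"""

page = extract_page(SAMPLE_HTML, "http://example.com/")
print(page["title"], page["links"])
```

Note that the sketch ignores the contents of the script tag and would record nothing for an image, which mirrors the limitations listed above. The links it returns are what a crawler would queue up to visit next.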



Indexing

This is the lengthiest part of the process, where all of the data collected by Googlebot is assimilated in order to create the index. This is not a simple process, because the content that Googlebot records is not the only thing taken into account.

There are numerous other algorithmic factors, including the location of content on a web page and the credibility of the page. Generally, websites which rank highly in the search engine results pages (SERPs) are considered to have credibility. A web page that has been well optimised and maintained is more likely to be seen as credible by Google and to achieve a higher ranking, based on the relevancy and freshness of its content.
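A common textbook way to build a search index is the ‘inverted index’, which maps each word to the pages that contain it. The sketch below is a toy version under stated assumptions and is not a description of Google’s index: the function names and the sample pages are invented, and the scoring rule (pages where the query words appear earlier come first) is just one simple way to illustrate how the location of content might be taken into account.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words and numbers."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each term to {url: [positions where the term occurs]}.

    `pages` is a dict of url -> page text.
    """
    index = defaultdict(dict)
    for url, text in pages.items():
        for position, term in enumerate(tokenize(text)):
            index[term].setdefault(url, []).append(position)
    return index


def search(index, query):
    """Return the URLs containing every query term.

    Toy ordering (an assumption for this example): pages whose first
    occurrences of the query terms are nearer the top are listed first.
    """
    terms = tokenize(query)
    if not terms:
        return []
    matches = set.intersection(*(set(index.get(t, {})) for t in terms))

    def score(url):
        return sum(index[t][url][0] for t in terms)

    return sorted(matches, key=lambda url: (score(url), url))


# Made-up pages used purely for demonstration.
pages = {
    "a": "green tea and black tea",
    "b": "how to brew green tea at home",
    "c": "coffee beans",
}
index = build_index(pages)
print(search(index, "green tea"))
```

Because the index is built once in advance, answering a query only requires looking up the query words rather than re-reading every page, which is why indexing is the slow step and searching is the fast one.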


Search Results

Google orders its search results by relevancy, which is determined by over 200 separate algorithmic factors. One of these is PageRank, named after its co-developer, Larry Page. It is a link-analysis algorithm which assigns a numerical weighting to each page in a linked set of documents so that they can be placed in order of importance.

In the context of search engine results specifically, the publicly displayed weighting runs from PR0 to PR10 and reflects both the quantity and the quality of the incoming links a web page has. Nobody except Google knows the exact formulas behind its current ranking algorithms, but PageRank scores are calculated by repeatedly applying a link-based equation until the values settle.
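The idea of repeated iterations can be illustrated with the simplified form of PageRank that was published in its early academic description, in which each page shares its score equally among the pages it links to. The sketch below assumes that simplified form; it is not Google’s current production algorithm, and the damping value of 0.85, the iteration count and the tiny link graph are all assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated iteration.

    `links` maps each page to the list of pages it links to.
    Returns a dict of page -> score; the scores sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with every page equally important.
    ranks = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on equally to the pages it links to.
                share = damping * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * ranks[page] / n
                for target in pages:
                    new_ranks[target] += share
        ranks = new_ranks
    return ranks


# A made-up four-page site used purely for demonstration.
site = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home"],
    "orphan": ["home"],
}
scores = pagerank(site)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this example ‘home’ comes out on top because every other page links to it, while ‘orphan’ scores lowest because nothing links to it. The raw scores here are fractions that sum to 1; a 0-to-10 style figure would be a rescaled presentation of values like these.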


Further Reading:

‘Getting Noticed on Google’ by Ben Norman


