Let’s take a simple look at a search engine. Three pieces of software together make up a search engine: the spider software, the index software and the query software.
The spider software ‘crawls the web looking for new pages to collect and add to the search engine indices’.
This is a metaphor. In reality, the spider doesn’t do any ‘crawling’ and doesn’t ‘visit’ any web pages. It requests pages from a website in the same way that Microsoft Internet Explorer, Firefox or whichever browser you use requests pages to display on your screen.
The difference is that the spider doesn’t collect images or formatting – it is only interested in text, links and the URL (for example, http://www.Unique-Resource-Locator.html) from which they come. It doesn’t display anything, and it gets as much information as it can in the shortest time possible.
Since the spider doesn’t collect images, it takes no notice of Flash intros or colorful pictures. So, make sure your images, logos and videos are described by accompanying text, such as an ‘alt tag’ (strictly speaking, the alt attribute of an image), or the spider will ignore them and they will have no value in the search engines.
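To make this concrete, here is a minimal sketch in Python of what a spider might keep from a single page. It is not how any real search engine works; the class name SpiderParser, the function collect and the example.com addresses are made up for illustration. It assumes the page has already been requested and simply pulls out the text, the links and any image ‘alt’ text, ignoring the images themselves.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SpiderParser(HTMLParser):
    """Collects visible text, link targets and image alt text from one page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.text_parts = []
        self.links = []
        self.alt_texts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a" and attrs.get("href"):
            # Resolve relative links against the page's own URL.
            self.links.append(urljoin(self.page_url, attrs["href"]))
        elif tag == "img" and attrs.get("alt"):
            # The image itself is ignored; only its text description is kept.
            self.alt_texts.append(attrs["alt"])

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when it is not formatting or script content.
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def collect(page_url, html):
    """Return the text, links and alt text found in one page (hypothetical helper)."""
    parser = SpiderParser(page_url)
    parser.feed(html)
    parser.close()
    return {
        "url": page_url,
        "text": " ".join(parser.text_parts),
        "links": parser.links,
        "alt_texts": parser.alt_texts,
    }
```

Notice that an image with no alt text leaves no trace at all in the result, which is exactly why a picture without a text description has no value to the search engine.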
The indexing software catches everything the spider can throw at it (yes, that’s another metaphor). It makes sense of the mass of text, links and URLs using what is called an algorithm – a complex mathematical formula that indexes the words, the pairs of words and so on. Essentially, an algorithm analyzes the pages and links for word combinations to figure out what the web pages are all about – in other words, what topics are being covered. Then, scores are assigned that allow the search engine to measure how relevant or important the web pages (and URLs) might be to the person who is searching. While each of the major search engines (like Google, Yahoo or Bing) has its own secret algorithm for scoring, they all use the information a spider collects.
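The real scoring algorithms are secret, so the following is only a toy sketch in Python of the basic idea: record, for every word and every pair of neighboring words, which URLs they appear on and how often. The function names tokenize and build_index are invented for this example, and a plain count stands in for the far more elaborate scores a real search engine would assign.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word and each adjacent word pair to {url: count}.

    `pages` is a dict of {url: page text}, as a spider might supply it.
    """
    index = defaultdict(lambda: defaultdict(int))
    for url, text in pages.items():
        words = tokenize(text)
        # Single words.
        for word in words:
            index[word][url] += 1
        # Pairs of neighboring words, stored as "first second".
        for first, second in zip(words, words[1:]):
            index[f"{first} {second}"][url] += 1
    return index
```

Storing word pairs as well as single words is what lets even a simple index tell a page about ‘dark chocolate’ apart from a page that merely mentions both words somewhere.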
And of course, the indexing software records all of this information and makes it available. The spider takes the information it has gathered about a web page and sends it to the indexing software where it is analyzed and stored. When someone types chocolate into the query box on a search engine page (such as Google), then it’s time for the query software to go to work.
The query software is what you see when you go to a search engine – it is the front end of what everybody thinks of as a search engine. It may look simple but the query software presents the results of all the quite remarkable spider and index software that works away invisibly on our behalf.
So, when you type in your search words and hit search, then the search engine will try to match your words with the best, most relevant web pages it can find by ‘searching the web’.
But this too is a metaphor and perhaps the most important one.
The query software doesn’t actually search the web – instead, it checks through all the records that have been created by its own index software. And those records are made possible by the text, links and URL material the spider software collects.
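A small Python sketch shows the point: the query step never touches the web, it only looks things up in records that already exist. The INDEX contents, the example.com addresses and the search function are hypothetical, and ranking by simple word counts is an assumption made for illustration, not any search engine’s actual method.

```python
def search(index, query):
    """Rank URLs by how often the query words occur in the stored records."""
    scores = {}
    for word in query.lower().split():
        # Look the word up in the index; no web page is requested here.
        for url, count in index.get(word, {}).items():
            scores[url] = scores.get(url, 0) + count
    # Highest score first; ties are broken by URL so the order is stable.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical records, as the index software might have stored them.
INDEX = {
    "chocolate": {"http://example.com/a": 2, "http://example.com/c": 1},
    "cake": {"http://example.com/a": 1, "http://example.com/b": 1},
}
```

Calling search(INDEX, "chocolate") returns the two stored URLs that mention the word, with the more frequent mention first. A word that was never indexed returns nothing, however many pages on the web may contain it.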