Three That Are One
Crawler-based search engines are made up of three major elements: the spider, the index, and the software. Each has its own function, and together they produce what we have come to trust (or distrust) on the SERPs (Search Engine Results Pages).
The Hungry Spider
Also known as a web crawler or robot, a search engine spider is an automated program that reads web pages and follows any links to other pages within the site. This is often referred to as a site being "spidered" or "crawled". There are three very hungry and active spiders on the Net. Their names are Googlebot (Google), Slurp (Yahoo!) and MSNBot (MSN Search).
Spiders start their journeys with a list of page URLs that have previously been added to their index (database). As a spider visits these pages, crawling the code and copy, it adds any new pages (links) it finds to its index. In that sense, a spider can be thought of as feeding an evolving index, which is discussed below.
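To make that crawl loop concrete, here is a minimal sketch of it in Python. Everything in it is invented for illustration (the seed list, the page limit, the simple breadth-first queue); real spiders such as Googlebot are vastly more sophisticated, but the basic feeding of an index works along these lines.

```python
# A minimal sketch of the crawl loop: start from seed URLs, fetch each page,
# pull out its links, and queue any URLs that have not been seen before.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags on a fetched page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, max_pages=20):
    """Breadth-first crawl: visit pages, store a copy, follow new links."""
    queue = deque(seed_urls)
    index = {}  # url -> copy of the page, standing in for the real index
    while queue and len(index) < max_pages:
        url = queue.popleft()
        if url in index:
            continue  # already spidered
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except (OSError, ValueError):
            continue  # unreachable or non-HTTP links are simply skipped
        index[url] = html  # feed the page into the evolving index
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            queue.append(urljoin(url, link))  # newly found pages join the queue
    return index


# Example (the seed URL is a placeholder):
# pages = crawl(["https://www.example.com/"])
```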
The spider returns to the sites in its index on a regular basis, scanning for any changes. How often the spider returns is up to the search engines to decide. Website owners do have some control over how a spider crawls their site by making use of a robots.txt file. Spiders look for this file first, before crawling the site any further.
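As a rough illustration of that check, Python's standard urllib.robotparser module can read a robots.txt file and answer the same question a polite spider asks before going any further. The site address and bot name below are placeholders, not real crawler details.

```python
# A small sketch of the robots.txt check, using Python's standard library.
from urllib.robotparser import RobotFileParser

robots = RobotFileParser()
robots.set_url("https://www.example.com/robots.txt")  # placeholder site
robots.read()  # fetch and parse the site's robots.txt

# A well-behaved spider asks before crawling a page any further.
if robots.can_fetch("ExampleBot", "https://www.example.com/private/page.html"):
    print("Allowed to crawl this page")
else:
    print("robots.txt asks crawlers to stay away from this page")
```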
The Growing Index
An index is like a giant catalogue or inventory of websites, containing a copy of every web page and file that the spider finds. If a web page changes, the catalogue is updated with the new information. To give you an idea of the scale of these indexes, the latest figure released by Google puts its index at 8 billion pages.
It sometimes takes a while for new pages, or changes the spider finds, to be added to its index. Thus, a web page may have been "spidered" but not yet "indexed." Until a page is indexed, that is, added to the index, it will not be available to people searching with the search engine.
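A toy example may help picture the catalogue. The sketch below builds a tiny inverted index, mapping each word to the pages that contain it; the two pages and their text are made up, and a real index also stores the copied pages, link data and far more besides.

```python
# A toy "catalogue": an inverted index mapping each word to the pages it
# appears on. Re-indexing a changed page would simply rebuild its entries.
from collections import defaultdict

pages = {
    "https://example.com/a": "search engine spiders crawl web pages",
    "https://example.com/b": "the index is a catalogue of web pages",
}


def build_index(pages):
    index = defaultdict(set)  # word -> set of URLs containing it
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


index = build_index(pages)
print(index["pages"])  # both URLs, since the word appears on both pages
```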
The Sifting Software
Search engine software is the third element. This is the program that sifts through the millions of pages recorded in the index to find matches to a search and rank them in order of what it believes is most relevant. You can learn more about how search engine software ranks web pages on the aptly named How Search Engines Rank Web Pages page.
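To round off the picture, here is an equally small sketch of the third element: a bit of software that looks up a query against a handful of pages and ranks the matches by a crude relevance score (how often the query words appear). Real ranking weighs far more signals; this only shows the basic flow from query to an ordered results list.

```python
# A toy ranker: score each page by how many times the query words appear,
# then return the URLs from the highest score down, like a results page.
pages = {
    "https://example.com/a": "search engine spiders crawl web pages",
    "https://example.com/b": "the index is a catalogue of web pages",
    "https://example.com/c": "how search engine software ranks web pages",
}


def search(query, pages):
    terms = query.lower().split()
    scored = []
    for url, text in pages.items():
        words = text.lower().split()
        score = sum(words.count(term) for term in terms)  # naive relevance
        if score > 0:
            scored.append((score, url))
    scored.sort(reverse=True)  # highest score first
    return [url for score, url in scored]


print(search("search engine software", pages))
# ['https://example.com/c', 'https://example.com/a']
```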
Source: http://searchenginewatch.com/article/2065173/How-Search-Engines-Work
Five examples of search engines on the internet
1. Newscred for credible news stories
2. Icerocket has RSS feed options and is a good alternative
3. Hotbot is a blast from the past and that's about all
4. Librarians' Internet Index is a brilliant resource
5. Tinker, for real-time search, didn't impress me in the slightest