One of the issues with the boolean search model is that results are unranked: every matching document for a query contains all of the terms in that query, and there's no real way of saying which are 'better'. However, if we could weight the terms in a document based on how representative they are of the document as a whole, we could order our results so that the best matches for the query come first. This is the idea that forms the basis of the vector space model.
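To make the weighting idea concrete, here is a minimal sketch in Python. It assumes tf-idf as the weighting scheme (term frequency multiplied by the log of the inverse document frequency), which is one common choice rather than the only one, and it scores a document by simply summing the weights of the query terms it contains. A fuller vector space implementation would typically compare query and document vectors by cosine similarity instead. The function names `tf_idf_weights` and `rank` are made up for illustration.

```python
import math
from collections import Counter


def tf_idf_weights(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    # Naive tokenisation (lowercase, split on whitespace) is an assumption.
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


def rank(query, docs):
    """Order document indices by summed weight of the query terms, best first."""
    weights = tf_idf_weights(docs)
    terms = query.lower().split()
    scores = [sum(w.get(t, 0.0) for t in terms) for w in weights]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

A document that mentions a query term several times is ranked above one that mentions it once, and a term that appears in every document gets a weight of zero, since it tells us nothing about which document is the better match.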
In an earlier post we looked at a simple search system that could handle straightforward boolean combinations of words in a query. Much of the time we can treat even 'natural' searches like that, assuming that a search like 'php information retrieval' means "look for any document containing the words php AND information AND retrieval", but sometimes the user is searching for that specific phrase, with the words in that specific order.
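The difference between the two interpretations can be sketched in a few lines of Python. This is an illustration only: it assumes whitespace tokenisation and scans each document directly, whereas a real system would more likely answer phrase queries from an index that records word positions. Both function names are hypothetical.

```python
def matches_all_terms(query, doc):
    """Boolean AND: every query word appears somewhere in the document."""
    terms = query.lower().split()
    tokens = set(doc.lower().split())
    return all(t in tokens for t in terms)


def matches_phrase(query, doc):
    """Phrase match: the query words appear adjacent and in the same order."""
    terms = query.lower().split()
    tokens = doc.lower().split()
    n = len(terms)
    # Slide a window of n tokens across the document and compare.
    return any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1))
```

A document such as "retrieval of information with php" satisfies the AND query for 'php information retrieval' but not the phrase query, because the words are present but not in sequence.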
If you asked most people how a search engine worked, their answer would likely be a far cry from the acres of servers and vast collections that Google queries millions of times a day. That said, the intuitive view of a search engine is in many ways just a series of incremental steps away from Mountain View.
A site about search, text categorisation, clustering and other interesting topics relevant to the web, but not often covered for PHP developers.