Thanks to everyone who came along to my talk about integrating search engines at the wonderful PHP UK 2010. The slides are available over at Slideshare. It was great speaking to so many people afterward about the challenges and solutions they've found with various engines, and the conference itself was top notch, so congratulations to the organisers.
The web is a great place for people to express their opinions on just about any subject. Even the professionally opinionated, like movie reviewers, have blogs where the public can comment and respond with what they think, and there are a number of sites that deal in nothing more than this. The ability to automatically extract people's opinions from all this raw text can be a very powerful one, and it's a well-studied area, no doubt because of the commercial possibilities.
Google was a better search engine than its predecessors for a number of reasons, but probably the best known is PageRank, the algorithm for measuring the importance of a page based on what links to it. Though not necessarily that useful on its own, this kind of link analysis can be very helpful as part of a general information retrieval system, or when looking at any kind of network, such as a friend graph from a social network.
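To make the idea of link-based importance concrete, here is a minimal sketch of the textbook power-iteration form of PageRank. It is written in Python purely for brevity and is not the code from the post; the `pagerank` function, the damping factor of 0.85, the fixed iteration count and the tiny example graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> score, with scores summing to 1.
    """
    # Collect every page, including ones that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with rank spread evenly across all pages.
    ranks = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Pages with no outbound links share their rank with everyone.
        dangling = sum(ranks[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_ranks = {page: base for page in pages}

        # Each page passes an equal share of its rank along every outbound link.
        for page, targets in links.items():
            if not targets:
                continue
            share = damping * ranks[page] / len(targets)
            for target in targets:
                new_ranks[target] += share

        ranks = new_ranks
    return ranks


# Hypothetical four-page graph: "c" is linked to by three other pages.
example_links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
example_ranks = pagerank(example_links)
```

In this example graph `c` ends up with the highest score because three pages link to it, while `d`, which nothing links to, ends up with the lowest. The same function would work on a friend graph, with people as keys and their connections as the lists.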
For anyone interested in such matters, I'll be speaking at a couple of events in February next year which could provide valuable Learning Experiences for your continual professional development!
After a rather technical post last week, something a bit lighter. Text and language generation is a fun topic with applications that run from randomly generating scientific papers for conferences to the practical tasks of generating speech and automated responses. In this post we'll look at how we can generate some nonsense text based on existing documents, which isn't especially practical, though it can make a fun change from Lorem Ipsum for holding copy. The code is included throughout the post, but you can also grab the lot in a zip.
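This excerpt doesn't say which technique the post uses, so as an illustration only, here is a sketch of one common approach to this kind of nonsense text: a word-level Markov chain, where each word is picked at random from the words that followed the preceding words in the source document. It is in Python rather than PHP, and the function names `build_chain` and `generate`, the `order` parameter and the sample sentence are all hypothetical.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` consecutive words to the words seen after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain


def generate(chain, length=30, seed=None):
    """Walk the chain from a random starting key, emitting up to `length` words."""
    if not chain:
        return ""
    rng = random.Random(seed)  # a seed makes the output repeatable
    key = rng.choice(list(chain))
    output = list(key)
    while len(output) < length:
        followers = chain.get(key)
        if not followers:
            break  # reached a run of words with no recorded follower
        output.append(rng.choice(followers))
        key = tuple(output[-len(key):])
    return " ".join(output[:length])


# Hypothetical source text; a real document would give more varied output.
sample = "the cat sat on the mat and the dog sat on the cat"
sample_chain = build_chain(sample, order=1)
sample_text = generate(sample_chain, length=10, seed=1)
```

A higher `order` tends to produce text that reads more like the source but repeats it more closely, while `order=1` gives looser, sillier output.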
A site about search, text categorisation, clustering and other interesting topics relevant to the web, but not often covered for PHP developers.