Clustering Tag

Determining whether two documents are exactly the same is pretty easy: just compute a suitably sized hash of each and look for a match. However, one document will generally hash to the same value as another only if the two are identical. The smallest change, or the same content on another site with a different header and footer, will produce a completely different hash. These near duplicates are very common, and being able to detect them is useful in a whole range of situations. Shingling is one process for detecting them relatively cheaply.
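
As a rough illustration, here is a minimal Python sketch (not necessarily the approach taken in the full post) that breaks each document into overlapping word shingles and compares the resulting sets using Jaccard similarity. The shingle size `k=4`, the word-splitting regex and the function names are assumptions chosen for the example. An exact SHA-1 fingerprint is included to show why exact hashing misses near duplicates.

```python
import hashlib
import re


def exact_fingerprint(text: str) -> str:
    """Exact-match hash: any change to the text gives a different digest."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def shingles(text: str, k: int = 4) -> set[str]:
    """Return the set of k-word shingles (contiguous word runs) in text.

    Assumption for this sketch: words are runs of \\w characters, lower-cased.
    """
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one shingle.
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    doc = "the quick brown fox jumps over the lazy dog"
    # Same content wrapped in a different "header" and "footer".
    copy = "Site header: " + doc + " Footer text"

    print(exact_fingerprint(doc) == exact_fingerprint(copy))  # prints False
    print(f"{jaccard(shingles(doc), shingles(copy)):.2f}")    # prints 0.60
```

The exact hashes differ, but the two documents share 6 of their 10 distinct shingles, so a similarity threshold (the value to pick is application-specific) can flag them as near duplicates.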

K-Means Clustering

14 Aug 2009

My friend Vincenzo recently posted a review of academic work on clustering that he compiled while working at the University of Naples. It's worth a look if you're interested in the field: it covers everything from the basic methods up to recent techniques such as Support Vector Clustering (which I believe you can read about in Enzo's master's thesis).
