Carrot2 is an open source search results clustering engine. It can automatically organize small collections of documents (search results, but not only those) into thematic categories.
Apart from two specialized document clustering algorithms, Carrot2 offers ready-to-use components for fetching search results from various sources, including the Google API, Bing API, eTools Meta Search, Lucene, Solr, and more.
Here's a scene in my house: My almost 9-year-old is on the internet doing something or other, and I am not standing over her shoulder or otherwise monitoring her.
Is this negligent? Am I throwing her to the wolves? I have no idea how to approach these thorny questions, so I have lunch with the academic and Microsoft researcher, danah boyd (she spells her name in lowercase letters for complicated philosophical and aesthetic reasons), who has studied this cluster of issues in an original and challenging way.
Imagine the web as a giant galaxy where the planets are sites clustered together by likeness, and what you might get is something like The Internet Map. Representing over 350,000 websites from 196 countries and all domain zones at the end of 2011, the map displays over 2 million site links based on topical similarities. Each site is represented by a circle whose size depends on the amount of traffic, and the space between any two sites is determined by the frequency, or strength, of the link created when users jump from one website to the other.