Mustru is a desktop search engine written in Java using Lucene, LingPipe, and Berkeley DB. Create an index from a set of directories on your local filesystem, then use the web-based interface to query it. Queries can be natural-language questions or Boolean keyword queries.
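As a rough illustration of what a Boolean keyword query over an index involves, here is a minimal sketch of an in-memory inverted index with an AND query. It is not Mustru's or Lucene's code: the class `TinyIndex` and its methods `add` and `queryAll` are hypothetical names invented for this example, and a real engine would add ranking, stemming, and on-disk storage.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; these are not Mustru or Lucene classes.
class TinyIndex {
    // Maps each term to the sorted set of document names that contain it.
    private final Map<String, TreeSet<String>> postings = new HashMap<>();

    // Splits the text on non-alphanumeric characters and records each lowercased term.
    void add(String docName, String text) {
        for (String term : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docName);
            }
        }
    }

    // Boolean AND: returns the documents that contain every keyword, in sorted order.
    List<String> queryAll(String... keywords) {
        TreeSet<String> result = null;
        for (String keyword : keywords) {
            Set<String> docs = postings.getOrDefault(keyword.toLowerCase(), new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);   // first keyword seeds the result
            } else {
                result.retainAll(docs);         // later keywords narrow it
            }
        }
        return result == null ? new ArrayList<>() : new ArrayList<>(result);
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add("notes.txt", "Lucene builds an inverted index");
        index.add("todo.txt", "Rebuild the index on Friday");
        System.out.println(index.queryAll("index", "lucene"));
    }
}
```

Intersecting per-term document sets is the simplest way to evaluate an AND query; OR and NOT would use set union and difference in the same way.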
This is the "Eclipse of Web Browsers", a secure social web browsing environment that runs off your own private, personal data store. Written primarily in Java, it uses Gecko as its web runtime and has a back end driven by MySQL and Lucene.
See also http://www.suprasphere.com/
GATE is...
* the Eclipse of Natural Language Engineering, the Lucene of Information Extraction, a leading toolkit for Text Mining
* used worldwide by thousands of scientists, companies, teachers and students
* composed of an architecture, a free open source framework (or SDK), and a graphical development environment
* used for all sorts of language processing tasks, including Information Extraction in many languages
* funded by the EPSRC, BBSRC, AHRC, the EU and commercial users
* a 100% Java reference implementation of ISO TC37/SC4, used with XCES in the ANC
* 10 years old in 2005, used in many research projects and compatible with IBM's UIMA
* based on MVC, mobile code, continuous integration, and test-driven development, with code hosted on SourceForge
Mahout's goal is to build scalable, Apache-licensed machine learning libraries. Initially, we are interested in using Hadoop to build out the ten machine learning libraries detailed in http://www.cs.stanford.edu/people/ang//papers/nips06-mapreducemulticore.pdf. While these algorithms are our initial focus, we welcome contributions of other machine learning approaches.