Apache Jena™ is a Java framework for building Semantic Web applications. Jena provides a collection of tools and Java libraries to help you develop Semantic Web and linked-data apps, tools, and servers.
Apache Stanbol (currently in incubation) is an open source modular software stack and reusable set of components for semantic content management. Apache Stanbol components are meant to be accessed over RESTful interfaces to provide semantic services for content management. The current code is written in Java and based on the OSGi modularization framework. Applications include extending existing content management systems with (internal or external) semantic services, and creating new types of content management systems with semantics at their core.
"Apache Stanbol provides a set of reusable components for semantic content management. Apache Stanbol's intended use is to extend traditional content management systems with semantic services. Other feasible use cases include: direct usage from web applications (e.g. for tag extraction/suggestion; or text completion in search fields), 'smart' content workflows or email routing based on extracted entities, topics, etc."
"Apache Stanbol provides a set of reusable components for semantic content management. Apache Stanbol's intended use is to extend traditional content management systems with semantic services. Other feasible use cases include: direct usage from web applications (e.g. for tag extraction/suggestion; or text completion in search fields), 'smart' content workflows or email routing based on extracted entities, topics, etc."
The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services. OpenNLP also includes maximum entropy and perceptron based machine learning.
"The Open Relevance Project (ORP) is a small Apache LuceneTM sub-project aimed at making materials for doing relevance testing for Information Retrieval (IR), Machine Learning and Natural Language Processing (NLP) into open source."
"The goal of Apache Marmotta is to provide an open implementation of a Linked Data Platform that can be used, extended and deployed easily by organizations who want to publish Linked Data or build custom applications on Linked Data."
"The Apache Hive ™ data warehouse software facilitates querying and managing large datasets residing in distributed storage. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL."
"Apache Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!"
CumulusRDF is an RDF store for cloud-based architectures. It provides a REST-based API with CRUD operations to manage RDF data. The current version uses Apache Cassandra as its storage backend; a previous version was built on Google's AppEngine. CumulusRDF is licensed under the GNU Affero General Public License.