"Federated Knowledge Extraction Framework
FOX is a framework that integrates the Linked Data Cloud and makes use of the diversity of NLP algorithms to extract RDF triples of high accuracy out of NL. In its current version, it integrates and merges the results of Named Entity Recognition tools. Keyword Extraction and Relation Extraction tools will be merged soon."
Federated knOwledge eXtraction Framework
FOX is a framework that integrates the Linked Data Cloud and makes use of the diversity of NLP algorithms to extract RDF triples of high accuracy out of NL. In its current version, it integrates and merges the results of Named Entity Recognition, Keyword Extraction and Relation Extraction tools.
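To illustrate what "merging the results" of several Named Entity Recognition tools can look like, here is a minimal sketch that uses a plain majority vote. This is a swapped-in technique chosen for illustration, not FOX's actual merging method; the function name `merge_annotations`, the `min_votes` parameter, and the dictionary-based tool output format are all assumptions.

```python
from collections import Counter


def merge_annotations(tool_outputs, min_votes=2):
    """Merge NER results by majority vote (illustrative only, not FOX's method).

    tool_outputs: list of dicts, one per hypothetical NER tool, each mapping
                  a mention string to the entity type that tool assigned.
    min_votes:    minimum number of tools that must agree on (mention, type).
    """
    votes = Counter()
    for output in tool_outputs:
        for mention, label in output.items():
            votes[(mention, label)] += 1

    merged = {}
    # most_common() yields the best-supported (mention, label) pairs first,
    # so the first label accepted for a mention is the one with the most votes.
    for (mention, label), count in votes.most_common():
        if count >= min_votes and mention not in merged:
            merged[mention] = label
    return merged


if __name__ == "__main__":
    outputs = [
        {"Leipzig": "LOCATION", "Fox": "ORGANIZATION"},
        {"Leipzig": "LOCATION", "Fox": "PERSON"},
        {"Leipzig": "LOCATION"},
    ]
    print(merge_annotations(outputs))
```

With the sample input, only "Leipzig" is kept, because the two tools that tagged "Fox" disagree on its type.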
Extracting Structured Data from the Common Web Crawl
More and more websites have started to embed structured data describing products, people, organizations, places, and events into their HTML pages. The Web Data Commons project extracts this data from several billion web pages and provides the extracted data for download. Web Data Commons thus enables you to use the data without needing to crawl the Web yourself.
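As a rough illustration of what "structured data embedded in HTML" means, the sketch below pulls microdata-style `itemprop` values out of a page using only the Python standard library. It is not the Web Data Commons extraction pipeline; the class name `MicrodataParser`, the helper `extract_microdata`, and the simplifying assumption of flat, non-nested items are all hypothetical.

```python
from html.parser import HTMLParser


class MicrodataParser(HTMLParser):
    """Collect (itemtype, itemprop, value) tuples from flat microdata markup.

    Simplification (assumption): items are not nested, so the most recently
    opened itemscope is treated as the owner of every following itemprop.
    """

    def __init__(self):
        super().__init__()
        self.triples = []
        self._current_type = None   # itemtype of the latest itemscope seen
        self._pending_prop = None   # itemprop still waiting for its text value

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self._current_type = attrs.get("itemtype")
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if "content" in attrs:
            # e.g. <meta itemprop="sku" content="EM-100">
            self.triples.append((self._current_type, prop, attrs["content"]))
        else:
            # The value is the element's text, delivered via handle_data.
            self._pending_prop = prop

    def handle_data(self, data):
        text = data.strip()
        if self._pending_prop and text:
            self.triples.append((self._current_type, self._pending_prop, text))
            self._pending_prop = None


def extract_microdata(html):
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.triples


if __name__ == "__main__":
    sample = """
    <div itemscope itemtype="http://schema.org/Product">
      <span itemprop="name">Espresso Machine</span>
      <meta itemprop="sku" content="EM-100">
    </div>
    """
    for triple in extract_microdata(sample):
        print(triple)
```

A real extractor would also need to handle nested items, `href`/`src` valued properties, and other syntaxes such as RDFa and microformats.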
This wiki contains the infobox-to-ontology and the table-to-ontology mappings which are used by the DBpedia extraction framework, as well as the ontology definition itself. The framework collects the templates defined in this wiki and extracts the Wikipedia content according to them (as of March 2010, only the dump extraction uses the mappings; DBpedia Live is going to follow shortly).
AlchemyAPI provides content owners and web developers with a rich suite of content analysis and meta-data annotation tools.
Expose the semantic richness hidden in any content, using named entity extraction, keyword extraction, sentiment analysis, document categorization, concept tagging, language detection, and structured content scraping. Use AlchemyAPI to enhance your website, blog, content management system, or semantic web application.
"Wandora is a general purpose information extraction, management and publishing application based on Topic Maps and Java. Topic maps are hypergraphs with an emphasis on subjects. Wandora stores information in layered and auto-merging topic maps. Wandora has a graphical user interface, multiple visualization models, a huge collection of information extraction, import and export options, an embedded HTTP server with several output modules, and an open plug-in architecture. Wandora is a FOSS application with a GNU GPL license. Wan"
"The Sentikator is a computer linguistic engine designed to calculably recognize, analyze and quantify emotions and content in texts. It allows extracting sentiment out of news, analyst recommendations, social media data, transcripts, press releases, broker news, factsheets, weather forecasts and many other sources. Sentiment extraction is highly reliable and disseminated data is preprocessed so that implementation into existing or new financial applications is easily possible. The Sentikator gives valuable insights to emotions"
"Apache Stanbol provides a set of reusable components for semantic content management. Apache Stanbol's intended use is to extend traditional content management systems with semantic services. Other feasible use cases include: direct usage from web applications (e.g. for tag extraction/suggestion; or text completion in search fields), 'smart' content workflows or email routing based on extracted entities, topics, etc."
Extract and rank concepts, tags and categories from webpages, URLs and text. Determine the sentiment expressed on a webpage. As a demonstration of Wingify's contextual targeting technology, contextually similar links to the input are also fetched from the web. We expose an API for this technology; contact us to begin using it.
Wandora is a general purpose information extraction, management, and publishing application based on Topic Maps and Java Swing. Wandora has a graphical user interface, layered presentation of knowledge, several data storage options, a huge collection of data extraction, import and export options, an embedded server, and an open plug-in architecture. Wandora is a FOSS application with a GNU GPL license.
The Internet is in constant flux, with pages being added and deleted (mostly added). The Internet is also becoming more interactive, with music and video offerings (producing and consuming) brought about by broadband technologies, both wired and wireless. As the Internet evolves, the vast store of information which it contains will become more and more intelligent, as the computation forces within the Internet become better and better at extracting meaning for human use. This extraction of useful information in new and unique ways is the basis of the Semantic Internet.
Media Cloud performs five basic functions: media definition, crawling, text extraction, word vectoring, and analysis. First, we define the set of media sources we want to collect and discover the feeds for each media source (which in the case of many newspapers includes hundreds of feeds). Second, we crawl each of those feeds several times each day to discover any new stories published by each feed and then download the HTML of each new story. Third, we extract just the substantive content of each story from each HTML page, leaving behind the ads, navigation, and other cruft. Fourth, we break that substantive text down into a set of word counts so that we can count, down to the level of individual sentences, which words which media sources are using to talk about which topics. And finally, we have a set of tools for analyzing those word counts, including the Media Dashboard tool that acts as the front page for http://mediacloud.org.
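The fourth step, turning story text into per-sentence word counts grouped by media source, can be sketched roughly as follows. This is only an illustration under simple assumptions, not Media Cloud's actual code: the function names, the naive regex-based sentence splitter, and the `(source, text)` input format are all hypothetical.

```python
import re
from collections import Counter, defaultdict


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_word_counts(text):
    """Return one Counter of lowercase word frequencies per sentence."""
    return [
        Counter(re.findall(r"[a-z']+", sentence.lower()))
        for sentence in split_sentences(text)
    ]


def counts_by_source(stories):
    """Aggregate sentence-level counts into one Counter per media source.

    stories: iterable of (source_name, story_text) pairs (assumed format).
    """
    totals = defaultdict(Counter)
    for source, text in stories:
        for counts in sentence_word_counts(text):
            totals[source].update(counts)
    return totals


if __name__ == "__main__":
    stories = [
        ("Daily A", "Taxes rise. Voters oppose taxes!"),
        ("Daily B", "Taxes fall."),
    ]
    for source, counts in counts_by_source(stories).items():
        print(source, counts.most_common(3))
```

Keeping the counts at sentence level before aggregating is what makes it possible to ask later which words co-occur with a topic word in the same sentence.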
"Extract Meaning from your Text.
The TextRazor API helps you extract and understand the Who, What, Why and How from your legal documents with unprecedented accuracy and speed."
If you own a business, you need to monitor your competitors' moves to stay ahead of the game. To do so, you need market research that gathers useful information and helps you determine your position in the online business.
Anything To Triples (any23) is a library, a web service and a command line tool that extracts structured data in RDF format from a variety of Web documents. Currently it supports the following input formats: RDF/XML, Turtle, Notation 3, RDFa.
Microformats: Adr, Geo, hCalendar, hCard, hListing, hResume, hReview, License, XFN and Species