What Does Calais Do?
10 Ways to Use OpenCalais Today | OpenCalais
It analyzes text you send it and extracts entities (people, organizations, geographies, etc.). In many cases, it links those entities to the world of Linked Data. It extracts facts, such as the fact that John Doe is the CEO of Acme Corporation. It extracts events, such as mergers, earnings announcements, natural disasters, and others. It attaches a topic to the text as a whole, much as a newspaper would (Sports, Finance, Health, etc.). It creates SocialTags, our attempt to “tag” the article the way a human would to file it away somewhere.
It’s free for up to 50,000 submissions per day, for commercial or non-commercial purposes.
Extracting Enterprise Vocabularies Using Linked Open Data | Semantic Web Dog Food
A common vocabulary is vital to smooth business operation, yet codifying and maintaining an enterprise vocabulary is an arduous, manual task. We describe a process to automatically extract a domain-specific vocabulary (terms and types) from unstructured data in the enterprise, guided by term definitions in Linked Open Data (LOD). We validate our techniques by applying them to the IT (Information Technology) domain, taking 58 Gartner analyst reports and using two specific LOD sources: DBpedia and Freebase.
This IBM article is referenced by Juan Sequeda in a post to the Linking Open Data mailing list (public-lod@w3.org, Feb 4, 2010):

Hi Matthias,

We worked on something similar: entity type discovery using linked open data. Our task was: given a corpus of documents in the same domain, identify specific entity types in the documents. Our objective was to search for documents in a corpus by specific entities. For example: "find articles that are about RDBMS".

Standard NER tools identify high-level types such as persons, organizations, and places because they have been previously trained on general corpora. I assume tools like OpenCalais have been trained on news-like documents and Zemanta has been trained on blog-like documents. We were interested in identifying specific types such as "RDBMS" when the word "Oracle" would show up in the text.

In order to do that, we followed several domain term extraction techniques. We used LOD, specifically DBpedia, Freebase and OpenCyc, to disambiguate terms and also retrieve the entities. Honestly, evaluation is pretty hard to do, but our current implementation was not that bad (75% precision and 55% recall). We built upon some work by IBM where they create a vocabulary from text using LOD [1].

Let me see if I can clean up the code and publish it as a service.

[1] http://data.semanticweb.org/conference/iswc/2009/paper/inuse/143/html

Juan Sequeda
(575) SEQ-UEDA
www.juansequeda.com
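
As an illustration of the idea in the email above (tagging a term such as "Oracle" with a specific type such as "RDBMS", then scoring the result with precision and recall), here is a minimal Python sketch. It is not the code from the email or from the IBM paper. The `TOY_TYPES` dictionary is a hand-made, hypothetical stand-in for type information that would really be retrieved from LOD sources such as DBpedia or Freebase, the sample sentence and gold annotations are invented for the example, and the disambiguation step (Oracle the database versus Oracle the company) is skipped entirely.

```python
from typing import Dict, Set, Tuple

# Hypothetical stand-in for type information retrieved from LOD sources.
# A real system would query DBpedia/Freebase and disambiguate each term.
TOY_TYPES: Dict[str, str] = {
    "oracle": "RDBMS",
    "mysql": "RDBMS",
    "postgresql": "RDBMS",
}


def tag_specific_types(text: str) -> Set[Tuple[str, str]]:
    """Return (term, type) pairs for tokens found in the toy dictionary."""
    tokens = {tok.strip(".,;:()\"'").lower() for tok in text.split()}
    return {(tok, TOY_TYPES[tok]) for tok in tokens if tok in TOY_TYPES}


def precision_recall(
    predicted: Set[Tuple[str, str]], gold: Set[Tuple[str, str]]
) -> Tuple[float, float]:
    """Precision = correct / predicted; recall = correct / gold."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Invented example document and invented gold annotations.
doc = "We migrated the billing system from Oracle to PostgreSQL."
gold = {
    ("oracle", "RDBMS"),
    ("postgresql", "RDBMS"),
    ("billing", "Business process"),
}

predicted = tag_specific_types(doc)
precision, recall = precision_recall(predicted, gold)
print(sorted(predicted), precision, recall)
```

In this toy run both predicted pairs are correct, so precision is 1.0, while the missed "billing" annotation lowers recall to 2/3. The same two ratios are what the 75% and 55% figures in the email refer to.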