Group items tagged open - (HBSN) Useful Webservices APIs

Extracting Enterprise Vocabularies Using Linked Open Data | Semantic Web Dog Food - 0 views

data.semanticweb.org/...html

linkeddata Juan Sequeda entity extraction vocabularies

shared by François Dongier on 04 Feb 10 - Cached

A common vocabulary is vital to smooth business operation, yet codifying and maintaining an enterprise vocabulary is an arduous, manual task. We describe a process to automatically extract a domain specific vocabulary (terms and types) from unstructured data in the enterprise guided by term definitions in Linked Open Data (LOD). We validate our techniques by applying them to the IT (Information Technology) domain, taking 58 Gartner analyst reports and using two specific LOD sources -- DBpedia and Freebase.
- François Dongier on 04 Feb 10
  
  This IBM article is referenced by Juan Sequeda in a post to the Linking Open Data mailing list (public-lod@w3.org, Feb 4, 2010):

  Hi Matthias,

  We worked on something similar: entity type discovery using linked open data. Our project was: given a corpus of documents in the same domain, identify specific entity types in the documents. Our objective was to search for documents in a corpus by specific entities. For example: "find articles that are about RDBMSs".

  Standard NER tools identify high-level types such as persons, organizations and places because they have been previously trained on general corpora. I assume tools like OpenCalais have been trained on news-like documents and Zemanta has been trained on blog-like documents. We were interested in identifying specific types such as "RDBMS" when the word "Oracle" would show up in the text.

  In order to do that, we followed several domain term extraction techniques. We used LOD, specifically DBpedia, Freebase and OpenCyc, to disambiguate terms and also retrieve the entities. Honestly, evaluation is pretty hard to do, but our current implementation was not that bad (75% precision and 55% recall).

  We built upon some work by IBM where they create a vocabulary from text using LOD [1]. Let me see if I can clean up the code and publish it as a service.

  [1] http://data.semanticweb.org/conference/iswc/2009/paper/inuse/143/html

  Juan Sequeda (575) SEQ-UEDA www.juansequeda.com
  
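The email above reports 75% precision and 55% recall but no combined score. As a small worked example, the sketch below computes the standard F1 score (the harmonic mean of precision and recall) for those two figures. The function name is an assumption made for illustration; nothing here comes from the authors' code, and the email does not say they used F1.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        # Avoid division by zero when both scores are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the email: 75% precision, 55% recall.
print(round(f1_score(0.75, 0.55), 3))  # 0.635
```

By this measure the reported system sits at roughly 0.63, pulled toward the lower recall figure, which is how a harmonic mean behaves.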

How Google Buzz is Disruptive: Open Data Standards - 0 views

www.readwriteweb.com/...uptive_open_data_standards.php

googlebuzz dataportability

shared by François Dongier on 10 Feb 10 - Cached

Under the covers, though, this major product was built by a team of people taking a radical new approach to online publishing: Buzz is all about open, standardized user data.

Google Buzz data can be syndicated out to other services using the standard data formats called Atom, Activity Streams, MediaRSS and PubSubHubbub.

a look at its APIs and developer roadmap indicate that it may actually intend to be a platform - the central hub for a world of distributed social networking.

if the growing number of data portability and open web advocates the company has hired can do their jobs well - then Google Buzz could be a big force for good.

People will build services on top of analyzing your public Buzz activity. They will build new applications for publishing to Buzz,

Planned support for things like the Salmon commenting standard means that comments left on Buzz could appear out on blog posts around the web, and comments on blog posts could be viewed inside of Buzz when the post links are shared.

a cross-platform messaging service. Facebook users can only message other Facebook users

Is Google centralizing too much of the decision making about the future of an ostensibly decentralized web?

"Coming soon - Over the next several months Google Buzz will introduce an API for developers, including full read/write support for posts with the Atom Publishing Protocol, rich activity notification with Activity Streams, delegated authorization with OAuth, federated comments and activities with Salmon, distributed profile and contact information with WebFinger, and much, much more."

It would have been disruptive if Google had pushed W3C standards for sharing data (Semantic Web technologies, Linked Data, ...). But does Google really want to push semantic web technologies, making the web easier to search?

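The highlights above name Atom as one of the formats Buzz data is syndicated in. As a rough illustration of what consuming such a feed involves, the sketch below reads entry titles, links and timestamps from an Atom document using only the Python standard library. The sample feed, its example.com URLs and the function name are made up for illustration; they are not taken from Buzz or its API, and a real feed would be fetched over HTTP rather than embedded as a string.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Atom Syndication Format (RFC 4287).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed used only for this example.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example public activity</title>
  <entry>
    <title>First post</title>
    <link rel="alternate" href="http://example.com/posts/1"/>
    <updated>2010-02-10T12:00:00Z</updated>
  </entry>
  <entry>
    <title>Second post</title>
    <link rel="alternate" href="http://example.com/posts/2"/>
    <updated>2010-02-10T13:30:00Z</updated>
  </entry>
</feed>"""


def read_entries(feed_xml):
    """Return a list of dicts with the title, link and updated time of each entry."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        updated = entry.findtext("atom:updated", default="", namespaces=ATOM_NS)
        # The "alternate" link points at the human-readable page for the entry.
        link = entry.find("atom:link[@rel='alternate']", ATOM_NS)
        href = link.get("href") if link is not None else None
        entries.append({"title": title, "link": href, "updated": updated})
    return entries


for item in read_entries(SAMPLE_FEED):
    print(item["updated"], item["title"], item["link"])
```

Because the format is an open standard, the same few lines would work against any conforming Atom feed, which is the data portability point the article is making.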

FCC.gov reboots as an open government platform - O'Reilly Radar - 0 views

radar.oreilly.com/...-reboot-open-source-cloud.html

open gov fcc reboots

shared by Kurt Laitner on 06 Apr 11 - No Cached

Open Dover | add sentiment to your content - 0 views

www.opendover.nl

opendover opencalais sentiment analysis emotion tags drupal

shared by François Dongier on 02 Mar 10 - Cached

Emotion tag any kind of text with the Open Dover Live Demo, try OpenDover now!

OpenDover uses linguistic algorithmic technologies to emotion tag text that you send to the service. Emotion tags are returned to users for implementing in web applications, searches, blogs and so on.

Whether you are into blogging or developing websites, OpenDover is based on Java technology, which allows for easy connectivity through webservices.

François Dongier on 02 Mar 10

Emailed newsletter March 2, 2010:

Dear All,

It is time again to inform you on the current state of our OpenDover project. Over the last 6 months we were engaged in some major overhaul activities. For 1 year we have been performing test trials and listening to "potential" customers. There was 1 thing they all asked for: make OpenDover simpler! It appeared that the whole concept of choosing a subject domain and selecting base tags was too much. We also thought that this was hindering our penetration into the market. So we went back to the drawing board and re-evaluated our system.

While we were doing that, we kept on adding more subject domains, because whatever we would do, this approach of ontologies and satellite words would not change. So, at least we can inform you now that we have a total of 10 subject domains, covering a large part of what is most commonly discussed on the Internet. Just to refresh your mind, we have listed them here for your convenience:

1. Economics, Finance, Business
2. Health - Medical Care
3. Law
4. Politics
5. Product - Camera
6. Product - Phone
7. Product - Audio Player
8. Product - Video Player
9. Product - Software
10. Travel - Flight
11. Travel - Hotel

BREAKTHROUGH!! The biggest breakthrough came a few months ago when we modified our algorithms in such a way that we were able to auto-detect the subject domain of an arbitrary text. The next step was simple then. When we know the subject domain (or subject domains) of an arbitrary opinion text, we should automatically find the sentiment for that domain. It is then no longer necessary to use base tags. This feature is now available on OpenDover for you to test!

1. Just take an arbitrary piece of text expressing opinions (or take the example listed in this e-mail)
2. Go to http://java.opinionmining.nl
3. Paste text into the story box
4. Select accurate in the Mode box
5. Select Generic domain in the Sub


TED Opens Up the Firehose of Data and Talks for Developers to Play [Exclusive] | Fast C... - 0 views

www.fastcompany.com/...-to-all-online-talks-exclusive

API TED

shared by Kurt Laitner on 14 Mar 11 - No Cached

A podcast conversation about GoodRelations, with Martin Hepp and Jamie Taylor | Paul Mi... - 0 views

cloudofdata.com/...h-martin-hepp-and-jamie-taylor

Paul Miller GoodRelations Martin Hepp Jamie Taylor

shared by François Dongier on 22 Feb 10 - Cached

“GoodRelations is a language that can be used to describe very precisely what your business is offering. Some people call GoodRelations a ‘data dictionary’, others prefer ‘schema’ or ‘ontology’. But the name of the thing is not important. Important is that you can use GoodRelations to create a small data package that describes your products and their features and prices, your stores and opening hours, payment options and the like.”

Anything to Triples - - 0 views

developers.any23.org

any23 DERI Sindice Richard Cyganiak Michele Mostarda anything to triples

shared by François Dongier on 02 Mar 10 - Cached

Anything To Triples (any23) is a library and web service that extracts structured data in RDF format from a variety of Web documents. Currently it supports the following input formats:

- RDF/XML, Turtle, Notation 3
- RDFa
- Microformats: Adr, Geo, hCalendar, hCard, hListing, hResume, hReview, License and XFN

Any23 is used in major Web of Data applications such as sindice.com and sig.ma. It is written in Java and licensed under the Apache License. Any23 can be used in various ways:

- As a library in Java applications that consume structured data from the Web.
- As a command-line tool for extracting and converting between the supported formats.

There is a web service and API where you can try it at any23.org.

The original codebase comes from open-sourcing the "RDFizer" component of the Sindice search engine. The project is supported by DERI, NUI Galway, Web of Data - FBK and the OKKAM project (ICT-215032). Individual developers who have contributed to any23 include: Michele Catasta, Richard Cyganiak, Michele Mostarda, Davide Palmisano, Gabriele Renzi, Jürgen Umbrich.

Social Translation : Using the WWL API To Build Multilingual Sites and Web Apps - O'Rei... - 0 views

broadcast.oreilly.com/...l-translation-using-the-w.html

WWL worldwidelexicon open source API_type_translation translation

shared by Kurt Laitner on 12 Jan 10 - Cached

Bigdata - 0 views

www.systap.com/bigdata.htm

BigData open source triple store

shared by Kurt Laitner on 04 Feb 10 - Cached

Social Graph API - Google Code - 0 views

code.google.com/socialgraph

google social graph api

shared by François Dongier on 09 Mar 10 - Cached

makes information about public connections between people easily available and useful.

The API returns web addresses of public pages and publicly declared connections between them. The API cannot access non-public information, such as private profile pages or websites accessible to a limited group of friends.

We currently index the public Web for XHTML Friends Network (XFN), Friend of a Friend (FOAF) markup and other publicly declared connections. By supporting open Web standards for describing connections between people, web sites can add to the social infrastructure of the web.

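The highlights above say the service indexes XFN markup, in which a page declares a connection by putting values such as "friend" or "met" in the rel attribute of a link. As a minimal sketch of what reading that markup involves, the code below collects XFN relationships from an HTML page with the Python standard library. It does not call the Social Graph API, whose request format is not described here; the class name, function name and example.com URLs are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Relationship values defined by XFN 1.1.
XFN_VALUES = {
    "friend", "acquaintance", "contact", "met", "co-worker", "colleague",
    "co-resident", "neighbor", "child", "parent", "sibling", "spouse", "kin",
    "muse", "crush", "date", "sweetheart", "me",
}


class XFNLinkParser(HTMLParser):
    """Collects (page_url, relationship, target_url) tuples from <a rel="..."> links."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.connections = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        rel = attr_map.get("rel") or ""
        if not href:
            return
        # rel may hold several space-separated values; keep only XFN ones.
        for value in rel.lower().split():
            if value in XFN_VALUES:
                self.connections.append((self.page_url, value, href))


def extract_xfn(page_url, html):
    """Return the publicly declared XFN connections found in one page."""
    parser = XFNLinkParser(page_url)
    parser.feed(html)
    parser.close()
    return parser.connections


page = (
    '<a href="http://bob.example.com/" rel="friend met">Bob</a> '
    '<a href="http://ads.example.com/" rel="nofollow">Ad</a>'
)
for connection in extract_xfn("http://alice.example.com/", page):
    print(connection)
```

Each tuple is one publicly declared edge between two web addresses, which is the kind of information the highlights say the API returns. Links whose rel values are not XFN terms, such as nofollow, are ignored.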

Taking Search -- And Meaning -- Beyond English - Semantic Web - 0 views

www.semanticweb.com/...ning_beyond_english_155513.asp

Basis Technology Rosette entity extraction lucene

shared by François Dongier on 20 Mar 10 - Cached

Multi-lingual text analytics vendor Basis Technology Corp., which develops the Rosette linguistics platform

The company this week released Rosette 7, the latest version of its software, which is used in major web and enterprise search engines, from Google to Bing to Oracle software. The product supports 55 languages for language identification, and if you count different encodings that grows to over 100 languages and encoding pairs. For base linguistics for search engine enablement it supports 20 languages, depending on how you count them.

Another major feature in Rosette 7 is name matching and name translation, a problem the company has been working on for more than five years with the result that this is the first time name translation and searching are integrated into the Rosette platform’s same core set of APIs.

The latest version also now supports Lucene-based applications, so any organization using the open source search toolkits can get the same advanced linguistic processing used by high end web and enterprise search engines.

Open-Source Large Vocabulary CSR Engine Julius - 0 views

julius.sourceforge.jp/en_index.php

voice to text api

shared by Kurt Laitner on 23 Sep 11 - Cached
