sensemaking / Group items tagged: harvesting

Jack Park

IKHarvester - Informal Knowledge Harvester - 0 views

  •  
    IKHarvester (Informal Knowledge Harvester) is an SOA layer that collects RDF data from web pages. It provides REST-based Web Services for managing data available on Social Semantic Information Sources (SSIS): semantic blogs, semantic wikis, and JeromeDL (the Social Semantic Digital Library). These Web Services allow saving harvested data in the informal knowledge repository and providing it in the form of informal Learning Objects (LOs) described according to the LOM (Learning Object Metadata) standard. IKHarvester is also an extension to the Didaskon system. Didaskon (διδάσκω, Gr. "I teach") delivers a framework for composing an on-demand curriculum from existing Learning Objects provided by e-Learning services (formal learning); the system also draws on SSIS, which provide informal knowledge. The selection and workflow scheduling of Learning Objects is then based on a semantically annotated specification of the user's current skills/knowledge (pre-conditions), anticipated resulting skills/knowledge (goal), and technical details of the client's platform.
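The harvesting step described above can be sketched in a few lines: pull Dublin Core metadata out of a (semantic) blog page and repackage it as a LOM-style "informal Learning Object" record. This is an illustrative Python sketch, not IKHarvester's actual API; the field names and the sample page are invented.

```python
# Harvest <meta name="DC.xxx" content="..."> tags and map them to a LOM-like dict.
from html.parser import HTMLParser

class DCMetaHarvester(HTMLParser):
    """Collects Dublin Core <meta> tags from a page."""
    def __init__(self):
        super().__init__()
        self.dc = {}
    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        name = a.get("name", "")
        if name.startswith("DC."):
            self.dc[name[3:].lower()] = a.get("content", "")

def to_learning_object(dc):
    """Map harvested Dublin Core fields onto a LOM-like structure (illustrative)."""
    return {
        "general": {"title": dc.get("title"), "language": dc.get("language")},
        "lifecycle": {"contribute": dc.get("creator")},
        "educational": {"description": dc.get("description")},
    }

page = """<html><head>
<meta name="DC.title" content="Intro to RDF harvesting">
<meta name="DC.creator" content="J. Example">
<meta name="DC.language" content="en">
</head><body>...</body></html>"""

harvester = DCMetaHarvester()
harvester.feed(page)
lo = to_learning_object(harvester.dc)
```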
Jack Park

HarvANA - 0 views

  •  
    HarvANA uses a standardized but extensible RDF model for representing the annotations/tags and OAI-PMH to harvest the annotations/tags from distributed community servers. The harvested annotations are aggregated with the authoritative metadata in a centralized metadata store.
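The OAI-PMH aggregation step can be sketched as follows: parse a ListRecords response from a community annotation server and merge each harvested tag into a central store keyed by the annotated resource. The XML below is a hand-made stand-in response, not HarvANA's actual RDF model.

```python
# Aggregate tags from an OAI-PMH ListRecords response into a central store.
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

response = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <subject xmlns="http://purl.org/dc/elements/1.1/">koala</subject>
        <relation xmlns="http://purl.org/dc/elements/1.1/">http://example.org/img/42</relation>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

store = {}  # central metadata store: resource URI -> list of harvested tags
root = ET.fromstring(response)
for rec in root.iter(OAI + "record"):
    tag = rec.find(".//" + DC + "subject").text
    target = rec.find(".//" + DC + "relation").text
    store.setdefault(target, []).append(tag)
```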
Jack Park

YAGO-NAGA - D5: Databases and Information Systems (Max-Planck-Institut für In... - 0 views

  •  
    The YAGO-NAGA project started in 2006 with the goal of building a conveniently searchable, large-scale, highly accurate knowledge base of common facts in a machine-processible representation. We have already harvested knowledge about millions of entities and facts about their relationships, from Wikipedia and WordNet with careful integration of these two sources. The resulting knowledge base, coined YAGO, has very high precision and is freely available. The facts are represented as RDF triples, and we have developed methods and prototype systems for querying, ranking, and exploring knowledge. Our search engine NAGA provides ranked answers to queries based on statistical models.
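The fact representation described above can be sketched as a tiny in-memory triple store with pattern queries (None acts as a wildcard). Real YAGO is distributed as RDF and queried with far richer machinery; the facts below are toy examples.

```python
# Facts as (subject, predicate, object) triples with wildcard pattern matching.
triples = {
    ("Albert_Einstein", "bornIn", "Ulm"),
    ("Albert_Einstein", "type", "physicist"),
    ("Ulm", "locatedIn", "Germany"),
}

def query(s=None, p=None, o=None):
    """Return all facts matching the (s, p, o) pattern; None matches anything."""
    return [(ts, tp, to) for (ts, tp, to) in triples
            if (s is None or s == ts)
            and (p is None or p == tp)
            and (o is None or o == to)]

born_in = query(p="bornIn")
```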
Jack Park

danbri's foaf stories » OpenSocial schema extraction: via Javascript to RDF/OWL - 0 views

  •  
    OpenSocial's API reference describes a number of classes ('Person', 'Name', 'Email', 'Phone', 'Url', 'Organization', 'Address', 'Message', 'Activity', 'MediaItem', 'Activity', …), each of which has various properties whose values are either strings, references to instances of other classes, or enumerations. I'd like to make them usable beyond the confines of OpenSocial, so I'm making an RDF/OWL version. OpenSocial's schema is an attempt to provide an overarching model for much of present-day mainstream 'social networking' functionality, including dating, jobs etc. Such a broad effort is inevitably somewhat open-ended, and so may benefit from being linked to data from other complementary sources.
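The schema-extraction idea can be sketched like this: given class and property names like those in the OpenSocial API reference, emit RDF/OWL declarations as Turtle, with string-valued properties becoming datatype properties and class-valued ones becoming object properties. The namespace URI is a placeholder, not the one danbri actually used.

```python
# Emit OWL class/property declarations in Turtle from a schema description.
schema = {
    "Person": {"name": "Name", "emails": "Email"},
    "Email":  {"address": "string", "type": "string"},
}

NS = "http://example.org/opensocial#"  # placeholder namespace

def to_turtle(schema):
    lines = ["@prefix os: <%s> ." % NS,
             "@prefix owl: <http://www.w3.org/2002/07/owl#> ."]
    for cls, props in schema.items():
        lines.append("os:%s a owl:Class ." % cls)
        for prop, rng in props.items():
            kind = ("owl:DatatypeProperty" if rng == "string"
                    else "owl:ObjectProperty")
            lines.append("os:%s a %s ." % (prop, kind))
    return "\n".join(lines)

turtle = to_turtle(schema)
```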
Jack Park

wiki.dbpedia.org : Documentation - 0 views

  •  
    The DBpedia community uses a flexible and extensible framework to extract different kinds of structured information from Wikipedia. The DBpedia information extraction framework is written using PHP 5. The framework is available from the DBpedia SVN (GNU GPL License).
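The extraction framework itself is written in PHP; this Python sketch shows only the core idea: pull key/value pairs out of a Wikipedia infobox template. Real infobox parsing handles nesting, links, and templates, which this deliberately ignores.

```python
# Naive infobox extraction: one '| key = value' per line, no nested templates.
import re

wikitext = """{{Infobox settlement
| name = Berlin
| population = 3769495
| country = Germany
}}"""

def parse_infobox(text):
    fields = {}
    for m in re.finditer(r"^\|\s*(\w+)\s*=\s*(.+?)\s*$", text, re.MULTILINE):
        fields[m.group(1)] = m.group(2)
    return fields

info = parse_infobox(wikitext)
```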
Jack Park

Apache UIMA - Apache UIMA - 0 views

  •  
    Unstructured Information Management applications are software systems that analyze large volumes of unstructured information in order to discover knowledge that is relevant to an end user. UIMA is a framework and SDK for developing such applications. An example UIM application might ingest plain text and identify entities, such as persons, places, organizations; or relations, such as works-for or located-at. UIMA enables such an application to be decomposed into components, for example "language identification" -> "language specific segmentation" -> "sentence boundary detection" -> "entity detection (person/place names etc.)". Each component must implement interfaces defined by the framework and must provide self-describing metadata via XML descriptor files. The framework manages these components and the data flow between them. Components are written in Java or C++; the data that flows between components is designed for efficient mapping between these languages. UIMA additionally provides capabilities to wrap components as network services, and can scale to very large volumes by replicating processing pipelines over a cluster of networked nodes.
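The pipeline decomposition described above can be sketched in Python (UIMA itself is Java/C++ and wires components together via XML descriptors; the component names and the shared-dict "CAS" here are illustrative only).

```python
# Components annotate a shared analysis structure; the framework runs them in order.
class Component:
    """Minimal analogue of a UIMA analysis engine."""
    def process(self, cas):
        raise NotImplementedError

class SentenceSplitter(Component):
    def process(self, cas):
        cas["sentences"] = [s.strip() for s in cas["text"].split(".") if s.strip()]

class PersonDetector(Component):
    KNOWN = {"Ada Lovelace", "Alan Turing"}  # toy gazetteer
    def process(self, cas):
        cas["persons"] = sorted(n for n in self.KNOWN if n in cas["text"])

def run_pipeline(text, components):
    cas = {"text": text}          # shared analysis structure
    for c in components:          # framework manages data flow between components
        c.process(cas)
    return cas

cas = run_pipeline("Ada Lovelace wrote notes. Alan Turing read them.",
                   [SentenceSplitter(), PersonDetector()])
```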
Jack Park

alphaWorks : Text Analytics Tools and Runtime for IBM LanguageWare : Overview - 0 views

  •  
    IBM® LanguageWare® is a set of run-time libraries and an easy-to-use Eclipse-based development environment for building custom text analyzers in various languages. Deployable in Apache UIMA, these analyzers can expose the information buried in text to any application. The Eclipse-based tools make creating analyzers simple and fast, even for non-technical users, and make it easy to build dictionaries, ontologies, and rules for identifying key information, relationships, and meaning.
Jack Park

Java Text Categorizing Library - 0 views

  •  
    The Java Text Categorizing Library (JTCL) is a pure Java 1.5 implementation of libTextCat, which in turn is "a library that was primarily developed for language guessing, a task on which it is known to perform with near-perfect accuracy". It is distributed under the LGPL and can also be used to categorize text into arbitrary topics by computing appropriate fingerprints which represent the categories.
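The fingerprint technique libTextCat popularized can be sketched as follows: rank a text's most frequent character n-grams, then categorize by the "out-of-place" distance between the document's fingerprint and each category's. The categories and training strings below are toy examples, not JTCL's API.

```python
# N-gram fingerprint categorization with out-of-place distance.
from collections import Counter

def fingerprint(text, n=3, top=50):
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    return [g for g, _ in grams.most_common(top)]

def out_of_place(doc_fp, cat_fp):
    """Sum of rank displacements; grams missing from the category cost a penalty."""
    penalty = len(cat_fp)
    return sum(cat_fp.index(g) if g in cat_fp else penalty for g in doc_fp)

categories = {
    "english": fingerprint("the quick brown fox jumps over the lazy dog " * 5),
    "german":  fingerprint("der schnelle braune fuchs springt ueber den hund " * 5),
}

doc = "the dog jumps over the fox"
best = min(categories,
           key=lambda c: out_of_place(fingerprint(doc), categories[c]))
```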
Bernard (ben) Tremblay

Raw stock - 20 views

Hi there - For a number of years I surveyed a field that could be called "sense making" ... concept mapping, citizen journalism, e-democracy, WebDAV ... that whole ball of wax. My http://gnodal.l...

bookmarks

started by Bernard (ben) Tremblay on 09 Nov 08 no follow-up yet
Jack Park

Technology Review: Extracting Meaning from Millions of Pages - 0 views

  •  
    A software engine that pulls together facts by combing through more than 500 million Web pages has been developed by researchers at the University of Washington. The tool extracts information from billions of lines of text by analyzing basic relationships between words.
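A toy version of that idea: extract (argument, relation, argument) tuples by spotting a simple noun-relation-noun surface pattern. The actual extractor (the University of Washington's open information extraction work) is far more sophisticated; the pattern and sentence below are purely illustrative.

```python
# Naive open relation extraction via a surface pattern.
import re

PATTERN = re.compile(
    r"([A-Z][a-z]+) (is located in|was founded by|works for) ([A-Z][a-z]+)")

def extract(sentence):
    """Return the first (arg1, relation, arg2) tuple found, or None."""
    m = PATTERN.search(sentence)
    return (m.group(1), m.group(2), m.group(3)) if m else None

fact = extract("Seattle is located in Washington.")
```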
Jack Park

IT Conversations | Jon Udell's Interviews with Innovators | Seth Grimes (Free Podcast) - 0 views

  •  
    Seth Grimes is a business intelligence expert with a special interest in text analytics. In this conversation with host Jon Udell, he discusses how a new breed of tools is enabling companies to build "voice of the customer" applications that extract useful signals from the noisy chatter that's erupting everywhere online.
Jack Park

The Lemur Toolkit for Language Modeling and Information Retrieval - 0 views

  •  
    The Lemur Toolkit is an open-source toolkit designed to facilitate research in language modeling and information retrieval. Lemur supports a wide range of industrial and research language applications such as ad-hoc retrieval, site search, and text mining. The toolkit supports indexing of large-scale text databases, the construction of simple language models for documents, queries, or subcollections, and the implementation of retrieval systems based on language models as well as a variety of other retrieval models. The system is written in C and C++, and is designed as a research system to run under Unix operating systems, although it can also run under Windows.
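The "simple language models for documents" idea can be sketched as query-likelihood retrieval: score each document by the probability its unigram model assigns to the query, with add-one smoothing. This is a generic sketch of the technique, not Lemur's C/C++ API; the two documents are toy examples.

```python
# Query-likelihood ranking under smoothed unigram document language models.
import math
from collections import Counter

docs = {
    "d1": "language model retrieval with lemur toolkit".split(),
    "d2": "text mining and site search applications".split(),
}

def score(query, doc_tokens, vocab_size):
    """log P(query | doc) under a unigram model with add-one smoothing."""
    tf = Counter(doc_tokens)
    n = len(doc_tokens)
    return sum(math.log((tf[w] + 1) / (n + vocab_size)) for w in query)

vocab = {w for toks in docs.values() for w in toks}
query = "language retrieval".split()
ranked = sorted(docs, key=lambda d: score(query, docs[d], len(vocab)),
                reverse=True)
```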