sensemaking / Group items tagged: harvesting

Jack Park

IKHarvester - Informal Knowledge Harvester - 0 views

  •  
    IKHarvester (Informal Knowledge Harvester) is an SOA layer that collects RDF data from web pages. It provides REST-based Web Services for managing data available on Social Semantic Information Sources (SSIS): semantic blogs, semantic wikis, and JeromeDL (the Social Semantic Digital Library). These Web Services allow saving harvested data in the informal knowledge repository and providing it in the form of informal Learning Objects (LOs) described according to the LOM (Learning Object Metadata) standard. IKHarvester is also an extension to the Didaskon system. Didaskon (διδάσκω, Gr. "I teach") delivers a framework for composing an on-demand curriculum from existing Learning Objects provided by e-Learning services (formal learning); the system also draws on SSIS, which provide informal knowledge. The selection and workflow scheduling of Learning Objects is then based on a semantically annotated specification of the user's current skills/knowledge (pre-conditions), anticipated resulting skills/knowledge (goal), and technical details of the client's platform.
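The harvesting step described above can be sketched in a few lines: pull Dublin Core metadata out of a (semantic) blog page and repackage it as a LOM-style "informal Learning Object" record. This is an illustrative Python sketch, not IKHarvester's actual API; the field names and the sample page are invented.

```python
# Harvest <meta name="DC.xxx" content="..."> tags and map them to a LOM-like dict.
from html.parser import HTMLParser

class DCMetaHarvester(HTMLParser):
    """Collects Dublin Core <meta> tags from a page."""
    def __init__(self):
        super().__init__()
        self.dc = {}
    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        name = a.get("name", "")
        if name.startswith("DC."):
            self.dc[name[3:].lower()] = a.get("content", "")

def to_learning_object(dc):
    """Map harvested Dublin Core fields onto a LOM-like structure (illustrative)."""
    return {
        "general": {"title": dc.get("title"), "language": dc.get("language")},
        "lifecycle": {"contribute": dc.get("creator")},
        "educational": {"description": dc.get("description")},
    }

page = """<html><head>
<meta name="DC.title" content="Intro to RDF harvesting">
<meta name="DC.creator" content="J. Example">
<meta name="DC.language" content="en">
</head><body>...</body></html>"""

harvester = DCMetaHarvester()
harvester.feed(page)
lo = to_learning_object(harvester.dc)
```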
Jack Park

HarvANA - 0 views

  •  
    HarvANA uses a standardized but extensible RDF model for representing the annotations/tags and OAI-PMH to harvest the annotations/tags from distributed community servers. The harvested annotations are aggregated with the authoritative metadata in a centralized metadata store.
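The OAI-PMH aggregation step can be sketched as follows: parse a ListRecords response from a community annotation server and merge each harvested tag into a central store keyed by the annotated resource. The XML below is a hand-made stand-in response, not HarvANA's actual RDF model.

```python
# Aggregate tags from an OAI-PMH ListRecords response into a central store.
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

response = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <subject xmlns="http://purl.org/dc/elements/1.1/">koala</subject>
        <relation xmlns="http://purl.org/dc/elements/1.1/">http://example.org/img/42</relation>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

store = {}  # central metadata store: resource URI -> list of harvested tags
root = ET.fromstring(response)
for rec in root.iter(OAI + "record"):
    tag = rec.find(".//" + DC + "subject").text
    target = rec.find(".//" + DC + "relation").text
    store.setdefault(target, []).append(tag)
```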
Jack Park

YAGO-NAGA - D5: Databases and Information Systems (Max-Planck-Institut für In... - 0 views

  •  
    The YAGO-NAGA project started in 2006 with the goal of building a conveniently searchable, large-scale, highly accurate knowledge base of common facts in a machine-processible representation. We have already harvested knowledge about millions of entities and facts about their relationships, from Wikipedia and WordNet with careful integration of these two sources. The resulting knowledge base, coined YAGO, has very high precision and is freely available. The facts are represented as RDF triples, and we have developed methods and prototype systems for querying, ranking, and exploring knowledge. Our search engine NAGA provides ranked answers to queries based on statistical models.
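The fact representation described above can be sketched as a tiny in-memory triple store with pattern queries (None acts as a wildcard). Real YAGO is distributed as RDF and queried with far richer machinery; the facts below are toy examples.

```python
# Facts as (subject, predicate, object) triples with wildcard pattern matching.
triples = {
    ("Albert_Einstein", "bornIn", "Ulm"),
    ("Albert_Einstein", "type", "physicist"),
    ("Ulm", "locatedIn", "Germany"),
}

def query(s=None, p=None, o=None):
    """Return all facts matching the (s, p, o) pattern; None matches anything."""
    return [(ts, tp, to) for (ts, tp, to) in triples
            if (s is None or s == ts)
            and (p is None or p == tp)
            and (o is None or o == to)]

born_in = query(p="bornIn")
```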
Jack Park

danbri's foaf stories » OpenSocial schema extraction: via Javascript to RDF/OWL - 0 views

  •  
    OpenSocial's API reference describes a number of classes ('Person', 'Name', 'Email', 'Phone', 'Url', 'Organization', 'Address', 'Message', 'Activity', 'MediaItem', 'Activity', …), each of which has various properties whose values are either strings, references to instances of other classes, or enumerations. I'd like to make them usable beyond the confines of OpenSocial, so I'm making an RDF/OWL version. OpenSocial's schema is an attempt to provide an overarching model for much of present-day mainstream 'social networking' functionality, including dating, jobs etc. Such a broad effort is inevitably somewhat open-ended, and so may benefit from being linked to data from other complementary sources.
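The schema-extraction idea can be sketched like this: given class and property names like those in the OpenSocial API reference, emit RDF/OWL declarations as Turtle, with string-valued properties becoming datatype properties and class-valued ones becoming object properties. The namespace URI is a placeholder, not the one danbri actually used.

```python
# Emit OWL class/property declarations in Turtle from a schema description.
schema = {
    "Person": {"name": "Name", "emails": "Email"},
    "Email":  {"address": "string", "type": "string"},
}

NS = "http://example.org/opensocial#"  # placeholder namespace

def to_turtle(schema):
    lines = ["@prefix os: <%s> ." % NS,
             "@prefix owl: <http://www.w3.org/2002/07/owl#> ."]
    for cls, props in schema.items():
        lines.append("os:%s a owl:Class ." % cls)
        for prop, rng in props.items():
            kind = ("owl:DatatypeProperty" if rng == "string"
                    else "owl:ObjectProperty")
            lines.append("os:%s a %s ." % (prop, kind))
    return "\n".join(lines)

turtle = to_turtle(schema)
```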
Jack Park

wiki.dbpedia.org : Documentation - 0 views

  •  
    The DBpedia community uses a flexible and extensible framework to extract different kinds of structured information from Wikipedia. The DBpedia information extraction framework is written using PHP 5. The framework is available from the DBpedia SVN (GNU GPL License).
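The extraction framework itself is written in PHP; this Python sketch shows only the core idea: pull key/value pairs out of a Wikipedia infobox template. Real infobox parsing handles nesting, links, and templates, which this deliberately ignores.

```python
# Naive infobox extraction: one '| key = value' per line, no nested templates.
import re

wikitext = """{{Infobox settlement
| name = Berlin
| population = 3769495
| country = Germany
}}"""

def parse_infobox(text):
    fields = {}
    for m in re.finditer(r"^\|\s*(\w+)\s*=\s*(.+?)\s*$", text, re.MULTILINE):
        fields[m.group(1)] = m.group(2)
    return fields

info = parse_infobox(wikitext)
```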
Jack Park

Apache UIMA - Apache UIMA - 0 views

  •  
    Unstructured Information Management applications are software systems that analyze large volumes of unstructured information in order to discover knowledge that is relevant to an end user. UIMA is a framework and SDK for developing such applications. An example UIM application might ingest plain text and identify entities, such as persons, places, organizations; or relations, such as works-for or located-at. UIMA enables such an application to be decomposed into components, for example "language identification" -> "language specific segmentation" -> "sentence boundary detection" -> "entity detection (person/place names etc.)". Each component must implement interfaces defined by the framework and must provide self-describing metadata via XML descriptor files. The framework manages these components and the data flow between them. Components are written in Java or C++; the data that flows between components is designed for efficient mapping between these languages. UIMA additionally provides capabilities to wrap components as network services, and can scale to very large volumes by replicating processing pipelines over a cluster of networked nodes.
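The pipeline decomposition described above can be sketched in Python (UIMA itself is Java/C++ and wires components together via XML descriptors; the component names and the shared-dict "CAS" here are illustrative only).

```python
# Components annotate a shared analysis structure; the framework runs them in order.
class Component:
    """Minimal analogue of a UIMA analysis engine."""
    def process(self, cas):
        raise NotImplementedError

class SentenceSplitter(Component):
    def process(self, cas):
        cas["sentences"] = [s.strip() for s in cas["text"].split(".") if s.strip()]

class PersonDetector(Component):
    KNOWN = {"Ada Lovelace", "Alan Turing"}  # toy gazetteer
    def process(self, cas):
        cas["persons"] = sorted(n for n in self.KNOWN if n in cas["text"])

def run_pipeline(text, components):
    cas = {"text": text}          # shared analysis structure
    for c in components:          # framework manages data flow between components
        c.process(cas)
    return cas

cas = run_pipeline("Ada Lovelace wrote notes. Alan Turing read them.",
                   [SentenceSplitter(), PersonDetector()])
```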
Jack Park

alphaWorks : Text Analytics Tools and Runtime for IBM LanguageWare : Overview - 0 views

  •  
    IBM® LanguageWare® is a set of run-time libraries and an easy-to-use Eclipse-based development environment for building custom text analyzers in various languages. Deployable in Apache UIMA, these analyzers can expose the information buried in text to any application. The Eclipse-based tools make creating analyzers simple and fast, even for non-technical users, and make it easy to build dictionaries, ontologies, and rules for identifying key information, relationships, and meaning.
Jack Park

Java Text Categorizing Library - 0 views

  •  
    The Java Text Categorizing Library (JTCL) is a pure Java 1.5 implementation of libTextCat, which in turn is "a library that was primarily developed for language guessing, a task on which it is known to perform with near-perfect accuracy". It is distributed under the LGPL and can also be used to categorize text into arbitrary topics by computing appropriate fingerprints which represent the categories.
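The fingerprint technique libTextCat popularized can be sketched as follows: rank a text's most frequent character n-grams, then categorize by the "out-of-place" distance between the document's fingerprint and each category's. The categories and training strings below are toy examples, not JTCL's API.

```python
# N-gram fingerprint categorization with out-of-place distance.
from collections import Counter

def fingerprint(text, n=3, top=50):
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    return [g for g, _ in grams.most_common(top)]

def out_of_place(doc_fp, cat_fp):
    """Sum of rank displacements; grams missing from the category cost a penalty."""
    penalty = len(cat_fp)
    return sum(cat_fp.index(g) if g in cat_fp else penalty for g in doc_fp)

categories = {
    "english": fingerprint("the quick brown fox jumps over the lazy dog " * 5),
    "german":  fingerprint("der schnelle braune fuchs springt ueber den hund " * 5),
}

doc = "the dog jumps over the fox"
best = min(categories,
           key=lambda c: out_of_place(fingerprint(doc), categories[c]))
```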
Bernard (ben) Tremblay

Raw stock - 20 views

Hi there - For a number of years I surveyed a field that could be called "sense making" ... concept mapping, citizen journalism, e-democracy, WebDAV ... that whole ball of wax. My http://gnodal.l...

bookmarks

started by Bernard (ben) Tremblay on 09 Nov 08 no follow-up yet
Jack Park

Technology Review: Extracting Meaning from Millions of Pages - 0 views

  •  
    A software engine that pulls together facts by combing through more than 500 million Web pages has been developed by researchers at the University of Washington. The tool extracts information from billions of lines of text by analyzing basic relationships between words.
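A toy version of that idea: extract (argument, relation, argument) tuples by spotting a simple noun-relation-noun surface pattern. The actual extractor (the University of Washington's open information extraction work) is far more sophisticated; the pattern and sentence below are purely illustrative.

```python
# Naive open relation extraction via a surface pattern.
import re

PATTERN = re.compile(
    r"([A-Z][a-z]+) (is located in|was founded by|works for) ([A-Z][a-z]+)")

def extract(sentence):
    """Return the first (arg1, relation, arg2) tuple found, or None."""
    m = PATTERN.search(sentence)
    return (m.group(1), m.group(2), m.group(3)) if m else None

fact = extract("Seattle is located in Washington.")
```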
Jack Park

IT Conversations | Jon Udell's Interviews with Innovators | Seth Grimes (Free Podcast) - 0 views

  •  
    Seth Grimes is a business intelligence expert with a special interest in text analytics. In this conversation with host Jon Udell, he discusses how a new breed of tools is enabling companies to build "voice of the customer" applications that extract useful signals from the noisy chatter that's erupting everywhere online.
Jack Park

The Lemur Toolkit for Language Modeling and Information Retrieval - 0 views

  •  
    The Lemur Toolkit is an open-source toolkit designed to facilitate research in language modeling and information retrieval. Lemur supports a wide range of industrial and research language applications such as ad-hoc retrieval, site search, and text mining. The toolkit supports indexing of large-scale text databases, the construction of simple language models for documents, queries, or subcollections, and the implementation of retrieval systems based on language models as well as a variety of other retrieval models. The system is written in C and C++, and is designed as a research system to run under Unix operating systems, although it can also run under Windows.
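The "simple language models for documents" idea can be sketched as query-likelihood retrieval: score each document by the probability its unigram model assigns to the query, with add-one smoothing. This is a generic sketch of the technique, not Lemur's C/C++ API; the two documents are toy examples.

```python
# Query-likelihood ranking under smoothed unigram document language models.
import math
from collections import Counter

docs = {
    "d1": "language model retrieval with lemur toolkit".split(),
    "d2": "text mining and site search applications".split(),
}

def score(query, doc_tokens, vocab_size):
    """log P(query | doc) under a unigram model with add-one smoothing."""
    tf = Counter(doc_tokens)
    n = len(doc_tokens)
    return sum(math.log((tf[w] + 1) / (n + vocab_size)) for w in query)

vocab = {w for toks in docs.values() for w in toks}
query = "language retrieval".split()
ranked = sorted(docs, key=lambda d: score(query, docs[d], len(vocab)),
                reverse=True)
```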