sensemaking: Group items tagged textmining

Jack Park

Apache UIMA - Apache UIMA - 0 views

  •  
    Unstructured Information Management applications are software systems that analyze large volumes of unstructured information in order to discover knowledge that is relevant to an end user. UIMA is a framework and SDK for developing such applications. An example UIM application might ingest plain text and identify entities, such as persons, places, organizations; or relations, such as works-for or located-at. UIMA enables such an application to be decomposed into components, for example "language identification" -> "language specific segmentation" -> "sentence boundary detection" -> "entity detection (person/place names etc.)". Each component must implement interfaces defined by the framework and must provide self-describing metadata via XML descriptor files. The framework manages these components and the data flow between them. Components are written in Java or C++; the data that flows between components is designed for efficient mapping between these languages. UIMA additionally provides capabilities to wrap components as network services, and can scale to very large volumes by replicating processing pipelines over a cluster of networked nodes.
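The decomposition described above can be sketched in a few lines. This is a minimal illustration of the pipeline idea in plain Python, not the UIMA API: the `Document` container, the component functions, the toy heuristics, and the `GAZETTEER` entries are all hypothetical names invented for this sketch.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Document:
    """Hypothetical shared container handed from one component to the next."""
    text: str
    annotations: Dict[str, list] = field(default_factory=dict)

# A component reads the document and adds its own annotations in place.
Component = Callable[[Document], None]

# Hypothetical entity dictionary; a real system would use trained models.
GAZETTEER = {"Alice": "person", "Bob": "person",
             "Acme": "organization", "Oslo": "place"}

def identify_language(doc: Document) -> None:
    # Toy heuristic: treat the text as English if it has a common English word.
    words = doc.text.lower().split()
    is_english = any(w in ("the", "for", "is") for w in words)
    doc.annotations["language"] = ["en" if is_english else "unknown"]

def split_sentences(doc: Document) -> None:
    # Split after sentence-final punctuation followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", doc.text)
    doc.annotations["sentences"] = [p.strip() for p in parts if p.strip()]

def detect_entities(doc: Document) -> None:
    # Depends on the sentence annotations produced by the previous component.
    found = []
    for sentence in doc.annotations.get("sentences", []):
        for token in re.findall(r"[A-Za-z]+", sentence):
            if token in GAZETTEER:
                found.append((token, GAZETTEER[token]))
    doc.annotations["entities"] = found

def run_pipeline(doc: Document, components: List[Component]) -> Document:
    # The "framework" part: apply each component in order to the same document.
    for component in components:
        component(doc)
    return doc

PIPELINE = [identify_language, split_sentences, detect_entities]
```

Calling `run_pipeline(Document("Alice works for Acme. Bob is located at Oslo."), PIPELINE)` fills `annotations` with a language guess, two sentences, and the gazetteer matches. Each stage only touches the shared document, so components can be swapped or reordered independently.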
Jack Park

A Unified Tagging Approach to Text Normalization - 1 views

  •  
    This paper addresses the issue of text normalization, an important yet often overlooked problem in natural language processing. By text normalization, we mean converting 'informally inputted' text into the canonical form, by eliminating 'noises' in the text and detecting paragraph and sentence boundaries in the text.
Jack Park

wiki.dbpedia.org : Documentation - 0 views

  •  
    The DBpedia community uses a flexible and extensible framework to extract different kinds of structured information from Wikipedia. The DBpedia information extraction framework is written using PHP 5. The framework is available from the DBpedia SVN (GNU GPL License).
Jack Park

SenseBot - semantic search engine that finds sense on the Web - 0 views

  •  
    SenseBot (www.sensebot.net) represents a new type of Search Engine that delivers a summary in response to your search query instead of a collection of links to Web pages. SenseBot parses top results returned by a major Web search engine (e.g., Google) and prepares a text summary of them. The summary serves as a digest on the topic of your query, blending together the most significant and relevant aspects of the search results. The summary itself becomes the main result of your search.
Jack Park

Alchemy - Open Source AI - 0 views

  •  
    Alchemy is a software package providing a series of algorithms for statistical relational learning and probabilistic logic inference, based on the Markov logic representation. Alchemy allows you to easily develop a wide range of AI applications, including collective classification, link prediction, entity resolution, social network modeling, and information extraction.
Jack Park

alphaWorks : Text Analytics Tools and Runtime for IBM LanguageWare : Overview - 0 views

  •  
    IBM® LanguageWare® is a set of run-time libraries and an easy-to-use Eclipse-based development environment for building custom text analyzers in various languages. Deployable in Apache UIMA, these analyzers can expose the information buried in text to any application. The Eclipse-based tools make creating analyzers simple and fast, even for non-technical users. The tools make it easy to build dictionaries, ontologies, and rules for identifying key information, relationships and meaning.
Jack Park

uClassify - free text classifier web service - 0 views

  •  
    uClassify is a free web service where you can easily create your own text classifiers.
Jack Park

The Stanford Wordnet Project - 0 views

  •  
    Augmented Wordnets: These lexical resources (and the method of their construction) are described in "Semantic Taxonomy Induction from Heterogenous Evidence" (ACL-06). They are automatically augmented versions of WordNet 2.1.
Jack Park

TAPIR project web site - 0 views

  •  
    TAPIR started as a research project in June 2001. In 2002 the project is sponsored by NORDINFO and the Research Council of The Danish Ministry of Culture. TAPIR aims to investigate the potential of applying the diversity of cognitive representations pointing to scientific full-text documents, following the principle of poly-representation. Poly-representation (or multi-evidence) means utilizing the cognitively different, overlapping interpretations, also over time, made by different actors participating in interactive IR. Such cognitive overlaps derive, for instance, from the authors' own perceptions of their work (titles, full-text terms), from human indexing (e.g. descriptors), or from citations given to the work by other authors. The assumption is that the more cognitively different the representations simultaneously pointing to a document are, the higher the probability that the document is relevant to a given set of criteria.
Jack Park

LingPipe Home - 0 views

  •  
    LingPipe is a suite of Java libraries for the linguistic analysis of human language.
Jack Park

Java Text Categorizing Library - 0 views

  •  
    The Java Text Categorizing Library (JTCL) is a pure Java 1.5 implementation of libTextCat which in turn is "a library that was primarily developed for language guessing, a task on which it is known to perform with near-perfect accuracy". It's distributed under the LGPL and can also be used in order to categorize text into arbitrary topics by computing appropriate fingerprints which represent the categories.
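To make the idea of category "fingerprints" concrete, here is a minimal Python sketch of rank-ordered character n-gram profiles compared with an "out-of-place" distance, the general approach usually associated with TextCat-style tools. It is an assumption that JTCL works along these lines; the function names, the `top_k` cutoff, and the simplified whitespace handling are illustrative choices and not the library's API.

```python
from collections import Counter
from typing import Dict

def fingerprint(text: str, max_n: int = 3, top_k: int = 300) -> Dict[str, int]:
    """Map each of the top_k most frequent character n-grams to its rank."""
    normalized = " ".join(text.lower().split())
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(normalized) - n + 1):
            counts[normalized[i:i + n]] += 1
    # Most frequent first; ties broken alphabetically so ranks are deterministic.
    ranked = sorted(counts, key=lambda gram: (-counts[gram], gram))[:top_k]
    return {gram: rank for rank, gram in enumerate(ranked)}

def out_of_place(doc_fp: Dict[str, int], cat_fp: Dict[str, int],
                 max_penalty: int) -> int:
    """Sum of rank differences; n-grams missing from the category cost max_penalty."""
    return sum(
        abs(rank - cat_fp[gram]) if gram in cat_fp else max_penalty
        for gram, rank in doc_fp.items()
    )

def classify(text: str, categories: Dict[str, Dict[str, int]],
             top_k: int = 300) -> str:
    """Return the name of the category whose fingerprint is closest to the text."""
    doc_fp = fingerprint(text, top_k=top_k)
    return min(categories,
               key=lambda name: out_of_place(doc_fp, categories[name], top_k))
```

A category fingerprint is built once from sample text, for example `categories = {"en": fingerprint(english_sample), "de": fingerprint(german_sample)}`, and the same mechanism works for topics as long as each topic has representative training text.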
Jack Park

The Lemur Toolkit for Language Modeling and Information Retrieval - 0 views

  •  
    The Lemur Toolkit is an open-source toolkit designed to facilitate research in language modeling and information retrieval. Lemur supports a wide range of industrial and research language applications such as ad-hoc retrieval, site-search, and text mining. The toolkit supports indexing of large-scale text databases, the construction of simple language models for documents, queries, or subcollections, and the implementation of retrieval systems based on language models as well as a variety of other retrieval models. The system is written in the C and C++ languages, and is designed as a research system to run under Unix operating systems, although it can also run under Windows.
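As a rough illustration of retrieval based on language models, the sketch below ranks documents by query likelihood under a unigram model with Jelinek-Mercer smoothing. It is written in plain Python and does not use or imitate the Lemur API; the function names, the whitespace tokenizer, and the smoothing weight `lam` are assumptions made for the example.

```python
import math
from collections import Counter
from typing import Dict, List

def tokenize(text: str) -> List[str]:
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()

def rank(query: str, docs: Dict[str, str], lam: float = 0.5) -> List[str]:
    """Return document names ordered by descending query log-likelihood."""
    doc_tokens = {name: tokenize(text) for name, text in docs.items()}
    # Collection-wide term counts serve as the background model for smoothing.
    collection = Counter(tok for toks in doc_tokens.values() for tok in toks)
    collection_len = sum(collection.values())

    scores = {}
    for name, toks in doc_tokens.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            p_doc = tf[term] / len(toks) if toks else 0.0
            p_coll = collection[term] / collection_len if collection_len else 0.0
            # Jelinek-Mercer: interpolate document and collection probabilities.
            p = lam * p_doc + (1.0 - lam) * p_coll
            if p == 0.0:
                # The term occurs nowhere in the collection.
                score = float("-inf")
                break
            score += math.log(p)
        scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)
```

Smoothing against the collection keeps a document from being ruled out just because it lacks one query term. A lower `lam` leans more on the collection statistics, a higher one more on the document itself.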
Jack Park

UIMA COMPONENT REPOSITORY - 0 views

  •  
    Our goal in creating this site is to provide the basis for a thriving community of UIMA developers who can announce, discuss, design, share, and critique UIMA-compliant components, resources and solutions. The Unstructured Information Management Architecture (UIMA) is a software framework that supports rapid development and deployment of multimodal analytics - applications which provide value by processing human-readable text, audio and/or video in order to extract information, answer questions, summarize documents, etc.
Jack Park

Technology Review: Extracting Meaning from Millions of Pages - 0 views

  •  
    A software engine that pulls together facts by combing through more than 500 million Web pages has been developed by researchers at the University of Washington. The tool extracts information from billions of lines of text by analyzing basic relationships between words.
Jack Park

x2exp.pdf (application/pdf Object) - 0 views

  •  
    "But invariably, simple models and a lot of data trump more elaborate models based on less data."
Jack Park

Semantic API - 0 views

  •  
    Semantic Cloud is the API that powers the semantic search engine SenseBot and LinkSensor, a contextual linking tool for bloggers. The API supports SOAP and REST protocols (HTTP GET). The idea is to empower semantic startups or any ventures looking for an affordable, high-quality semantic solution on which to build their applications. Semantic API features include: extraction of semantic concepts from a page or document; creation of a "semantic cloud" of concepts describing a group of documents; generation of a multi-document summary of a set of pages; and generation of an essay on a topic based on a set of documents. Multiple parameters allow the client to control the type and format of results.
Jack Park

SCRIBO - Welcome to SCRIBO.ws - 0 views

  •  
    SCRIBO - Semi-automatic and Collaborative Retrieval of Information Based on Ontologies - aims at algorithms and collaborative free software for the automatic extraction of knowledge from texts and images, and for the semi-automatic annotation of digital documents.