Group items tagged textmining - sensemaking

Apache UIMA - Apache UIMA - 0 views

incubator.apache.org/uima

uima nlp unstructured textmining TextMining harvesting discovery opensource apache

shared by Jack Park on 18 Nov 08 - Cached

Jack Park on 18 Nov 08

Unstructured Information Management applications are software systems that analyze large volumes of unstructured information in order to discover knowledge that is relevant to an end user. UIMA is a framework and SDK for developing such applications. An example UIM application might ingest plain text and identify entities, such as persons, places, organizations; or relations, such as works-for or located-at. UIMA enables such an application to be decomposed into components, for example "language identification" -> "language specific segmentation" -> "sentence boundary detection" -> "entity detection (person/place names etc.)". Each component must implement interfaces defined by the framework and must provide self-describing metadata via XML descriptor files. The framework manages these components and the data flow between them. Components are written in Java or C++; the data that flows between components is designed for efficient mapping between these languages. UIMA additionally provides capabilities to wrap components as network services, and can scale to very large volumes by replicating processing pipelines over a cluster of networked nodes.
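The decomposition described above can be illustrated with a toy pipeline. This is a minimal sketch in Python, not the UIMA API: the `Document` class, the component functions, and the heuristics inside them are all hypothetical stand-ins, and a real UIMA component would implement the framework's interfaces and ship an XML descriptor.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical shared structure that every component reads and extends."""
    text: str
    annotations: dict = field(default_factory=dict)


def identify_language(doc):
    # Toy heuristic: look for a few English stopwords (not a real identifier).
    words = set(re.findall(r"[a-z]+", doc.text.lower()))
    doc.annotations["language"] = "en" if words & {"the", "and", "for", "in"} else "unknown"
    return doc


def detect_sentences(doc):
    # Naive boundary rule: split after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", doc.text.strip())
    doc.annotations["sentences"] = [p for p in parts if p]
    return doc


def detect_entities(doc):
    # Toy rule: runs of capitalized words that do not start a sentence.
    entities = []
    for sentence in doc.annotations["sentences"]:
        rest = sentence.split(" ", 1)[1] if " " in sentence else ""
        entities += re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*", rest)
    doc.annotations["entities"] = entities
    return doc


def run_pipeline(text, components):
    # The "framework" role: pass one document through each component in order.
    doc = Document(text)
    for component in components:
        doc = component(doc)
    return doc


PIPELINE = [identify_language, detect_sentences, detect_entities]
result = run_pipeline(
    "Yesterday Maria Lopez visited Palo Alto. She works for the lab in Menlo Park.",
    PIPELINE,
)
print(result.annotations)
```

Each component only adds annotations to the shared document, so components can be reordered or swapped without touching the others, which is the point of the decomposition.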

A Unified Tagging Approach to Text Normalization - 1 views

keg.cs.tsinghua.edu.cn/...u-et-al-Text-Normalization.pdf

conditional-random-field nlp paper pdf tagging textmining

shared by Jack Park on 23 Sep 08 - No Cached

Jack Park on 23 Sep 08

This paper addresses the issue of text normalization, an important yet often overlooked problem in natural language processing. By text normalization, we mean converting 'informally inputted' text into the canonical form, by eliminating 'noises' in the text and detecting paragraph and sentence boundaries in the text.

wiki.dbpedia.org : Documentation - 0 views

wiki.dbpedia.org/Documentation

dbpedia dbpedia.org gpl harvesting php textmining wiki.dbpedia.org

shared by Jack Park on 12 Sep 08 - Cached

Jack Park on 12 Sep 08

The DBpedia community uses a flexible and extensible framework to extract different kinds of structured information from Wikipedia. The DBpedia information extraction framework is written using PHP 5. The framework is available from the DBpedia SVN (GNU GPL License).

SenseBot - semantic search engine that finds sense on the Web - 0 views

www.sensebot.net

search searchengine semantic sensebot summary textmining

shared by Jack Park on 31 Aug 08 - Cached

Jack Park on 31 Aug 08

SenseBot (www.sensebot.net) represents a new type of Search Engine that delivers a summary in response to your search query instead of a collection of links to Web pages. SenseBot parses top results returned by a major Web search engine (e.g., Google) and prepares a text summary of them. The summary serves as a digest on the topic of your query, blending together the most significant and relevant aspects of the search results. The summary itself becomes the main result of your search.

Alchemy - Open Source AI - 0 views

alchemy.cs.washington.edu

alchemy markovlogic software opensource c++ knowledge discovery TextMining

shared by Jack Park on 15 Jan 09 - Cached

Jack Park on 15 Jan 09

Alchemy is a software package providing a series of algorithms for statistical relational learning and probabilistic logic inference, based on the Markov logic representation. Alchemy allows you to easily develop a wide range of AI applications, including:

* Collective classification
* Link prediction
* Entity resolution
* Social network modeling
* Information extraction

Text Analytics Solutions from ClearForest - 0 views

clearforest.com

clearforest TextMining harvesting analysis

shared by Jack Park on 16 Dec 08 - Cached

alphaWorks : Text Analytics Tools and Runtime for IBM LanguageWare : Overview - 0 views

www.alphaworks.ibm.com/lrw

languageware ibm TextMining uima harvesting discovery

shared by Jack Park on 18 Nov 08 - Cached

Jack Park on 18 Nov 08

IBM® LanguageWare® is a set of run-time libraries and an easy-to-use Eclipse-based development environment for building custom text analyzers in various languages. Deployable in Apache UIMA, these analyzers can expose the information buried in text to any application. The Eclipse-based tools make creating analyzers simple and fast, even for non-technical users. The tools make it easy to build dictionaries, ontologies, and rules for identifying key information, relationships, and meaning.

uClassify - free text classifier web service - 0 views

uclassify.com

uclassify TextMining classification

shared by Jack Park on 16 Dec 08 - Cached

Jack Park on 16 Dec 08

uClassify is a free web service where you can easily create your own text classifiers.

The Stanford Wordnet Project - 0 views

ai.stanford.edu/swn

wordnet synsets text TextMining

shared by Jack Park on 09 Jan 09 - Cached

Jack Park on 09 Jan 09

Augmented Wordnets: These lexical resources (and the method of their construction) are described in "Semantic Taxonomy Induction from Heterogenous Evidence" (ACL-06). They are automatically augmented versions of WordNet 2.1.

TAPIR project web site - 0 views

project.dbit.dk/tapir

tapir information extraction discovery TextMining

shared by Jack Park on 31 Dec 08 - Cached

Jack Park on 31 Dec 08

TAPIR started as a research project in June 2001. In 2002 the project was sponsored by NORDINFO and the Research Council of the Danish Ministry of Culture. TAPIR aims to investigate the potential of applying the diversity of cognitive representations that point to scientific full-text documents, following the principle of poly-representation. Poly-representation (or multi-evidence) means using the cognitively different, overlapping interpretations made over time by the different actors participating in interactive IR. Such cognitive overlaps derive, for instance, from the authors' own perceptions of their work (titles, full-text terms), from human indexing (e.g. descriptors), or from citations given to the work by other authors. The assumption is that the more cognitively different the representations simultaneously pointing to a document are, the higher the probability that the document is relevant to a given set of criteria.

LingPipe Home - 0 views

alias-i.com/lingpipe

computational linguistics discovery information extraction java opensource textmining

shared by Jack Park on 11 Jul 08 - Cached

Jack Park on 11 Jul 08

LingPipe is a suite of Java libraries for the linguistic analysis of human language.

Java Text Categorizing Library - 0 views

textcat.sourceforge.net

categorizing TextMining information extraction harvesting library opensource java lgpl

shared by Jack Park on 02 Dec 08 - Cached

Jack Park on 02 Dec 08

The Java Text Categorizing Library (JTCL) is a pure Java 1.5 implementation of libTextCat, which in turn is "a library that was primarily developed for language guessing, a task on which it is known to perform with near-perfect accuracy". It is distributed under the LGPL and can also be used to categorize text into arbitrary topics by computing appropriate fingerprints that represent the categories.
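One common way to build such fingerprints is a ranked list of frequent character n-grams, compared with an "out-of-place" rank distance. The sketch below assumes that approach; it is not taken from the JTCL source, and the function names, the tiny training samples, and the `max_n` and `top_k` parameters are all hypothetical.

```python
from collections import Counter


def fingerprint(text, max_n=3, top_k=300):
    """Return the top_k character n-grams (n = 1..max_n), most frequent first."""
    padded = "_" + "_".join(text.lower().split()) + "_"
    counts = Counter(
        padded[i:i + n]
        for n in range(1, max_n + 1)
        for i in range(len(padded) - n + 1)
    )
    # Sort by descending count, then alphabetically, so ranks are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_fp, category_fp):
    """Sum of rank differences; n-grams absent from the category get a fixed penalty."""
    rank = {gram: i for i, gram in enumerate(category_fp)}
    penalty = len(category_fp)
    return sum(abs(i - rank[g]) if g in rank else penalty for i, g in enumerate(doc_fp))


def categorize(text, category_fps):
    # Pick the category whose fingerprint is closest to the document's.
    doc_fp = fingerprint(text)
    return min(category_fps, key=lambda name: out_of_place(doc_fp, category_fps[name]))


# Tiny illustrative samples; usable fingerprints need far more training text.
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and then the dog sleeps in the sun",
    "german": "der schnelle braune fuchs springt hinter den faulen hund und dann ruht der hund in der sonne",
}
CATEGORY_FPS = {name: fingerprint(sample) for name, sample in SAMPLES.items()}

english_guess = categorize("the dog jumps over the fox", CATEGORY_FPS)
german_guess = categorize("der hund springt hinter den fuchs", CATEGORY_FPS)
print(english_guess, german_guess)
```

The same code categorizes by topic instead of language if the samples are replaced with text for each topic, since nothing in the fingerprint is language-specific.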

The Lemur Toolkit for Language Modeling and Information Retrieval - 0 views

www.lemurproject.org

search lemur nlp opensource TextMining discovery bsd

shared by Jack Park on 11 Jan 09 - Cached

Jack Park on 11 Jan 09

The Lemur Toolkit is an open-source toolkit designed to facilitate research in language modeling and information retrieval. Lemur supports a wide range of industrial and research language applications such as ad-hoc retrieval, site-search, and text mining. The toolkit supports indexing of large-scale text databases, the construction of simple language models for documents, queries, or subcollections, and the implementation of retrieval systems based on language models as well as a variety of other retrieval models. The system is written in the C and C++ languages, and is designed as a research system to run under Unix operating systems, although it can also run under Windows.
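Retrieval based on language models can be illustrated by ranking documents by the likelihood that each document's unigram model generated the query. The sketch below assumes Dirichlet smoothing against the whole collection; it does not use the Lemur API, and the function names, the three sample documents, and the `mu` value are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def score(query, doc_tokens, collection_counts, collection_len, mu=10.0):
    """Log query likelihood under a Dirichlet-smoothed unigram document model."""
    doc_counts = Counter(doc_tokens)
    total = 0.0
    for term in tokenize(query):
        p_collection = collection_counts[term] / collection_len
        if p_collection == 0.0:
            continue  # Term unseen in the whole collection: skipped in this sketch.
        p = (doc_counts[term] + mu * p_collection) / (len(doc_tokens) + mu)
        total += math.log(p)
    return total


DOCS = {
    "d1": "language models for information retrieval",
    "d2": "indexing large text databases",
    "d3": "entity detection in plain text",
}
tokens = {name: tokenize(text) for name, text in DOCS.items()}
collection_counts = Counter(t for toks in tokens.values() for t in toks)
collection_len = sum(collection_counts.values())

ranking = sorted(
    DOCS,
    key=lambda name: score("language retrieval", tokens[name], collection_counts, collection_len),
    reverse=True,
)
print(ranking)
```

Smoothing keeps a document from scoring zero just because it lacks one query term, and `mu` controls how strongly the collection statistics pull on short documents.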

UIMA COMPONENT REPOSITORY - 0 views

uima.lti.cs.cmu.edu/...Welcome.do

uima libraries TextMining knowledge discovery

shared by Jack Park on 27 Apr 09 - Cached

Jack Park on 27 Apr 09

Our goal in creating this site is to provide the basis for a thriving community of UIMA developers who can announce, discuss, design, share, and critique UIMA-compliant components, resources and solutions. The Unstructured Information Management Architecture (UIMA) is a software framework that supports rapid development and deployment of multimodal analytics - applications which provide value by processing human-readable text, audio and/or video in order to extract information, answer questions, summarize documents, etc.

Technology Review: Extracting Meaning from Millions of Pages - 0 views

beta.technologyreview.com/...22773

TextRunner TextMining harvesting

shared by Jack Park on 12 Jun 09 - Cached

Jack Park on 12 Jun 09

A software engine that pulls together facts by combing through more than 500 million Web pages has been developed by researchers at the University of Washington. The tool extracts information from billions of lines of text by analyzing basic relationships between words.

x2exp.pdf (application/pdf Object) - 0 views

www.computer.org/...x2exp.pdf

data modeling sensemaking learning TextMining

shared by Jack Park on 01 Apr 09 - No Cached

Jack Park on 01 Apr 09

"But invariably, simple models and a lot of data trump more elaborate models based on less data."

The Lemur Toolkit for Language Modeling and Information Retrieval - 0 views

lemurproject.org

lemur TextMining harvesting text toolkit opensource bsd c++ java

shared by Jack Park on 27 Apr 09 - Cached

Jack Park on 27 Apr 09

The Lemur Toolkit is an open-source toolkit designed to facilitate research in language modeling and information retrieval. Lemur supports a wide range of industrial and research language applications such as ad-hoc retrieval, site-search, and text mining. The toolkit supports indexing of large-scale text databases, the construction of simple language models for documents, queries, or subcollections, and the implementation of retrieval systems based on language models as well as a variety of other retrieval models. The system is written in the C and C++ languages, and is designed as a research system to run under Unix operating systems, although it can also run under Windows.

Semantic API - 0 views

semanticengines.com/api.aspx

semantic searchengine soap RESTful api TextMining summary api

shared by Jack Park on 04 Apr 09 - Cached

Jack Park on 04 Apr 09

Semantic Cloud is the API that powers the semantic search engine SenseBot and LinkSensor, a contextual linking tool for bloggers. The API supports SOAP and REST protocols (HTTP GET). The idea is to empower semantic startups or any ventures that are looking for an affordable, high-quality semantic solution on which to build their applications. Semantic API features include:

* extraction of semantic concepts from a page or document;
* creating a "semantic cloud" of concepts describing a group of documents;
* generating a multi-document summary of a set of pages;
* generating an essay on a topic based on a set of documents.

Multiple parameters allow the client to control the type and format of results.

SCRIBO - Welcome to SCRIBO.ws - 0 views

www.scribo.ws/...WebHome

TextMining opensource scribo ontologies

shared by Jack Park on 18 May 09 - Cached

Jack Park on 18 May 09

SCRIBO - Semi-automatic and Collaborative Retrieval of Information Based on Ontologies - aims at algorithms and collaborative free software for the automatic extraction of knowledge from texts and images, and for the semi-automatic annotation of digital documents.
