Group items tagged text - sensemaking

A Unified Tagging Approach to Text Normalization - 1 views

keg.cs.tsinghua.edu.cn/...u-et-al-Text-Normalization.pdf

conditional-random-field nlp paper pdf tagging textmining

shared by Jack Park on 23 Sep 08 - No Cached

Jack Park on 23 Sep 08

This paper addresses the issue of text normalization, an important yet often overlooked problem in natural language processing. By text normalization, we mean converting 'informally inputted' text into the canonical form, by eliminating 'noises' in the text and detecting paragraph and sentence boundaries in the text.

Gao - 0 views

www.uic.edu/...1921

education wiki research article learning wikis journal discussion sensemaking

shared by Jack Park on 31 Dec 08 - Cached

Jack Park on 31 Dec 08

The purpose of this study was to improve the quality of students' online discussion of assigned readings in an online course. To improve the focus, depth, and connectedness of online discussion, the first author designed a text-focused Wiki that simultaneously displayed the assigned reading and students' comments side by side in adjacent columns. In the text-focused Wiki, students were able to read the assigned text in the left column and type their comments or questions in the right column adjacent to the sentence or passage that sparked their interest. In post-participation surveys, data were gathered about students' experiences in the text-focused Wiki and prior experiences in threaded discussion forums. Students reported more focus, depth, flow, idea generation, and enjoyment in the text-focused Wiki.

TagCrowd - make your own tag cloud from any text - 1 views

www.tagcrowd.com

tagging web2.0 tags tools visualization tagcloud tag cloud sensemaking

shared by Jack Park on 22 Dec 08 - Cached

Jack Park on 22 Dec 08

TagCrowd is a web application for visualizing word frequencies in any user-supplied text by creating what is popularly known as a tag cloud or text cloud. TagCrowd is taking tag clouds far beyond their original function:

* as topic summaries for speeches and written works
* as blog tool or website analysis for search engine optimization (SEO)
* for visual analysis of survey data
* as brand clouds that let companies see how they are perceived by the world
* for data mining a text corpus
* for helping writers and students reflect on their work
* as name tags for conferences, cocktail parties or wherever new collaborations start
* as resumes in a single glance
* as visual poetry
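
The word-frequency idea behind a tag cloud can be sketched in a few lines of Python. This is only an illustration and not TagCrowd's own code: the stopword list, the function names `word_frequencies` and `font_sizes`, and the linear font-size scaling are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration; TagCrowd's real list is not given here.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "as"}

def word_frequencies(text, top_n=10, stopwords=STOPWORDS):
    """Return the top_n most frequent non-stopword words as (word, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(top_n)

def font_sizes(freqs, min_pt=10, max_pt=36):
    """Map counts linearly onto a font-size range (an assumed scaling rule)."""
    if not freqs:
        return {}
    lo = min(count for _, count in freqs)
    hi = max(count for _, count in freqs)
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {w: min_pt + (c - lo) * (max_pt - min_pt) / span for w, c in freqs}

if __name__ == "__main__":
    sample = "the cloud of tags: tag cloud, text cloud, tag"
    print(font_sizes(word_frequencies(sample, top_n=2)))
```

The most frequent word gets the largest font and the least frequent of the selected words gets the smallest; a real tag cloud would then lay the words out visually.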

Ontomat Homepage - Annotation Portal - 0 views

annotation.semanticweb.org/...index.html

ontomat annotation opensource java owl ontology

shared by Jack Park on 30 Nov 08 - Cached

Jack Park on 30 Nov 08

OntoMat-Annotizer is a user-friendly interactive webpage annotation tool. It supports the user in creating and maintaining ontology-based OWL markup, i.e. creating OWL instances, attributes and relationships. It includes an ontology browser for exploring the ontology and its instances and an HTML browser that displays the annotated parts of the text. It is Java-based and provides a plugin interface for extensions. The intended user is the individual annotator, i.e. people who want to enrich their web pages with OWL metadata. Instead of manually annotating the page with a text editor, say, emacs, OntoMat allows the annotator to highlight relevant parts of the web page and create new instances via drag'n'drop interactions. It supports the metadata creation phase of the lifecycle. A future version is planned to contain an information extraction plugin that offers a wizard which suggests which parts of the text are relevant for annotation. That aspect will help to ease the time-consuming annotation task.

ecai2008_naturalowl.pdf (application/pdf Object) - 0 views

www.aueb.gr/...ecai2008_naturalowl.pdf

ecai2008_naturalowl generation ontologies opensource owl text

shared by Jack Park on 29 Aug 08 - No Cached

Jack Park on 29 Aug 08

See also: http://lists.w3.org/Archives/Public/semantic-web/2008Apr/0005.html NaturalOWL is an open-source natural language generation engine written in Java. It produces descriptions of individuals (e.g., items for sale, museum exhibits) and classes (e.g., types of exhibits) in English and Greek from OWL DL ontologies. The ontologies must have been annotated in RDF with linguistic and user modeling resources. We demonstrate a plug-in for Protege that can be used to produce these resources and to generate texts by invoking NaturalOWL. We also demonstrate how NaturalOWL can be used by robotic avatars in Second Life to describe the exhibits of virtual museums. NaturalOWL demonstrates the benefits of Natural Language Generation (NLG) on the Semantic Web. Organizations that need to publish information about objects, such as exhibits or products, can publish OWL ontologies instead of texts. NLG engines, embedded in browsers or Web servers, can then render the ontologies in multiple natural languages, whereas computer programs may access the ontologies directly.

alphaWorks : Text Analytics Tools and Runtime for IBM LanguageWare : Overview - 0 views

www.alphaworks.ibm.com/lrw

languageware ibm TextMining uima harvesting discovery

shared by Jack Park on 18 Nov 08 - Cached

Jack Park on 18 Nov 08

IBM® LanguageWare® is a set of run-time libraries and an easy-to-use Eclipse-based development environment for building custom text analyzers in various languages. Deployable in Apache UIMA, these analyzers can expose the information buried in text to any application. The Eclipse-based tools make creating analyzers simple and fast, even for non-technical users. The tools make it easy to build dictionaries, ontologies, and rules for identifying key information, relationships and meaning.

Java Text Categorizing Library - 0 views

textcat.sourceforge.net

categorizing TextMining information extraction harvesting library opensource java lgpl

shared by Jack Park on 02 Dec 08 - Cached

Jack Park on 02 Dec 08

The Java Text Categorizing Library (JTCL) is a pure Java 1.5 implementation of libTextCat, which in turn is "a library that was primarily developed for language guessing, a task on which it is known to perform with near-perfect accuracy". It's distributed under the LGPL and can also be used to categorize text into arbitrary topics by computing appropriate fingerprints that represent the categories.
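
As a rough illustration of fingerprint-based categorization, the sketch below assumes the fingerprints are rank-ordered character n-gram profiles compared with an "out-of-place" distance, in the style of Cavnar and Trenkle; the description above does not spell this out. It is written in Python rather than Java, and the names `ngram_profile`, `out_of_place` and `categorize` are hypothetical, not part of JTCL's or libTextCat's API.

```python
from collections import Counter

def ngram_profile(text, max_n=3, top_k=300):
    """Map each of the top_k most frequent character n-grams (1..max_n) to its rank."""
    padded = "_" + "_".join(text.lower().split()) + "_"
    counts = Counter(
        padded[i:i + n]
        for n in range(1, max_n + 1)
        for i in range(len(padded) - n + 1)
    )
    ranked = [gram for gram, _ in counts.most_common(top_k)]
    return {gram: rank for rank, gram in enumerate(ranked)}

def out_of_place(doc_profile, cat_profile):
    """Sum rank differences; n-grams absent from the category get a fixed penalty."""
    max_penalty = len(cat_profile)
    return sum(
        abs(rank - cat_profile[gram]) if gram in cat_profile else max_penalty
        for gram, rank in doc_profile.items()
    )

def categorize(text, fingerprints):
    """Return the category whose fingerprint is closest to the text's profile."""
    doc = ngram_profile(text)
    return min(fingerprints, key=lambda name: out_of_place(doc, fingerprints[name]))

if __name__ == "__main__":
    # Tiny made-up training samples; real fingerprints need far more text.
    fingerprints = {
        "english": ngram_profile("the quick brown fox jumps over the lazy dog"),
        "german": ngram_profile("der schnelle braune fuchs springt ueber den faulen hund"),
    }
    print(categorize("the quick brown fox jumps over the lazy dog", fingerprints))
```

The same routine works for arbitrary topics: build one profile per category from sample text and pick the category with the smallest distance.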

The Lemur Toolkit for Language Modeling and Information Retrieval - 0 views

lemurproject.org

lemur TextMining harvesting text toolkit opensource bsd c++ java

shared by Jack Park on 27 Apr 09 - Cached

Jack Park on 27 Apr 09

The Lemur Toolkit is an open-source toolkit designed to facilitate research in language modeling and information retrieval. Lemur supports a wide range of industrial and research language applications such as ad-hoc retrieval, site-search, and text mining. The toolkit supports indexing of large-scale text databases, the construction of simple language models for documents, queries, or subcollections, and the implementation of retrieval systems based on language models as well as a variety of other retrieval models. The system is written in the C and C++ languages, and is designed as a research system to run under Unix operating systems, although it can also run under Windows.

A Framework for Web Science - ECS EPrints Repository - 0 views

eprints.ecs.soton.ac.uk/13347

ecs framework paper webscience

shared by Jack Park on 31 Aug 08 - Cached

Jack Park on 31 Aug 08

This text sets out a series of approaches to the analysis and synthesis of the World Wide Web, and other web-like information structures. A comprehensive set of research questions is outlined, together with a sub-disciplinary breakdown, emphasising the multi-faceted nature of the Web, and the multi-disciplinary nature of its study and development. These questions and approaches together set out an agenda for Web Science, the science of decentralised information systems. Web Science is required both as a way to understand the Web, and as a way to focus its development on key communicational and representational requirements. The text surveys central engineering issues, such as the development of the Semantic Web, Web services and P2P. Analytic approaches to discover the Web's topology, or its graph-like structures, are examined. Finally, the Web as a technology is essentially socially embedded; therefore various issues and requirements for Web use and governance are also reviewed.

Topic Mapping The Restoration - 0 views

www.idealliance.org/...03-08-02

heml opendata restoration topic map

shared by Jack Park on 01 Jul 08 - Cached

Jack Park on 01 Jul 08

This article describes the motivation for and development of a project I have called PepysMap. PepysMap was inspired by the excellent 'blog of the diary of Samuel Pepys run by Phil Gyford. Phil posts diary entries day by day (currently for the year 1662). Each blog post contains the text of the diary entry hyperlinked to pages containing details of people, places and cultural artifacts referenced from the text. The goal of PepysMap is to shadow the development of the Pepys blog by creating a topic map for each diary entry, showing the relationships between people, places and cultural artifacts.

EtherPad: Realtime Collaborative Text Editing - 0 views

etherpad.com

collaboration writing web2.0 text tools etherpad tool

shared by Jack Park on 02 Jan 09 - Cached

Jack Park on 02 Jan 09

The perfect way to collaborate on a text document and keep everyone literally on the same page.

TAPIR project web site - 0 views

project.dbit.dk/tapir

tapir information extraction discovery TextMining

shared by Jack Park on 31 Dec 08 - Cached

Jack Park on 31 Dec 08

TAPIR started up as a research project in June 2001. In 2002 the project is sponsored by NORDINFO and the Research Council of The Danish Ministry of Culture. TAPIR aims to investigate the potential of applying the diversity of cognitive representations pointing to scientific full-text documents, following the principle of poly-representation. Poly-representation (or multi-evidence) means utilizing the cognitively different, overlapping interpretations made, also over time, by different actors participating in interactive IR. Such cognitive overlaps derive, for instance, from the authors' own perceptions of their work (titles, full-text terms), from human indexing (e.g. descriptors), or from citations given to the work by other authors. The assumption is that the more cognitively different the representations simultaneously pointing to a document are, the higher the probability that the document is relevant to a given set of criteria.

The Lemur Toolkit for Language Modeling and Information Retrieval - 0 views

www.lemurproject.org

search lemur nlp opensource TextMining discovery bsd

shared by Jack Park on 11 Jan 09 - Cached

Jack Park on 11 Jan 09

The Lemur Toolkit is an open-source toolkit designed to facilitate research in language modeling and information retrieval. Lemur supports a wide range of industrial and research language applications such as ad-hoc retrieval, site-search, and text mining. The toolkit supports indexing of large-scale text databases, the construction of simple language models for documents, queries, or subcollections, and the implementation of retrieval systems based on language models as well as a variety of other retrieval models. The system is written in the C and C++ languages, and is designed as a research system to run under Unix operating systems, although it can also run under Windows.

SpaceCollective Projects - The Total Library - 0 views

spacecollective.org/...The-Total-Library

libraries library blogs sensemaking

shared by Jack Park on 18 May 09 - Cached

Jack Park on 18 May 09

Text that redefines - or - How to redefine the text.

The Future of Reputation - 0 views

docs.law.gwu.edu/...text.htm

book books ebook ebooks free privacy reputation research web

shared by Jack Park on 27 Sep 08 - Cached

Jack Park on 27 Sep 08

The full text of The Future of Reputation is now available online for free. Click on the links below to download PDFs of each chapter. The front matter to the book is at the beginning of each chapter.

Sphinx - Free open-source SQL full-text search engine - 0 views

www.sphinxsearch.com

search mysql database sphinx sql opensource php indexing searchengine gpl

shared by Jack Park on 21 Dec 08 - Cached

Jack Park on 21 Dec 08

Sphinx is a full-text search engine, distributed under GPL version 2. A commercial license is also available for embedded use. Generally, it's a standalone search engine, meant to provide fast, size-efficient and relevant full-text search functions to other applications. Sphinx was specially designed to integrate well with SQL databases and scripting languages. Currently, built-in data sources support fetching data either via a direct connection to MySQL or PostgreSQL, or using an XML pipe mechanism (a pipe to the indexer in a special XML-based format which Sphinx recognizes).

AKTive Media ontology based annotation system - 0 views

www.dcs.shef.ac.uk/...cresearch.html

aktive annotation opensource text images

shared by Jack Park on 30 Nov 08 - Cached

Jack Park on 30 Nov 08

AKTive Media is an ontology-based cross-media annotation (Images and Text) system. Our goal is to automate the process of annotation by suggesting knowledge to the user in an interactive way while the user is annotating, thereby minimizing user effort. The system actively works in the background, interacting with web services and querying our central annotation store to look for context-specific knowledge.

Using Semantic Word Classes in Text Information Retrieval Systems (ResearchIndex) - 0 views

citeseer.ist.psu.edu/...726903.html

citeseer information extraction semantic map

shared by Jack Park on 30 Nov 08 - Cached

Jack Park on 30 Nov 08

In this paper an application of methodologies to automatically acquire semantic word classes and to use them in text information retrieval systems is described.

uClassify - free text classifier web service - 0 views

uclassify.com

uclassify TextMining classification

shared by Jack Park on 16 Dec 08 - Cached

Jack Park on 16 Dec 08

uClassify is a free web service where you can easily create your own text classifiers.

GATE, A General Architecture for Text Engineering - 0 views

gate.ac.uk

java nlp opensource ai gate software information_extraction language annotation

shared by Jack Park on 30 Nov 08 - Cached

Jack Park on 30 Nov 08

GATE is...

* the Eclipse of Natural Language Engineering, the Lucene of Information Extraction, a leading toolkit for Text Mining
* used worldwide by thousands of scientists, companies, teachers and students
* comprised of an architecture, a free open source framework (or SDK) and graphical development environment
* used for all sorts of language processing tasks, including Information Extraction in many languages
* funded by the EPSRC, BBSRC, AHRC, the EU and commercial users
* 100% Java reference implementation of ISO TC37/SC4 and used with XCES in the ANC
* 10 years old in 2005, used in many research projects and compatible with IBM's UIMA
* based on MVC, mobile code, continuous integration, and test-driven development, with code hosted on SourceForge
