sensemaking: Group items tagged text

Jack Park

A Unified Tagging Approach to Text Normalization - 1 views

  •  
    This paper addresses the issue of text normalization, an important yet often overlooked problem in natural language processing. By text normalization, we mean converting 'informally inputted' text into the canonical form, by eliminating 'noises' in the text and detecting paragraph and sentence boundaries in the text.
Jack Park

Gao - 0 views

  •  
    The purpose of this study was to improve the quality of students' online discussion of assigned readings in an online course. To improve the focus, depth, and connectedness of online discussion, the first author designed a text-focused Wiki that simultaneously displayed the assigned reading and students' comments side by side in adjacent columns. In the text-focused Wiki, students were able to read the assigned text in the left column and type their comments or questions in the right column adjacent to the sentence or passage that sparked their interest. In post-participation surveys, data were gathered about students' experiences in the text-focused Wiki and prior experiences in threaded discussion forums. Students reported more focus, depth, flow, idea generation, and enjoyment in the text-focused Wiki.
Jack Park

TagCrowd - make your own tag cloud from any text - 1 views

  •  
    TagCrowd is a web application for visualizing word frequencies in any user-supplied text by creating what is popularly known as a tag cloud or text cloud. TagCrowd is taking tag clouds far beyond their original function:
    * as topic summaries for speeches and written works
    * as blog tool or website analysis for search engine optimization (SEO)
    * for visual analysis of survey data
    * as brand clouds that let companies see how they are perceived by the world
    * for data mining a text corpus
    * for helping writers and students reflect on their work
    * as name tags for conferences, cocktail parties or wherever new collaborations start
    * as resumes in a single glance
    * as visual poetry
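    The word-frequency counting behind a tag cloud can be sketched in a few lines of Python. This is only an illustration of the general idea, not TagCrowd's implementation: the stopword list, the tokenizing regex, and the function names `word_frequencies` and `font_sizes` are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real tool would use a larger one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "for", "as"}


def word_frequencies(text, top_n=10, stopwords=STOPWORDS):
    """Return the top_n (word, count) pairs, ignoring case and stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(top_n)


def font_sizes(freqs, min_pt=10, max_pt=36):
    """Map each word to a font size scaled linearly between min_pt and max_pt."""
    if not freqs:
        return {}
    hi = max(count for _, count in freqs)
    lo = min(count for _, count in freqs)
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {w: min_pt + (c - lo) * (max_pt - min_pt) / span for w, c in freqs}


if __name__ == "__main__":
    sample = "Tag clouds show tag frequency; a tag cloud is a cloud of tags."
    top = word_frequencies(sample, top_n=2)
    print(top)
    print(font_sizes(top))
```

    Linear scaling is the simplest choice; a logarithmic scale is a common alternative when a few words dominate the counts.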
Jack Park

Ontomat Homepage - Annotation Portal - 0 views

  •  
    OntoMat-Annotizer is a user-friendly interactive webpage annotation tool. It supports the user in the task of creating and maintaining ontology-based OWL markup, i.e., creating OWL instances, attributes and relationships. It includes an ontology browser for exploring the ontology and its instances, and an HTML browser that displays the annotated parts of the text. It is Java-based and provides a plugin interface for extensions. The intended user is the individual annotator, i.e., someone who wants to enrich their web pages with OWL metadata. Instead of manually annotating the page with a text editor such as emacs, OntoMat allows the annotator to highlight relevant parts of the web page and create new instances via drag'n'drop interactions. It supports the metadata creation phase of the lifecycle. A future version is planned to contain an information extraction plugin that offers a wizard suggesting which parts of the text are relevant for annotation. That feature should ease the time-consuming annotation task.
Jack Park

ecai2008_naturalowl.pdf (application/pdf Object) - 0 views

  •  
    See also: http://lists.w3.org/Archives/Public/semantic-web/2008Apr/0005.html NaturalOWL is an open-source natural language generation engine written in Java. It produces descriptions of individuals (e.g., items for sale, museum exhibits) and classes (e.g., types of exhibits) in English and Greek from OWL DL ontologies. The ontologies must have been annotated in RDF with linguistic and user modeling resources. We demonstrate a plug-in for Protege that can be used to produce these resources and to generate texts by invoking NaturalOWL. We also demonstrate how NaturalOWL can be used by robotic avatars in Second Life to describe the exhibits of virtual museums. NaturalOWL demonstrates the benefits of Natural Language Generation (NLG) on the Semantic Web. Organizations that need to publish information about objects, such as exhibits or products, can publish OWL ontologies instead of texts. NLG engines, embedded in browsers or Web servers, can then render the ontologies in multiple natural languages, whereas computer programs may access the ontologies directly.
Jack Park

alphaWorks : Text Analytics Tools and Runtime for IBM LanguageWare : Overview - 0 views

  •  
    IBM® LanguageWare® is a set of run-time libraries and an easy-to-use Eclipse-based development environment for building custom text analyzers in various languages. Deployable in Apache UIMA, these analyzers can expose the information buried in text to any application. The Eclipse-based tools make creating analyzers simple and fast, even for non-technical users. The tools make it easy to build dictionaries, ontologies, and rules for identifying key information, relationships and meaning.
Jack Park

Java Text Categorizing Library - 0 views

  •  
    The Java Text Categorizing Library (JTCL) is a pure Java 1.5 implementation of libTextCat, which in turn is "a library that was primarily developed for language guessing, a task on which it is known to perform with near-perfect accuracy". It is distributed under the LGPL and can also be used to categorize text into arbitrary topics by computing appropriate fingerprints that represent the categories.
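    The idea of categorizing text by comparing fingerprints can be illustrated with rank-ordered character n-gram profiles and an "out-of-place" distance, the general approach usually associated with TextCat-style language guessers. The Python sketch below is an assumption-laden illustration, not JTCL's or libTextCat's API: the function names, the profile size of 300, and the n-gram length limit are all chosen for the example.

```python
from collections import Counter


def fingerprint(text, max_n=3, size=300):
    """Build a rank-ordered profile of the most frequent character n-grams."""
    # Replace whitespace with "_" so word boundaries become part of the n-grams.
    padded = "_" + "_".join(text.lower().split()) + "_"
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    # Sort by descending count, then alphabetically so ranks are deterministic.
    ranked = sorted(counts, key=lambda g: (-counts[g], g))[:size]
    return {gram: rank for rank, gram in enumerate(ranked)}


def out_of_place(doc_fp, category_fp):
    """Sum of rank differences; n-grams missing from the category get a fixed penalty."""
    penalty = len(category_fp)
    return sum(
        abs(rank - category_fp[gram]) if gram in category_fp else penalty
        for gram, rank in doc_fp.items()
    )


def categorize(text, category_fps):
    """Return the name of the category whose fingerprint is closest to the text."""
    doc_fp = fingerprint(text)
    return min(category_fps, key=lambda name: out_of_place(doc_fp, category_fps[name]))


if __name__ == "__main__":
    # Toy training samples; real fingerprints would be built from much larger corpora.
    samples = {
        "en": "the quick brown fox jumps over the lazy dog and then the cat sat on the mat",
        "de": "der schnelle braune fuchs springt ueber den faulen hund und die katze sitzt auf der matte",
    }
    fps = {name: fingerprint(text) for name, text in samples.items()}
    print(categorize("the dog and the cat", fps))
```

    The same functions work for topic categorization if the training samples are grouped by topic instead of by language.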
Jack Park

The Lemur Toolkit for Language Modeling and Information Retrieval - 0 views

  •  
    The Lemur Toolkit is an open-source toolkit designed to facilitate research in language modeling and information retrieval. Lemur supports a wide range of industrial and research language applications such as ad-hoc retrieval, site-search, and text mining. The toolkit supports indexing of large-scale text databases, the construction of simple language models for documents, queries, or subcollections, and the implementation of retrieval systems based on language models as well as a variety of other retrieval models. The system is written in the C and C++ languages, and is designed as a research system to run under Unix operating systems, although it can also run under Windows.
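    Retrieval based on simple language models can be illustrated with a unigram query-likelihood ranker. The Python sketch below is a generic textbook-style example and does not use or reproduce Lemur's API: the function names and the choice of linear interpolation with weight `lam=0.5` are assumptions for illustration.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram model: word -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def query_log_likelihood(query, doc_model, collection_model, lam=0.5):
    """Log P(query | document), interpolating document and collection models."""
    score = 0.0
    for word in query:
        p = (1 - lam) * doc_model.get(word, 0.0) + lam * collection_model.get(word, 0.0)
        if p == 0.0:
            # The word occurs nowhere in the collection.
            return float("-inf")
        score += math.log(p)
    return score


def rank(query, docs, lam=0.5):
    """Return document names ordered from most to least likely to generate the query."""
    collection_model = unigram_model([w for tokens in docs.values() for w in tokens])
    scores = {
        name: query_log_likelihood(query, unigram_model(tokens), collection_model, lam)
        for name, tokens in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    docs = {
        "d1": "language models for retrieval".split(),
        "d2": "tag clouds for text".split(),
    }
    print(rank(["language", "retrieval"], docs))
```

    Mixing in the collection model keeps a document from scoring zero just because it lacks one query word.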
Jack Park

A Framework for Web Science - ECS EPrints Repository - 0 views

  •  
    This text sets out a series of approaches to the analysis and synthesis of the World Wide Web, and other web-like information structures. A comprehensive set of research questions is outlined, together with a sub-disciplinary breakdown, emphasising the multi-faceted nature of the Web, and the multi-disciplinary nature of its study and development. These questions and approaches together set out an agenda for Web Science, the science of decentralised information systems. Web Science is required both as a way to understand the Web, and as a way to focus its development on key communicational and representational requirements. The text surveys central engineering issues, such as the development of the Semantic Web, Web services and P2P. Analytic approaches to discover the Web's topology, or its graph-like structures, are examined. Finally, the Web as a technology is essentially socially embedded; therefore various issues and requirements for Web use and governance are also reviewed.
Jack Park

Topic Mapping The Restoration - 0 views

  •  
    This article describes the motivation for and development of a project I have called PepysMap. PepysMap was inspired by the excellent blog of the diary of Samuel Pepys run by Phil Gyford. Phil posts diary entries day by day (currently for the year 1662). Each blog post contains the text of the diary entry, hyperlinked to pages containing details of the people, places and cultural artifacts referenced in the text. The goal of PepysMap is to shadow the development of the Pepys blog by creating a topic map for each diary entry, showing the relationships between people, places and cultural artifacts.
Jack Park

EtherPad: Realtime Collaborative Text Editing - 0 views

  •  
    The perfect way to collaborate on a text document and keep everyone literally on the same page.
Jack Park

TAPIR project web site - 0 views

  •  
    TAPIR started up as a research project in June 2001. In 2002 the project is sponsored by NORDINFO and the Research Council of The Danish Ministry of Culture. TAPIR aims to investigate the potential of applying the diversity of cognitive representations pointing to scientific full-text documents, following the principle of poly-representation. Poly-representation (or multi-evidence) means utilizing the cognitively different, overlapping interpretations, also over time, made by different actors participating in interactive IR. Such cognitive overlaps derive, for instance, from the authors' own perceptions of their work (titles, full-text terms), from human indexing (e.g. descriptors), or from citations given to the work by other authors. The assumption is that the more cognitively different the representations simultaneously pointing to a document are, the higher the probability that the document is relevant to a given set of criteria.
Jack Park

SpaceCollective Projects - The Total Library - 0 views

  •  
    Text that redefines - or - How to redefine the text.
Jack Park

The Future of Reputation - 0 views

  •  
    The full text of The Future of Reputation is now available online for free. Click on the links below to download PDFs of each chapter. The front matter to the book is at the beginning of each chapter.
Jack Park

Sphinx - Free open-source SQL full-text search engine - 0 views

  •  
    Sphinx is a full-text search engine, distributed under GPL version 2. A commercial license is also available for embedded use. Generally, it is a standalone search engine, meant to provide fast, size-efficient and relevant full-text search functions to other applications. Sphinx was specially designed to integrate well with SQL databases and scripting languages. Currently, the built-in data sources support fetching data either via a direct connection to MySQL or PostgreSQL, or via an XML pipe mechanism (a pipe to the indexer in a special XML-based format that Sphinx recognizes).
Jack Park

AKTive Media ontology based annotation system - 0 views

  •  
    AKTive Media is an ontology-based cross-media annotation (images and text) system. Our goal is to automate the process of annotation by suggesting knowledge to the user interactively while the user is annotating, thereby minimizing user effort. The system actively works in the background, interacting with web services and querying our central annotation store to look for context-specific knowledge.
Jack Park

Using Semantic Word Classes in Text Information Retrieval Systems (ResearchIndex) - 0 views

  •  
    This paper describes an application of methodologies for automatically acquiring semantic word classes and using them in text information retrieval systems.
Jack Park

uClassify - free text classifier web service - 0 views

  •  
    uClassify is a free web service where you can easily create your own text classifiers.
Jack Park

GATE, A General Architecture for Text Engineering - 0 views

  •  
    GATE is...
    * the Eclipse of Natural Language Engineering, the Lucene of Information Extraction, a leading toolkit for Text Mining
    * used worldwide by thousands of scientists, companies, teachers and students
    * comprised of an architecture, a free open source framework (or SDK) and graphical development environment
    * used for all sorts of language processing tasks, including Information Extraction in many languages
    * funded by the EPSRC, BBSRC, AHRC, the EU and commercial users
    * 100% Java reference implementation of ISO TC37/SC4 and used with XCES in the ANC
    * 10 years old in 2005, used in many research projects and compatible with IBM's UIMA
    * based on MVC, mobile code, continuous integration, and test-driven development, with code hosted on SourceForge