MHC Languages: Group items tagged corpora

LRC MHC

UAM CorpusTool Homepage - 0 views

  •  
    "The UAM CorpusTool is a state-of-the-art environment for annotation of text corpora. So, whether you are annotating a corpus as part of a linguistic study, or building a training set for use in statistical language processing, this is the tool for you."
Daryl Beres

Google as a Quick 'n Dirty Corpus Tool - 0 views

  •  
    "Until recently it was assumed that specialized software was required to do concordancing, but it turns out that a search engine such as Google can generate queries into almost limitless corpora (using the Advanced Search feature from the main portal page, for example). This paper by Tom Robb addresses more refined issues regarding the integrity of the data thus derived, and how we might improve on the integrity of that data through more defined searches, as explained here."
Daryl Beres

Alfred Lord Tennyson: Mariana - 0 views

  •  
    Sample activity for literature analysis using a concordance of a poem.
Daryl Beres

Tennyson: MARIANA, The concordance. - 0 views

  •  
    Sample activity using concordance for literature analysis with a poem by Alfred Lord Tennyson.
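For readers new to the idea, a concordance lists every occurrence of a word together with its surrounding context (often called keyword-in-context, or KWIC). The following is only a minimal sketch of that idea, not the method used by the linked pages; the function name `kwic`, the `width` parameter, and the choice of the poem's refrain as sample text are assumptions made for illustration.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of keyword."""
    # Collapse line breaks so that context windows can span verse lines.
    flat = " ".join(text.split())
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

# Sample text: the refrain from Tennyson's "Mariana" (illustrative input only).
sample = """She only said, 'My life is dreary,
He cometh not,' she said;
She said, 'I am aweary, aweary,
I would that I were dead!'"""

for line in kwic(sample, "said", width=20):
    print(line)
```

Aligning the keyword in a single column is what makes repeated patterns, such as the refrain's repetition of "said", easy to spot by eye.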
LRC MHC

غلطاوي:التصحيح التلقائي العربي Ghalatawi:Arabic AutoCorrect - 0 views

  •  
    Tools for Arabic study, including spell-checker, autocorrect, corpora, word tagger, morphological analyzers, dictionaries, games, etc.
Daryl Beres

[oucs] All About Xaira - 1 views

  •  
    "Xaira is a text searching software originally developed at OUCS for use with the British National Corpus. This new version has been entirely re-written as a general purpose XML search engine, which will operate on any corpus of well-formed XML documents. It is however best used with TEI-conformant documents.
    Xaira has full Unicode support. This means you can use it to search and display text in any language, provided you have a suitable Unicode font installed on your system.
    At the heart of Xaira is the Xaira Object Model. This defines a range of objects and methods for representing and searching large amounts of linguistic data. The Xaira Server program implements this model. The Xaira Indexer program creates platform-independent indexes from collections of XML documents for use by the Server. Both these Xaira components can be deployed on any platform.
    Client programs can access a Xaira server using a close-coupled API such as that used by the Windows client (which is written in C++), or via XMLRPC or SOAP. We provide a fully-featured client for Windows, and a PHP code library which makes it easy to develop applications for the web which can talk to a Xaira server.
    All versions of Xaira are now distributed free of charge under the GNU General Public Licence."
LRC MHC

Using BNC XML for English language study - 0 views

  •  
    This page contains some examples of how the BNC (XML edition) can be used in combination with non-corpus based activities and exercises to study the English language. The exercises are intended as illustrations of what can be done with access to the corpus. Refer to the sample search results provided or perform your own searches.
LRC MHC

[OTA] The Chambers-Rostand Corpus of Journalistic French [Electronic resource] - 0 views

  •  
    The Chambers-Rostand Corpus of Journalistic French (Le Corpus Chambers-Rostand du français journalistique). This electronic resource is freely available for download.
LRC MHC

Sketch Engine - 0 views

  •  
    The Sketch Engine (SkE, also known as Word Sketch Engine) is a Corpus Query System incorporating word sketches, grammatical relations, and a distributional thesaurus. A word sketch is a one-page, automatic, corpus-derived summary of a word's grammatical and collocational behaviour.
LRC MHC

Language and text - Hans j. Klarskov Mortensen. - 1 views

  •  
    Language, teaching and text software. Tools:
    * PhraseContext: text analysis tool, writing tool, collocation analysis, concordancing, text and XML output, and much more.
    * Calculator: calculates T-score, Z-score and Mutual Information. (Free)
    * Simpel Grammatik: grammar teaching software, only in Danish. (Free)
    * Convert: extracts text from PDF files. (Free)
    * Tokeniser: a small freeware utility. (Free)
    * Some Object Pascal/Delphi string routines
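The T-score, Z-score and Mutual Information measures mentioned in the list above are commonly used to rate how strongly two words collocate. The sketch below uses the simplified textbook formulas, with the expected co-occurrence count taken as f(node) × f(collocate) / N and no adjustment for span size. It is an assumption that these are the right variants for any particular purpose; the linked Calculator may compute them differently, and the function and parameter names here are hypothetical.

```python
import math

def association_scores(f_node, f_collocate, f_pair, corpus_size):
    """Compute simplified MI, t-score and z-score for a word pair.

    f_node, f_collocate: corpus frequencies of the two words
    f_pair: observed co-occurrence frequency (must be > 0)
    corpus_size: total number of tokens in the corpus
    """
    # Expected co-occurrences if the two words were independent.
    expected = f_node * f_collocate / corpus_size
    return {
        "MI": math.log2(f_pair / expected),
        "t": (f_pair - expected) / math.sqrt(f_pair),
        "z": (f_pair - expected) / math.sqrt(expected),
    }

# Made-up frequencies, for illustration only.
scores = association_scores(f_node=1000, f_collocate=500,
                            f_pair=50, corpus_size=1_000_000)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

Mutual Information tends to favour rare pairs, while the t-score favours frequent ones, which is why the measures are often reported side by side.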
LRC MHC

ItalNet * OVI Database Home - 0 views

  •  
    "Welcome to the ItalNet publication of the Opera del Vocabolario Italiano (OVI) textual database. The production database contains 1849 vernacular texts (21.2 million words, 479,000 unique forms) the majority of which are dated prior to 1375, the year of Boccaccio's death. The beta-test installation of the database under PhiloLogic3 contains 1960 documents (see below). The verse and prose works include early masters of Italian literature like Dante, Petrarch, and Boccaccio, as well as lesser-known and obscure texts by poets, merchants, and medieval chroniclers. The OVI database was created to aid in the compilation of an historical dictionary of the Italian language, the Tesoro della lingua italiana delle origini, (portions of which are now available online). The fully-searchable ItalNet implementation of the OVI database presented here has been produced in order to enable scholars around the world to benefit from this rich textual resource."
LRC MHC

Open Language Archives Community - 0 views

  •  
    "OLAC, the Open Language Archives Community, is an international partnership of institutions and individuals who are creating a worldwide virtual library of language resources by: (i) developing consensus on best current practice for the digital archiving of language resources, and (ii) developing a network of interoperating repositories and services for housing and accessing such resources."
LRC MHC

Mannheim Corpus: Cyril Belica: Kookkurrenzdatenbank CCDB. Eine korpuslinguist... - 0 views

  •  
    Mannheim Corpus: a very large, free corpus of German texts maintained by the Institut für Deutsche Sprache, offering a choice of corpora and many search facilities.
LRC MHC

Logos Library - Logos Translations multilingual library - 0 views

  •  
    Welcome to the Logos context search facility LOGOS LIBRARY. The LOGOS LIBRARY is a powerful interface with a massive database (currently 707.737.941 words) containing multilingual novels, technical literature and translated texts. Hits are highlighted in context windows that can be expanded up or down. To go to the source web pages (novels, etc.) click on the title - to run a dictionary search click on the highlighted word or phrase.
LRC MHC

WebAsCorpus.org - find Web Concordances - 0 views

  •  
    Search the Web directly for concordances of words and phrases in 34 different languages. This new release (last update: 24 May 2010) adds support for selecting which documents to include in the zipfile, preselection based on document metrics, combining all textfiles into a single document for importing into kfNgram or a concordancer, and conversion from UTF-8 into more widely-supported encodings.
LRC MHC

ABU - Recherche d'occurrences - 1 views

  •  
    ABU: la Bibliothèque Universelle: free access to the full text of French-language public-domain works on the Internet since 1993.
    Allows full-text search.
LRC MHC

WebCorp: The Web as Corpus - 0 views

  •  
    WebCorp uses the whole web as a corpus. Searches can be limited by domain, language, etc.
LRC MHC

[bnc] British National Corpus - 0 views

  •  
    The British National Corpus (BNC) is a 100 million word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of current British English, both spoken and written.
    Type a word or phrase in the search box and press the Return key on your keyboard to see up to 50 random hits from the corpus.
LRC MHC

Corpus del Español - 1 views

  •  
    Corpus del Español: a 100-million-word diachronic corpus of Spanish texts, created by Mark Davies of Brigham Young University.
LRC MHC

Moteur de recherche - Corpus Lexicaux Québécois - 0 views

  •  
    Corpus Lexicaux Québécois: Canadian French corpora with search facilities