MHC Languages: group items tagged "corpus"

LRC MHC

Corpora for Language Learning and Teaching, Tools & Websites - 1 views

  •  
    This page offers information about some common corpus tools and links to resources on the web:
    * Online search in corpora
    * Online full-text search in books
    * Text and media archives
    * Online text/corpus analysis tools
    * Offline text/corpus analysis tools (concordancers)
    * Further resources
    * Corpus linguistics websites
LRC MHC

Loyola Computer-Mediated Communication Corpus - 0 views

  •  
    This site provides access to a corpus of over 900 text samples gathered from test subjects at Loyola College, Baltimore, Maryland, in 2006 and 2007. Twenty-one subjects provide a completely correlated corpus in which each subject provided their opinion in each of six predetermined topics in each of six genres: blog, chat, discussion, email, essay, and interview. We hope this corpus will be useful to researchers in the fields of natural language processing and computational linguistics.
LRC MHC

Michigan Corpus Linguistics Home >> ELI Corpora & UM ACL - 0 views

  •  
    The Michigan Corpus Linguistics team consists of researchers and students at the University of Michigan (U-M) English Language Institute (ELI). We create corpora of spoken and written academic English, provide corpus-based materials for EAP (English for Academic Purposes) teaching, and carry out research in different areas of corpus linguistics. On our website you will find information about the corpora we make available, the projects we work on, and the training we provide to ELI visiting scholars and University of Michigan students.
LRC MHC

Sketch Engine - 0 views

  •  
    "The Sketch Engine (SkE, also known as Word Sketch Engine) is a Corpus Query System incorporating word sketches, grammatical relations, and a distributional thesaurus. A word sketch is a one-page, automatic, corpus-derived summary of a word's grammatical and collocational behaviour. A Sketch Engine account gives you
    * Pre-loaded corpora (60M-2B words) for
      o Chinese, English, French, German, Italian, Japanese, Portuguese, Spanish, Slovene
      o Other languages to follow
    * WebBootCaT
      o Build your own instant corpus
      o Extract keywords
      o Specialist terminology, any language
    * CorpusBuilder
      o Upload and install your own corpora
    Web service using standard browsers. No software installation required."
LRC MHC

[OTA] The Chambers-Rostand Corpus of Journalistic French [Electronic resource] - 0 views

  •  
    The Chambers-Rostand Corpus of Journalistic French [Electronic resource] (Le Corpus Chambers-Rostand du français journalistique). This resource is freely available for download.
LRC MHC

Corpus del Español - 1 views

  •  
    Corpus del Español: A 100-million-word diachronic corpus of Spanish texts, created by Mark Davies of Brigham Young University.
LRC MHC

[bnc] British National Corpus - 0 views

  •  
    The British National Corpus (BNC) is a 100 million word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of current British English, both spoken and written. Type a word or phrase in the search box and press the Return key on your keyboard to see up to 50 random hits from the corpus.
LRC MHC

Mannheim Corpus: Cyril Belica: Kookkurrenzdatenbank CCDB. Eine korpuslinguist... - 0 views

  •  
    Mannheim Corpus: A very large, free corpus of German texts maintained by the Institut für Deutsche Sprache, offering a choice of corpora and many search facilities.
LRC MHC

ELISA - English Language Interview Corpus as a Second-Language Learning Application - 0 views

  •  
    The ELISA corpus is being developed at the University of Tuebingen (Dept of Applied English Linguistics, AEL) and the University of Surrey (Dept of Languages and Translation Studies, LTS) as a resource for language learning and teaching, and interpreter training. It contains interviews with native speakers of English, who talk about their professional careers (e.g. in tourism, politics, the media or environmental education). We are very grateful to all speakers for their kind contributions. You can use our Concordancer (written in Perl) on text versions of all corpus files. It can be used to extract KWIC concordances with variable context length, sentence concordances and word counts.
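
    The three outputs named above (KWIC lines with variable context, sentence concordances, word counts) can be illustrated with a minimal Python sketch. This is not the ELISA Concordancer, which is written in Perl and whose code is not shown here; the function names `kwic`, `sentence_concordance` and `word_counts`, the `width` parameter, and the simple regex tokenisation are all assumptions made for illustration.

```python
import re
from collections import Counter


def kwic(text, keyword, width=30):
    """Keyword-in-context lines: `width` characters on each side of every
    whole-word, case-insensitive match of `keyword` (hypothetical helper)."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Pad both contexts so the keyword lines up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines


def sentence_concordance(text, keyword):
    """Whole sentences containing `keyword`; sentences are split naively
    after ., ! or ? followed by whitespace (an assumption, not a real
    sentence segmenter)."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if pattern.search(s)]


def word_counts(text):
    """Frequency of each lower-cased word form (letters and apostrophes)."""
    return Counter(re.findall(r"[A-Za-z']+", text.lower()))


if __name__ == "__main__":
    sample = "I worked in tourism for ten years. Tourism changed my life."
    for line in kwic(sample, "tourism", width=15):
        print(line)
    print(sentence_concordance(sample, "tourism"))
    print(word_counts(sample).most_common(3))
```

    Changing `width` corresponds to the "variable context length" mentioned in the description; a real concordancer would typically measure context in words rather than characters and use a proper tokeniser.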
Daryl Beres

International Corpus of Learner English V2 - 1 views

  •  
    The International Corpus of Learner English (Version 2) is a corpus of writing by higher intermediate to advanced learners of English. It contains 3.7 million words of EFL writing from learners representing 16 different mother tongue backgrounds (Bulgarian, Chinese, Czech, Dutch, Finnish, French, German, Italian, Japanese, Norwegian, Polish, Russian, Spanish, Swedish, Turkish and Tswana). It differs from the first version, published in 2002, not only in its increased size and range of learner populations, but also in its interface, which offers two new functions: a built-in concordancer that lets users search for word forms, lemmas and/or part-of-speech tags, and a breakdown of the query results according to the learner profile information.
Daryl Beres

The PolyU Language Bank - 0 views

  •  
    The PolyU Language Bank, developed in the Department of English at Hong Kong PolyU, is a large archive of language corpora made up of a wide range of written and spoken texts totalling over 12 million words. Corpus searches can be performed using the Bank's built-in Web-based concordancer, enabling the easy use of corpus resources for language teaching and research.
LRC MHC

Web Concordancer - 1 views

  •  
    You can search for concordances and concgrams with these programs, find whole sentences with the Sentence Concordancer, browse the Indexed Corpus Files, or upload and search your own Personal Corpus. Includes English, Chinese, French and Japanese corpora.
Daryl Beres

Krieger - Corpus Linguistics: What It Is and How It Can Be Applied to Teaching (TESL/TEFL) - 0 views

  •  
    "This article will address those questions by examining what corpus linguistics is, how it can be applied to teaching English, and some of the issues involved. Resources are also included which will assist anyone who is interested in pursuing this line of study further."
LRC MHC

Using BNC XML for English language study - 0 views

  •  
    This page contains some examples of how the BNC (XML edition) can be used in combination with non-corpus based activities and exercises to study the English language. The exercises are intended as illustrations of what can be done with access to the corpus. Refer to the sample search results provided or perform your own searches.
LRC MHC

Sketch Engine - 0 views

  •  
    The Sketch Engine (SkE, also known as Word Sketch Engine) is a Corpus Query System incorporating word sketches, grammatical relations, and a distributional thesaurus. A word sketch is a one-page, automatic, corpus-derived summary of a word's grammatical and collocational behaviour.
Daryl Beres

[oucs] All About Xaira - 1 views

  •  
    "Xaira is a text searching software originally developed at OUCS for use with the British National Corpus. This new version has been entirely re-written as a general purpose XML search engine, which will operate on any corpus of well-formed XML documents. It is however best used with TEI-conformant documents. Xaira has full Unicode support. This means you can use it to search and display text in any language, provided you have a suitable Unicode font installed on your system. At the heart of Xaira is the Xaira Object Model. This defines a range of objects and methods for representing and searching large amounts of linguistic data. The Xaira Server program implements this model. The Xaira Indexer program creates platform-independent indexes from collections of XML documents for use by the Server. Both these Xaira components can be deployed on any platform. Client programs can access a Xaira server using a close-coupled API such as that used by the Windows client (which is written in C++), or via XMLRPC or SOAP. We provide a fully-featured client for Windows, and a PHP code library which makes it easy to develop applications for the web which can talk to a Xaira server. All versions of Xaira are now distributed free of charge under the GNU General Public Licence."
LRC MHC

Moteur de recherche - Corpus Lexicaux Québécois - 0 views

  •  
    Corpus Lexicaux Québécois: Canadian French corpora with search facilities
LRC MHC

Real Academia Española - CREA - 0 views

  •  
    Real Academia Española - Corpus de Referencia del Español Actual (CREA) - online concordance search, includes options by media and country, and topic area.
LRC MHC

WebCorp: The Web as Corpus - 0 views

  •  
    WebCorp uses the whole web as a corpus. Searches can be limited by domain, language, etc.
LRC MHC

VOICE - Vienna-Oxford International Corpus of English - 0 views

  •  
    VOICE comprises naturally occurring, non-scripted face-to-face interactions in English as a lingua franca (ELF). The recordings made for VOICE are keyboarded by trained transcribers and stored as a computerized corpus. Currently VOICE comprises 1 million words of spoken ELF interactions, equalling approximately 120 hours of transcribed speech. The speakers recorded in VOICE are experienced ELF speakers from a wide range of first language backgrounds. So far, VOICE includes approximately 1250 ELF speakers with approximately 50 different first languages (disregarding varieties of the respective languages). In the initial phase, VOICE focuses mainly, though not exclusively, on European ELF speakers. The ELF interactions recorded cover a range of different speech events in terms of domain (professional, educational, leisure), function (exchanging information, enacting social relationships), and participant roles and relationships (acquainted vs. unacquainted, symmetrical vs. asymmetrical).