MHC Languages: group items tagged "corpora"

LRC MHC

ELISA - English Language Interview Corpus as a Second-Language Learning Application - 0 views

  •  
    The ELISA corpus is being developed at the University of Tuebingen (Dept of Applied English Linguistics, AEL) and the University of Surrey (Dept of Languages and Translation Studies, LTS) as a resource for language learning and teaching, and for interpreter training. It contains interviews with native speakers of English, who talk about their professional careers (e.g. in tourism, politics, the media or environmental education). We are very grateful to all speakers for their kind contributions. You can use our Concordancer (written in Perl) on text versions of all corpus files. It can extract KWIC concordances with variable context length, sentence concordances, and word counts.
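    For readers unfamiliar with the term, a KWIC (keyword-in-context) concordance lists every occurrence of a search word together with a fixed number of words on either side. The sketch below is only an illustration in Python, not the ELISA Concordancer itself (which is written in Perl and whose code is not shown here). The function names `tokenize`, `kwic`, and `word_counts`, the simple regex tokenizer, and the sample sentence are all assumptions made for this example. Sentence concordances are not covered.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplistic, assumed tokenizer)."""
    return re.findall(r"[A-Za-z']+", text.lower())


def kwic(tokens, keyword, context=3):
    """Return (left context, keyword, right context) for every hit.

    `context` is the number of tokens kept on each side, which is
    what "variable context length" means in this sketch.
    """
    keyword = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - context):i])
            right = " ".join(tokens[i + 1:i + 1 + context])
            hits.append((left, tok, right))
    return hits


def word_counts(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


# Hypothetical sample text, not taken from the ELISA corpus.
sample = "I work in tourism. My work began in the media, and the work is rewarding."
tokens = tokenize(sample)

for left, kw, right in kwic(tokens, "work", context=2):
    # Right-align the left context so the keyword lines up in one column.
    print(f"{left:>15} [{kw}] {right}")

print(word_counts(tokens).most_common(3))
```

    Raising or lowering `context` widens or narrows the window around each hit, which is the kind of adjustment a variable context length allows.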
LRC MHC

XXXIX SIMPOSIO DE LA SOCIEDAD ESPAÑOLA DE LINGÜÍSTICA - 0 views

  •  
    Annual conference of the Spanish Linguistics Society. The 2010 conference focused on corpus linguistics.
LRC MHC

CorpusLAB - 0 views

  •  
    "CorpusLAB is a new FREE site for language learners and language teachers. CorpusLAB is designed to promote language learning based on real English used in different settings. Students can use the site to take a variety of exercises created by teachers. Go to the Student pages and select a topic area (phrasal verbs, Academic English etc.). If you register, you will be able to keep track of your progress. Teachers can use the site in different ways. The central engine of the site is a series of exercise authoring tools. The exercises, which include fill-the-gap, multiple-choice, matching, reorder, and categorise, are designed in a way that promotes the learning of collocations and phrasal patterns. For example, the matching exercise allows up to five columns of items rather than the usual two. One of the aims of the site is to build up resources for specialised English: Medical English, English for Tourism, and so on."
LRC MHC

Kiyomi Chujo's Homepage - 0 views

  •  
    "I am a researcher and Associate Professor at the College of Industrial Technology, Nihon University, Japan. My current research interests are vocabulary selection, vocabulary learning, e-learning, and the pedagogical applications of corpus linguistics."
LRC MHC

Corpus of Contemporary American English (COCA) - 0 views

  •  
    400+ million words, 1990-2009, Mark Davies, Brigham Young University
LRC MHC

Linguistic Annotation Wiki - 1 views

  •  
    This wiki describes tools and formats for creating and managing linguistic annotations. 'Linguistic annotation' covers any descriptive or analytic notations applied to raw language data. The basic data may be in the form of time functions -- audio, video and/or physiological recordings -- or it may be textual. The added notations may include transcriptions of all sorts (from phonetic features to discourse structures), part-of-speech and sense tagging, syntactic analysis, "named entity" identification, co-reference annotation, and so on. The focus is on tools which have been widely used for constructing annotated linguistic databases, and on the formats commonly adopted by such tools and databases.
Daryl Beres

International Corpus of Learner English V2 - 1 views

  •  
    The International Corpus of Learner English (Version 2) is a corpus of writing by higher intermediate to advanced learners of English. It contains 3.7 million words of EFL writing from learners representing 16 different mother tongue backgrounds (Bulgarian, Chinese, Czech, Dutch, Finnish, French, German, Italian, Japanese, Norwegian, Polish, Russian, Spanish, Swedish, Turkish and Tswana). It differs from the first version, published in 2002, not only in its increased size and range of learner populations, but also in its interface, which offers two new functions: a built-in concordancer that lets users search for word forms, lemmas and/or part-of-speech tags, and a breakdown of the query results according to the learner profile information.
LRC MHC

Overview - Spanish FrameNet Project - 0 views

  •  
    The Spanish FrameNet Project is creating an online lexical resource for Spanish, based on frame semantics and supported by corpus evidence. The "starter lexicon" is available to the public, and contains more than 1,000 lexical items (verbs, predicative nouns, adjectives, adverbs, prepositions, and entities) representative of a wide range of semantic domains. The aim is to document the range of semantic and syntactic combinatory possibilities (valences) of each word in each of its senses, through (1) human-approved and automatically annotated example sentences and (2) automatic capture and organization of the annotation results. The Spanish FrameNet database will be in a platform-independent format and can be displayed and queried via the web and other interfaces.
LRC MHC

VOICE - Vienna-Oxford International Corpus of English - 0 views

  •  
    VOICE comprises naturally occurring, non-scripted face-to-face interactions in English as a lingua franca (ELF). The recordings made for VOICE are keyboarded by trained transcribers and stored as a computerized corpus. Currently VOICE comprises 1 million words of spoken ELF interactions, equalling approximately 120 hours of transcribed speech. The speakers recorded in VOICE are experienced ELF speakers from a wide range of first language backgrounds. So far, VOICE includes approximately 1250 ELF speakers with approximately 50 different first languages (disregarding varieties of the respective languages). In the initial phase, VOICE focuses mainly, though not exclusively, on European ELF speakers. The ELF interactions recorded cover a range of different speech events in terms of domain (professional, educational, leisure), function (exchanging information, enacting social relationships), and participant roles and relationships (acquainted vs. unacquainted, symmetrical vs. asymmetrical).
LRC MHC

Loyola Computer-Mediated Communication Corpus - 0 views

  •  
    This site provides access to a corpus of over 900 text samples gathered from test subjects at Loyola College, Baltimore, Maryland, in 2006 and 2007. Twenty-one subjects provide a completely correlated corpus in which each subject gave their opinion on each of six predetermined topics in each of six genres: blog, chat, discussion, email, essay, and interview. We hope this corpus will be useful to researchers in the fields of natural language processing and computational linguistics.
LRC MHC

EXMARaLDA - 1 views

  •  
    "EXMARaLDA stands for "Extensible Markup Language for Discourse Annotation". It is a system of concepts, data formats, and tools for the computer-assisted transcription and annotation of spoken language, and for building and analyzing corpora of spoken language. EXMARaLDA is being developed in the subproject "Computer-assisted methods for the collection and analysis of multilingual data" of the Collaborative Research Centre "Multilingualism" (SFB 538) at the University of Hamburg. All components of the EXMARaLDA system are also freely available to users outside the SFB."