This site hosts oral history vignettes, 30 seconds to 3 minutes long, with accompanying transcriptions. Currently there are 8. A few have simple exercises. We will be adding new anecdotes every week from the 500 oral histories I have collected over the past 15 years from Spanish America and Spain. Some are almost entirely in the present tense, some in the past, some mixed. There is a particularly cute one about the term of endearment "gordito." Suggestions and collaboration are welcome.
The Sketch Engine (SkE, also known as Word Sketch Engine) is a Corpus Query System incorporating word sketches, grammatical relations, and a distributional thesaurus. A word sketch is a one-page, automatic, corpus-derived summary of a word's grammatical and collocational behaviour.
This FREE program lets you create word lists and search natural language text files for words, phrases, and patterns. SCP is a concordance and word listing program that can read texts written in many languages. There are built-in alphabets for English, French, German, Polish, Greek, Russian, etc. SCP also contains an alphabet editor, which you can use to create alphabets for any other language.
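For readers unfamiliar with what a word listing and concordance program produces, here is a minimal Python sketch of the two basic operations: a frequency-sorted word list and a keyword-in-context (KWIC) display. It is not SCP's code or method. The function names, the sample text, and the simple `\w+` tokenization are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical sample text; the contents of any plain-text file would work the same way.
SAMPLE = "The cat sat on the mat. The dog sat on the log."

def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())

def word_list(tokens):
    """Return (word, frequency) pairs, most frequent first."""
    return Counter(tokens).most_common()

def concordance(tokens, keyword, width=3):
    """Return keyword-in-context lines with `width` tokens on either side of each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

if __name__ == "__main__":
    toks = tokenize(SAMPLE)
    print(word_list(toks)[:3])
    for line in concordance(toks, "sat", width=2):
        print(line)
```

A real concordancer adds language-specific alphabets and sort orders, which is what SCP's alphabet editor is for; the `\w+` pattern above is only a rough stand-in.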
Language, teaching & text software - TOOLS:
* PhraseContext: Text analysis tool, writing tool, collocation analysis, concordancing, text and XML output, and much more.
* Calculator: Calculate T-score, Z-score and Mutual Information. (Free)
* Simpel Grammatik - grammar teaching software (only in Danish) (Free)
* Convert: Extract text from PDF files (Free)
* Tokeniser - a small freeware utility. (Free)
* Some Object Pascal/Delphi string routines
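The T-score, Z-score and Mutual Information mentioned in the list above are collocation association measures. The sketch below uses one common simplified textbook formulation and is not necessarily what the Calculator tool implements; actual tools often add a correction for the collocation window size. The function and parameter names are hypothetical.

```python
import math

def collocation_scores(observed, freq_node, freq_collocate, corpus_size):
    """Compute simplified association scores for a node/collocate pair.

    observed       -- co-occurrence count of node and collocate (O)
    freq_node      -- corpus frequency of the node word
    freq_collocate -- corpus frequency of the collocate
    corpus_size    -- total number of tokens in the corpus (N)

    Assumes no window-size correction: E = freq_node * freq_collocate / N.
    """
    if min(observed, freq_node, freq_collocate, corpus_size) <= 0:
        raise ValueError("all counts must be positive")
    expected = freq_node * freq_collocate / corpus_size
    return {
        "expected": expected,
        "mi": math.log2(observed / expected),                     # Mutual Information
        "t_score": (observed - expected) / math.sqrt(observed),   # T-score
        "z_score": (observed - expected) / math.sqrt(expected),   # Z-score (simplified)
    }

if __name__ == "__main__":
    print(collocation_scores(observed=16, freq_node=200,
                             freq_collocate=200, corpus_size=10_000))
```

With these illustrative counts the expected co-occurrence is 4, so the pair co-occurs four times more often than chance would predict (MI = 2, T-score = 3, Z-score = 6 under the simplified formulas).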
"Welcome to the ItalNet publication of the Opera del Vocabolario Italiano (OVI) textual database. The production database contains 1849 vernacular texts (21.2 million words, 479,000 unique forms) the majority of which are dated prior to 1375, the year of Boccaccio's death. The beta-test installation of the database under PhiloLogic3 contains 1960 documents (see below). The verse and prose works include early masters of Italian literature like Dante, Petrarch, and Boccaccio, as well as lesser-known and obscure texts by poets, merchants, and medieval chroniclers. The OVI database was created to aid in the compilation of an historical dictionary of the Italian language, the Tesoro della lingua italiana delle origini (portions of which are now available online). The fully-searchable ItalNet implementation of the OVI database presented here has been produced in order to enable scholars around the world to benefit from this rich textual resource."
"OLAC, the Open Language Archives Community, is an international partnership of institutions and individuals who are creating a worldwide virtual library of language resources by: (i) developing consensus on best current practice for the digital archiving of language resources, and (ii) developing a network of interoperating repositories and services for housing and accessing such resources."
Mannheim Corpus: A very big - and free - corpus of German texts maintained by the Institut für Deutsche Sprache, including a choice of corpora and a lot of search facilities:
Welcome to the Logos context search facility LOGOS LIBRARY. The LOGOS LIBRARY is a powerful interface with a massive database (currently 707,737,941 words) containing multilingual novels, technical literature and translated texts. Hits are highlighted in context windows that can be expanded up or down. To go to the source web pages (novels, etc.), click on the title; to run a dictionary search, click on the highlighted word or phrase.
Search the Web directly for concordances of words and phrases in 34 different languages. This new release (last update: 24 May 2010) adds support for selecting which documents to include in the zipfile, preselection based on document metrics, combining all textfiles into a single document for importing into kfNgram or a concordancer, and conversion from UTF-8 into more widely-supported encodings.
ABU: la Bibliothèque Universelle: Free access to the full text of French-language public domain works on the Internet since 1993.
Allows full-text search.
The British National Corpus (BNC) is a 100 million word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of current British English, both spoken and written.
Type a word or phrase in the search box and press the Return key on your keyboard to see up to 50 random hits from the corpus.
Real Academia Española - Corpus de Referencia del Español Actual (CREA) - online concordance search, with options to filter by medium, country, and topic area.
Gutenberg Books - full-text search. Includes many languages. Languages with more than 50 books: Chinese, Dutch, English, Esperanto, Finnish, French, German, Greek, Italian, Latin, Portuguese, Spanish, Swedish, Tagalog.