
Words R Us: Group items tagged "corpus"

Ryan Catalani

Corpus of Contemporary American English (COCA)

shared by Ryan Catalani on 01 Aug 11
Lisa Stewart liked it
  •  
    "The Corpus of Contemporary American English (COCA) is the largest freely-available corpus of English, and the only large and balanced corpus of American English. It was created at Brigham Young University in 2008, and it is now used by tens of thousands of users every month (linguists, teachers, translators, and other researchers). ... The corpus contains more than 425 million words of text and is equally divided among spoken, fiction, popular magazines, newspapers, and academic texts."
kylieilonummi20

Corpus analysis of the language of Covid-19

  •  
    Check out this article to learn how our language, including the Top 20 keywords in the Oxford Corpus, has changed since the beginning of the pandemic. While some of these words are not uncommon, two new ones stand out: "social distance/social distancing" and "self-isolate/self-isolation." Seeing which words are now used more frequently shows the impact of the coronavirus.
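The idea of spotting words that "are now used more frequently" can be sketched as a comparison of per-million-word frequencies between two samples of text. This is a minimal illustration, not the Oxford Corpus method: the two sample sentences are invented, and treating hyphenated words such as "self-isolate" as single tokens is an assumption of this sketch.

```python
import re
from collections import Counter


def per_million(text: str) -> dict[str, float]:
    """Frequency of each word per million tokens; hyphenated words count as one token."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    if not tokens:
        return {}
    counts = Counter(tokens)
    return {word: count * 1_000_000 / len(tokens) for word, count in counts.items()}


def rising_words(before: str, after: str, top: int = 20) -> list[tuple[str, float]]:
    """Rank words by how much their per-million frequency grew from `before` to `after`."""
    old, new = per_million(before), per_million(after)
    growth = {word: freq - old.get(word, 0.0) for word, freq in new.items()}
    return sorted(growth.items(), key=lambda item: item[1], reverse=True)[:top]


# Toy "before" and "during the pandemic" samples, invented for illustration.
before = "we met friends at the cafe and went to the cinema"
during = "we practised social distancing and had to self-isolate at home"
print(rising_words(before, during, top=6))
```

Because this sketch counts single words, a two-word phrase like "social distancing" shows up as two separate rising words; a real keyword study would also count multi-word phrases.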
Lisa Stewart

Google Ngram Viewer

  •  
    Graphically compare the popularity of phrases over time.
  •  
    This would be great for someone's field research--thanks, Ryan!
  •  
    Google's own book corpus tool
Ryan Catalani

CORPORA: 45-400 million words each: free online access

  •  
    Includes corpora of historical and contemporary American English, British English, Spanish, and Portuguese.
Steve Wagenseller

Independent thinking -- corpus callosotomy video

  •  
    In the 1960s, Michael Gazzaniga, with Roger Sperry and Joseph Bogen, pioneered split-brain research. This video shows how one patient's language centers for comprehension and speech now operate separately because his corpus callosum was cut.
  •  
    Wow! I think the students will find this fascinating.
blaygo19

Scotch Snaps in Hip Hop - YouTube

  •  
    Talks about how rhythmic characteristics of language and accents are reflected in the rhythms of songs.
  •  
    In a study published in 2011 (https://mp.ucpress.edu/content/29/1/51.full.pdf+html), Nicholas Temperley and David Temperley, two musicologists, carried out a musical corpus analysis showing that the Scotch snap, a sixteenth note on the beat followed by a dotted eighth note, is common in both Scottish and English songs but virtually nonexistent in German and Italian songs, and they explored possible linguistic correlates for this phenomenon. British English shows a much higher proportion of very short stressed syllables (less than 100 ms) than German or Italian. Four vowels account for a large proportion of very short stressed syllables in British English, and they also make up a large proportion of the Scotch snap tokens in the authors' English musical corpus. A Scotch snap, as Adam Neely notes in the video above, is the musical, rhythmic counterpart to a trochee in poetry. Say the phrase "Teenage Mutant Ninja Turtles" to hear a series of Scotch snaps.
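The definition above (a sixteenth note on the beat followed by a dotted eighth note) is precise enough to search for mechanically, which is roughly what a musical corpus count needs. A minimal sketch, assuming a melody is given as a list of note durations in quarter-note beats that starts on a beat; this is an illustration of the definition, not the Temperleys' actual procedure.

```python
from fractions import Fraction

# Durations are measured in beats, assuming a quarter-note beat:
# a sixteenth note lasts 1/4 beat and a dotted eighth note lasts 3/4 beat.
SIXTEENTH = Fraction(1, 4)
DOTTED_EIGHTH = Fraction(3, 4)


def find_scotch_snaps(durations) -> list[int]:
    """Return the index of every note that starts a Scotch snap.

    A Scotch snap is taken to be a sixteenth note that starts exactly on a beat,
    immediately followed by a dotted eighth note. The melody is assumed to start
    on a beat.
    """
    durs = [Fraction(d) for d in durations]  # exact arithmetic, so onsets never drift
    snaps = []
    onset = Fraction(0)
    for i, dur in enumerate(durs):
        on_beat = onset.denominator == 1  # a whole number of beats from the start
        if on_beat and dur == SIXTEENTH and i + 1 < len(durs) and durs[i + 1] == DOTTED_EIGHTH:
            snaps.append(i)
        onset += dur
    return snaps


# "TEEN-age MU-tant NIN-ja TUR-tles", each word sung short-long.
turtles = [0.25, 0.75] * 4
print(find_scotch_snaps(turtles))  # [0, 2, 4, 6]
```

Reversing each pair (long-short, the ordinary dotted rhythm) produces no matches, which is what separates the Scotch snap from more common rhythms.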
Ryan Catalani

Research Gives Insight into Brain Function of Adults Who Stutter

  •  
    "[New research] is suggesting that atypical brain function is a fundamental aspect of speech production tasks for adults who stutter. ... "Now because many speech areas are interconnected across the two hemispheres through the corpus callosum, it might suggest that hemispheric dominance for speech and language has not been established to the same degree as it has been for normally fluent adults." ... Loucks then used functional magnetic resonance imaging of brain activity to study participants who stutter and found that even brief, simple speech tasks - such as producing a single word to name a picture - is associated with altered functional activity."
Lara Cowell

Mining Books to Map Emotions Through a Century

  •  
    A group of anthropologists from England used a computer program to analyze the emotional content of books from every year of the 20th century, close to a billion words in millions of books. The researchers found that the 1920s marked the peak in joy-related words, while overall usage of commonly known emotion words declined over the 20th century. The one exception was "fear", which started to increase just before the 1980s.
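The core counting step behind a study like this can be sketched as: for each year, what share of all words belong to a list of emotion words? A minimal illustration, assuming a hypothetical eight-word lexicon and invented sample texts; the researchers' real word lists, corpus, and statistics are not reproduced here. Dividing by each year's total word count keeps years with more books from looking more "emotional" simply because they contain more text.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon for illustration only; real emotion word lists are far larger.
EMOTION_WORDS = {
    "joy": {"happy", "joy", "delight", "cheerful"},
    "fear": {"fear", "afraid", "terror", "dread"},
}


def emotion_scores(texts_by_year: dict[int, str]) -> dict[int, dict[str, float]]:
    """For each year, return the share of all tokens that belong to each emotion's word list."""
    scores = {}
    for year, text in texts_by_year.items():
        tokens = re.findall(r"[a-z]+", text.lower())
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for an empty year
        scores[year] = {
            emotion: sum(counts[word] for word in words) / total
            for emotion, words in EMOTION_WORDS.items()
        }
    return scores


sample = {1925: "a happy crowd sang with joy", 1985: "fear and dread filled the town"}
print(emotion_scores(sample))
```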
anonymous

Corpus Linguistics - NYTimes.com

  •  
    Linguists can generally be divided into two groups: prescriptivists, who hold that language is governed by fixed rules of grammar, and descriptivists, who believe that grammar should describe patterns of actual usage. In extremely broad strokes, if prescriptivists are anal-retentive, then descriptivists are free-to-be-you-and-me.
Lisa Stewart

Exposing Literary Style, One Word at a Time - NYTimes.com

  •  
    Includes links to other literary corpora.
Lisa Stewart

Google N-gram Viewer - Culturomics

  • The Google Labs N-gram Viewer is the first tool of its kind, capable of precisely and rapidly quantifying cultural trends based on massive quantities of data. It is a gateway to culturomics! The browser is designed to enable you to examine the frequency of words (banana) or phrases ('United States of America') in books over time. You'll be searching through over 5.2 million books: ~4% of all books ever published! 
  • Basically, if you’re going to use this corpus for scientific purposes, you’ll need to do careful controls to make sure it can support your application. As with any other piece of evidence about the human past, the challenge with culturomic trajectories lies in their interpretation. In the culturomics paper and its supplementary online materials, the authors give many examples of controls and of methods for interpreting trajectories.
  •  
    More detail from Harvard on how to use the N-gram Viewer.
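The Ngram Viewer's curves show how often a word or phrase appears in each year's books relative to all the text from that year, rather than raw counts. A minimal sketch of that normalization on a tiny invented corpus, assuming whitespace tokenization and that n-grams never cross book boundaries; Google's actual pipeline over millions of scanned books is not reproduced here.

```python
def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """All overlapping sequences of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequency(books_by_year: dict[int, list[str]], phrase: str) -> dict[int, float]:
    """Share of all n-grams in each year's books that match `phrase` (n = words in phrase)."""
    target = tuple(phrase.lower().split())
    n = len(target)
    result = {}
    for year, books in books_by_year.items():
        hits = total = 0
        for text in books:
            grams = ngrams(text.lower().split(), n)  # n-grams never span two books
            total += len(grams)
            hits += grams.count(target)
        result[year] = hits / total if total else 0.0
    return result


books = {
    1900: ["united states of america", "the banana is ripe"],
    2000: ["banana bread", "a banana"],
}
print(relative_frequency(books, "banana"))  # {1900: 0.125, 2000: 0.5}
print(relative_frequency(books, "united states"))
```

Normalizing per year is what makes the curves comparable across decades, since far more books were published in 2000 than in 1900.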
Ryan Catalani

The Mechanic Muse - The Jargon of the Novel, Computed - NYTimes.com

  •  
    "Now in the 21st century, with sophisticated text-crunching tools at our disposal, it is possible to put Bridgman's theory to the test. Has a vernacular style become the standard for the typical fiction writer? Or is literary language still a distinct and peculiar beast?"
Ryan Catalani

Futurity.org - Left-right brain 'talk' despite broken link

  •  
    "Even when daydreaming, there is a tremendous amount of communication happening between different areas in the brain... The fact that these areas are synchronized has led many scientists to presume that they are all part of an interconnected network called a resting-state network. ... that these resting-state networks look essentially normal in people missing the corpus callosum link... [it] highlights the brain's remarkable plasticity... the findings are significant when considering the link between brain connections and autism or schizophrenia."
Lara Cowell

Analyzing The Language Of Suicide Notes To Help Save Lives: NPR

  •  
    A team of researchers at Cincinnati Children's Hospital uses computers to analyze the language of suicide notes, in the hope of better identifying those at risk. By comparing patients' interview responses with suicide notes, they can identify how similar or divergent a patient's language is from the language of suicide. Here are three patterns the researchers have identified in their corpus of authentic suicide notes:
    1. Loss of hope: hope is gone and hopelessness emerges, which is the case in most of the notes.
    2. Practical instructions, e.g. "Remember to change the tires. Remember to change the oil. I drew a check, but I didn't put the money in. Please go ahead and make the deposit."
    3. The presence of the following emotions: depression, a little bit of anger, abandonment, and the sense of "I just can't go on any longer. I can't deal with this any longer."
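The NPR piece does not say how the team measures whether two texts' language is "similar or divergent." One common, simple choice, assumed here purely for illustration, is the cosine similarity of word-count vectors: 1.0 when two texts use words in the same proportions, 0.0 when they share no words. The function names and sample sentences below are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; apostrophes are kept so "can't" stays one word."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between two word-count vectors (0.0 if either text is empty)."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


print(cosine_similarity("remember to change the oil", "please remember the deposit"))
```

A real clinical tool would use far richer features (emotion categories, phrasing patterns like the three listed above) and validated training data; this only shows the basic idea of scoring overlap between two texts.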
Lara Cowell

The Chinese Language as a Weapon: How China's Netizens Fight Censorship

  •  
    Censorship has been a long-standing issue in China, but its citizens continue to fight for self-expression through clever linguistic circumvention of Internet restrictions. Much of Chinese Internet lingo involves codewords, and the corpus of codewords is constantly changing to accommodate new topics and avoid smarter, stricter censors. It has reached the point where a simple understanding of Chinese vocabulary, syntax, and grammar is no longer enough to fully understand Chinese Internet discourse. On today's Chinese Internet, fully comprehending the language requires a thorough knowledge of current events, a deep respect for historical implications, an agile mastery of cultural conventions, and more often than not, a healthy appreciation of topical humor.
Lara Cowell

Sex-Based Differences in Compliment Behavior

  •  
    Sex-based differences in the form of English compliments and in the frequencies of various compliment response types are discussed. Based on a corpus of 1,062 compliment events, several differences in the form of compliments used by women and men are noted. Further, it is found that compliments from men are generally accepted, especially by female recipients, whereas compliments from women are met with a response type other than acceptance.
Lara Cowell

Why your usual Wordle strategy isn't working today, according to a linguistic...

  •  
    TechRadar spoke to Dr Matthew Voice, an Assistant Professor in Applied Linguistics at the UK's University of Warwick, to find out the science behind the struggle to deduce Wordle Puzzle #256. "[In your live blog] you've already talked about _ATCH as a kind of trap. This is an example of an n-gram, i.e. a group of letters of a length (n) that commonly cluster together. So this is an n-gram with a length of four letters: a quadrigram," Professor Voice tells us. "Using [this] Project Gutenberg data, it's interesting to note that _ATCH isn't listed as one of the most common quadrigrams in English overall, but the [same] data considers words of all lengths, rather than just the five letters Wordle is limited to. I don't know of any corpus exclusively composed of common 5 letter words, but it might be the case that _ATCH happens to be particularly productive for that length." "The other thing to mention," Professor Voice adds, "would be that the quadrigram _ATCH is made up of smaller n-grams, like the bigram AT, which is extremely common in English. So we're seeing a lot of common building blocks in one word, which means that sorting individual letters might not be narrowing down people's guesses as much as it would with other words."
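Professor Voice's point about n-grams can be made concrete by counting every run of n letters across a word list. A minimal sketch, assuming a small hand-picked list of five-letter words (not the Project Gutenberg data he mentions): it shows how a quadrigram like ATCH, and the bigram AT inside it, pile up in exactly the kind of word family that makes this puzzle hard.

```python
from collections import Counter


def letter_ngrams(word: str, n: int) -> list[str]:
    """All overlapping runs of n letters in `word`, uppercased."""
    word = word.upper()
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def count_ngrams(words: list[str], n: int) -> Counter:
    """Count every letter n-gram across a word list."""
    counts = Counter()
    for word in words:
        counts.update(letter_ngrams(word, n))
    return counts


# A small, hand-picked list of five-letter words (not Project Gutenberg data).
words = ["batch", "catch", "hatch", "latch", "match", "patch", "watch", "crane", "slate"]
quadrigrams = count_ngrams(words, 4)
bigrams = count_ngrams(words, 2)
print(quadrigrams.most_common(1))  # [('ATCH', 7)]
print(bigrams["AT"])  # 8
```

Seven candidate words share the same four letters and differ only in the first, so a guess that confirms _ATCH still leaves many possibilities, which matches the "trap" described in the article.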