Skip to main content

Home/ interesting_sites/ Group items tagged language

Rss Feed Group items tagged

pagetribe .

George Orwell, "Politics and the English Language," 1946 - 0 views

  • 1. What am I trying to say? 2. What words will express it? 3. What image or idiom will make it clearer? 4. Is this image fresh enough to have an effect? And he will probably ask himself two more: 1. Could I put it more shortly? 2. Have I said anything that is avoidably ugly?
  • What is above all needed is to let the meaning choose the word, and not the other way around.
  • When you think of a concrete object, you think wordlessly, and then, if you want to describe the thing you have been visualizing you probably hunt about until you find the exact words that seem to fit it. When you think of something abstract you are more inclined to use words from the start, and unless you make a conscious effort to prevent it, the existing dialect will come rushing in and do the job for you, at the expense of blurring or even changing your meaning
  • ...1 more annotation...
  • (i) Never use a metaphor, simile, or other figure of speech which you are used to seeing in print. (ii) Never us a long word where a short one will do. (iii) If it is possible to cut a word out, always cut it out. (iv) Never use the passive where you can use the active. (v) Never use a foreign phrase, a scientific word, or a jargon word if you can think of an everyday English equivalent. (vi) Break any of these rules sooner than say anything outright barbarous.
pagetribe .

http://nltk.googlecode.com/svn/trunk/doc/book/ch01.html - 0 views

  • We can count how often a word occurs in a tex
  • Adding two lists creates a new list
  • count the occurrences of a particular word using text1.count('heaven')
  • ...18 more annotations...
  • By convention, m:n means elements m…n-1
  • A consequence of this last change is that the list only has four elements, and accessing a later value generates an error
  • We can join the words of a list to make a single string, or split a string into a list, as follows:
  • 'Monty Python'.split()
  • frequency distribution
  • frequency of each vocabulary item
  • find the 50 most frequent words
  • hese very long words are often hapaxes (i.e. unique) and perhaps it would be better to find frequently occurring long words.
  • Here are all words from the chat corpus that are longer than 7 characters, that occur more than 7 times:   >>> fdist5 = FreqDist(text5) >>> sorted([w for w in set(text5) if len(w) > 7 and fdist5[w] > 7]) ['#14-19teens', '#talkcity_adults', '((((((((((', '........', 'Question', 'actually', 'anything', 'computer', 'cute.-ass', 'everyone', 'football', 'innocent', 'listening', 'remember', 'seriously', 'something', 'together', 'tomorrow', 'watching'] >>>
  • The collocations() function does this for us
  • find bigrams that occur more often than we would expect based on the frequency of individual words.
  • fdist = FreqDist(samples) create a frequency distribution containing the given samples fdist.inc(sample) increment the count for this sample fdist['monstrous'] count of the number of times a given sample occurred fdist.freq('monstrous') frequency of a given sample fdist.N() total number of samples fdist.keys() the samples sorted in order of decreasing frequency for sample in fdist: iterate over the samples, in order of decreasing frequency fdist.max() sample with the greatest count fdist.tabulate() tabulate the frequency distribution fdist.plot() graphical plot of the frequency distribution fdist.plot(cumulative=True) cumulative plot of the frequency distribution fdist1 < fdist2 test if samples in fdist1 occur less frequently than in fdist2
  • it goes through each word in text1, assigning each one in turn to the variable w and performing the specified operation on the variable.
  • The above notation is called a "list comprehension"
  • [f(w) for ...] or [w.f() for ...],
  • Now that we are not double-counting words like This and this
  • by filtering out any non-alphabetic items:   >>> len(set([word.lower() for word in text1 if word.isalpha()]))
  • A collocation is a sequence of words which occur together unusually often. Thus red wine is a collocation, while the wine is not. A characteristic of collocations is that they are resistant to substitution with words that have similar senses — maroon wine sounds definitely odd.
1 - 3 of 3
Showing 20 items per page