Zipf's Law is
one of those empirical rules that characterize a
surprising range of real-world phenomena remarkably
well. It says that if we order some large collection by
size or popularity, the second element in the collection
will be about half the measure of the first one, the
third one will be about one-third the measure of the
first one, and so on. In general, the kth-ranked item will measure about 1/k of the first one.
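Stated as a formula (a restatement of the rule just described, with f(k) introduced here, purely for convenience, to denote the measure of the kth-ranked item), Zipf's Law says

\[
  f(k) \approx \frac{f(1)}{k}, \qquad k = 1, 2, 3, \ldots
\]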
To take one example, in a typical large body of
English-language text, the most popular word, "the,"
usually accounts for nearly 7 percent of all word
occurrences. The second-place word, "of," makes up 3.5
percent of such occurrences, and the third-place word,
"and," accounts for 2.8 percent. In other words, the
sequence of percentages (7.0, 3.5, 2.8, and so on)
corresponds closely with the 1/k sequence (1/1, 1/2,
1/3…). Although Zipf originally formulated his law to
apply just to this phenomenon of word frequencies,
scientists find that it describes a wide range of statistical distributions, such as individual
wealth and income, populations of cities, and even the
readership of blogs.
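As a small sketch of the arithmetic in the word-frequency example (the percentages are the approximate figures quoted above, and the variable names are ours), one can compare those figures with the 1/k values the law predicts from the top-ranked word alone:

```python
# Compare the quoted word frequencies with the 1/k prediction.
observed = {1: 7.0, 2: 3.5, 3: 2.8}   # "the", "of", "and", in percent (approximate)

for rank, actual in observed.items():
    predicted = observed[1] / rank     # Zipf's Law: about 1/k of the top item
    print(f"rank {rank}: observed {actual:.1f}%, predicted {predicted:.2f}%")
```

The predictions are simply 7.0/k, or about 7.0, 3.5, and 2.3 percent, close to the quoted figures.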