Classroom Companion: Business


Date: 19.09.2023
Introduction to Digital Economics

Box 16.1 Zipf’s Law
Zipf’s law is based on the observation that the most frequent word in English (“the”) is twice as frequent as the second-most frequent word (“of”), three times as frequent as the third-most frequent word (“and”), and so on. Zipf’s law holds quite well for at least the first 1000 words in the English language (Schroeder, 2009). The frequency of words is then derived from the harmonic series:
$$1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \ldots, \frac{1}{N}$$
or, more precisely, the frequency f(k; N) of the k-th most frequent word is:
$$f(k; N) = \frac{1/k}{\sum_{j=1}^{N} 1/j} \approx \frac{1/k}{\ln N + \gamma},$$

in which N is the number of words in the English language and γ ≈ 0.57722… is the Euler-Mascheroni constant. We have used the fact that:
$$\sum_{k=1}^{N} \frac{1}{k} = \ln N + \gamma + O\!\left(\frac{1}{N}\right).$$

The notation O(1/N) (the “big O” notation) indicates that this term decreases at least as fast as 1/N as N increases. For large N, the last term can therefore be ignored. The statistical distribution with frequency f(k; N) is also called the Zipfian distribution. Note that the distribution depends on the cutoff N.
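As a numerical check of the formulas in this box, the following minimal Python sketch computes the Zipfian frequency both exactly (dividing 1/k by the harmonic sum) and with the logarithmic approximation, and then looks at the remainder left after subtracting ln N and γ from the harmonic sum. The function names (harmonic, zipf_exact, zipf_approx) and the sample cutoffs are illustrative choices, not taken from the text.

```python
import math

# Euler-Mascheroni constant, hard-coded (the math module does not provide it).
EULER_GAMMA = 0.5772156649015329


def harmonic(n: int) -> float:
    """Partial sum of the harmonic series: 1 + 1/2 + ... + 1/n."""
    return math.fsum(1.0 / j for j in range(1, n + 1))


def zipf_exact(k: int, n: int) -> float:
    """Frequency of the k-th most frequent item: (1/k) divided by the harmonic sum."""
    return (1.0 / k) / harmonic(n)


def zipf_approx(k: int, n: int) -> float:
    """Large-N approximation: (1/k) / (ln n + gamma)."""
    return (1.0 / k) / (math.log(n) + EULER_GAMMA)


if __name__ == "__main__":
    # Cutoff of 1000 items, an assumed value echoing the "first 1000 words".
    n = 1000
    for k in (1, 2, 3):
        print(f"k={k}: exact={zipf_exact(k, n):.6f}  approx={zipf_approx(k, n):.6f}")

    # The remainder of the harmonic sum should shrink roughly like 1/N.
    for n in (10, 100, 1000, 10000):
        remainder = harmonic(n) - math.log(n) - EULER_GAMMA
        print(f"N={n}: remainder={remainder:.3e}  N*remainder={n * remainder:.4f}")
```

By construction, the exact frequencies sum to 1 over k = 1, …, N, and the frequency at rank 1 is twice that at rank 2. The product N × remainder stays bounded (it settles near 1/2), which is what the O(1/N) term expresses.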
Zipf’s law describes, in addition to word usage, the rank distribution of a surprisingly large number of natural and sociological phenomena: size of cities, size of countries (except China and India), length of rivers, size of sand grains, wealth among people, and, as we have just seen, popularity of books.
