Zipf Distribution

📊 Zipf Distribution in Python

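The Zipf distribution is a discrete probability distribution that models ranked data: the frequency of an item is inversely proportional to a power of its rank (in the classic form of Zipf's law, roughly to the rank itself).
It appears in linguistics (word frequencies), social networks, and city populations. It is a power law, the same kind of heavy-tailed behavior often summarized by the Pareto 80/20 rule.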


1. Characteristics of Zipf Distribution

  • Discrete distribution (integers: 1, 2, 3, …)

  • Parameter: a > 1 → exponent that controls how quickly frequency falls with rank

  • Probability Mass Function (PMF):

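P(X=k) = \frac{1/k^{a}}{\sum_{n=1}^{\infty} 1/n^{a}} = \frac{k^{-a}}{\zeta(a)}, \quad k = 1, 2, 3, \dots

where ζ(a) is the Riemann zeta function. This unbounded form is what np.random.zipf samples from; a finite variant restricts k to 1, …, N and stops the sum at n = N.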

  • Heavy-tailed distribution: a few items have very high frequency, many items have low frequency
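
To make the PMF concrete, the sketch below evaluates it for a = 2 and compares it with frequencies drawn from NumPy's sampler. The denominator is a normalizing constant; for an unbounded rank range it is the infinite sum of 1/n^a. The helpers zeta_approx and zipf_pmf are hypothetical names introduced here: zeta_approx approximates that sum with a truncated series plus an integral estimate of the remaining tail, chosen only to avoid a SciPy dependency. The seed and sample size are arbitrary.

```python
import numpy as np

def zeta_approx(a, n_terms=100_000):
    """Hypothetical helper: approximate sum_{n>=1} 1/n^a for a > 1.

    Sums the first n_terms terms exactly, then estimates the rest with the
    integral of x^(-a) from n_terms + 0.5 to infinity.
    """
    if a <= 1:
        raise ValueError("a must be > 1 for the series to converge")
    n = np.arange(1, n_terms + 1, dtype=float)
    partial = np.sum(n ** -a)
    tail = (n_terms + 0.5) ** (1 - a) / (a - 1)
    return partial + tail

def zipf_pmf(k, a):
    """Hypothetical helper: P(X = k) = k^(-a) / normalizer, for integer k >= 1."""
    k = np.asarray(k, dtype=float)
    return k ** -a / zeta_approx(a)

a = 2.0
ks = np.arange(1, 6)
print("theoretical:", np.round(zipf_pmf(ks, a), 4))

# Empirical frequencies of the same ranks from NumPy's Zipf sampler
rng = np.random.default_rng(seed=0)
samples = rng.zipf(a=a, size=100_000)
empirical = np.array([np.mean(samples == k) for k in ks])
print("empirical:  ", np.round(empirical, 4))
```

For a = 2 the normalizer equals π²/6 ≈ 1.645, so P(X = 1) = 6/π² ≈ 0.608: about 61% of samples are 1s. If SciPy is available, scipy.special.zeta(a) or scipy.stats.zipf(a).pmf(k) can replace the hand-written helpers.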

Applications:

  • Word frequencies in text (most common words appear much more often)

  • City populations

  • Website traffic distribution


2. Generate Zipf Data Using NumPy
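
A minimal sketch that draws 10 Zipf samples with a = 2. It uses NumPy's Generator API with an arbitrary seed; the legacy call np.random.zipf(a=2.0, size=10) draws from the same distribution. Exact values depend on the seed, so they will not match the example output below.

```python
import numpy as np

# Seeded Generator so the run is reproducible (the seed value is arbitrary)
rng = np.random.default_rng(seed=42)

# Draw 10 Zipf-distributed integers (all >= 1) with exponent a = 2
data = rng.zipf(a=2.0, size=10)
print(data)

# Legacy global-state equivalent:
# data = np.random.zipf(a=2.0, size=10)
```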


 

Output (example):

[1 2 1 3 1 1 2 1 4 1]

  • Mostly small integers (rank 1, 2, 3)

  • Few large values → heavy tail


3. Visualize Zipf Distribution
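
A minimal sketch, assuming matplotlib is installed. Because the tail is very long, a histogram over the full range of values is mostly empty, so this version counts how often each rank from 1 to 20 occurs and draws a bar chart; the cutoff of 20, the sample size, and the seed are illustrative choices. Counting with comparisons (rather than np.bincount on the raw data) avoids allocating an array as large as the biggest sample.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0)
a = 2.0
data = rng.zipf(a=a, size=1000)

# Only ranks 1..max_rank are plotted; larger values form the long tail
max_rank = 20
ranks = np.arange(1, max_rank + 1)
counts = np.array([np.count_nonzero(data == k) for k in ranks])

plt.bar(ranks, counts, color="steelblue", edgecolor="black")
plt.xticks(ranks)
plt.xlabel("Rank k (sample value)")
plt.ylabel("Count")
plt.title(f"Zipf samples, a={a}, n={data.size} (ranks 1-{max_rank})")
plt.show()

print(f"{np.mean(data > max_rank):.1%} of samples lie beyond rank {max_rank}")
```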


 

  • Most counts occur at low ranks

  • Long tail toward higher ranks


4. Compare Different Parameters
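
The sketch below, again assuming matplotlib, draws samples for several exponents, prints the share of samples equal to 1 and the share above rank 10, and plots the empirical P(X = k) on a log scale. The exponents 1.5, 2, and 3, the sample size, the rank cutoff, and the seed are illustrative choices, not values from a specific source.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=1)
n = 10_000        # samples per exponent
max_rank = 10     # ranks shown in the plot
ranks = np.arange(1, max_rank + 1)
results = {}      # a -> (share of samples equal to 1, share above max_rank)

for a in (1.5, 2.0, 3.0):
    data = rng.zipf(a=a, size=n)
    # Empirical probability of each rank 1..max_rank
    freqs = np.array([np.mean(data == k) for k in ranks])
    tail_share = np.mean(data > max_rank)
    results[a] = (freqs[0], tail_share)
    print(f"a={a}: P(X=1) ~ {freqs[0]:.3f}, P(X>{max_rank}) ~ {tail_share:.3f}")
    plt.plot(ranks, freqs, marker="o", label=f"a={a}")

plt.yscale("log")  # power-law decay is easier to compare on a log axis
plt.xlabel("Rank k")
plt.ylabel("Empirical P(X = k)")
plt.title("Zipf samples for different exponents")
plt.legend()
plt.show()
```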


 

  • Smaller a → heavier tail (more high-rank values)

  • Larger a → concentrated at low ranks


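5. Summary Table

Name             | Role                | Description
-----------------|---------------------|--------------------------------------------------
np.random.zipf() | Function (a, size)  | Generates Zipf-distributed random integers (≥ 1)
a                | Exponent parameter  | Controls heaviness of the tail; must be > 1
size             | Number of samples   | Output array size (an int or a shape tuple)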

🎯 Practice Exercises

  1. Generate 1000 Zipf random numbers with a=2 and plot a histogram.

  2. Compare Zipf distributions with a=1.5 vs a=3.

  3. Compute the proportion of samples that fall in the top 10 ranks (values 1 to 10) relative to the total number of samples.
