Zipf Distribution
📊 Zipf Distribution in Python
The Zipf distribution is a discrete probability distribution that models ranked data, where the frequency of an item is inversely proportional to a power of its rank.
It appears in linguistics, social networks, and city populations. It is a discrete power law: a small share of items accounts for most of the occurrences, a pattern often summarized as the 80/20 rule.
✅ 1. Characteristics of Zipf Distribution
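- Discrete distribution over the positive integers (1, 2, 3, …)
- Parameter: `a > 1`, the exponent that characterizes the distribution
- Probability Mass Function (PMF) for ranks $k = 1, 2, \dots, N$:

  $$P(X = k) = \frac{1/k^a}{\sum_{n=1}^{N} 1/n^a}$$

  With unbounded support ($N \to \infty$), which is the form `np.random.zipf()` samples from, the normalizing sum converges to the Riemann zeta function $\zeta(a)$. This is why `a > 1` is required.
- Heavy-tailed distribution: a few items have very high frequency, while many items have low frequency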
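The PMF above can be evaluated directly. The following is a minimal sketch, assuming a finite number of ranks `N`; `zipf_pmf` and `n_ranks` are hypothetical names used for illustration, not part of NumPy.

```python
import numpy as np

def zipf_pmf(a, n_ranks):
    """Return P(X = k) for k = 1..n_ranks under a Zipf law truncated at n_ranks."""
    ranks = np.arange(1, n_ranks + 1)
    weights = 1.0 / ranks**a          # unnormalized weights 1 / k^a
    return weights / weights.sum()    # divide by the normalizing sum

pmf = zipf_pmf(a=2.0, n_ranks=10)
print(pmf.round(3))
```

With `a = 2`, rank 1 is four times as likely as rank 2, since the ratio of their probabilities is `2**a`.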
Applications:
- Word frequencies in text (the most common words appear much more often)
- City populations
- Website traffic distribution
✅ 2. Generate Zipf Data Using NumPy
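A minimal sketch of drawing samples with `np.random.zipf()`. The seed and the values `a=2.0` and `size=1000` are assumptions chosen for illustration.

```python
import numpy as np

np.random.seed(42)  # arbitrary seed, only so that reruns give the same samples

# Draw 1000 integers from a Zipf distribution with exponent a = 2
samples = np.random.zipf(a=2.0, size=1000)

print(samples[:20])                                   # first few sampled ranks
print("min:", samples.min(), "max:", samples.max())   # smallest and largest value drawn
```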
Output (example):
- Mostly small integers (ranks 1, 2, 3)
- A few large values → heavy tail
✅ 3. Visualize Zipf Distribution
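One possible way to plot the samples is a bar chart of counts per rank using Matplotlib. This is a sketch under assumptions: the cutoff `max_rank = 20` (which hides the rare very large values) and the output filename `zipf_hist.png` are illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)  # arbitrary seed for reproducibility
samples = np.random.zipf(a=2.0, size=1000)

max_rank = 20  # assumption: show only ranks 1..20 so the bars stay readable
# Count how often each rank 1..max_rank occurs (index 0 is dropped, ranks start at 1)
counts = np.bincount(samples[samples <= max_rank], minlength=max_rank + 1)[1:]

plt.bar(np.arange(1, max_rank + 1), counts)
plt.xlabel("Rank (sampled value)")
plt.ylabel("Count")
plt.title("Zipf samples, a = 2")
plt.savefig("zipf_hist.png")  # or plt.show() in an interactive session
```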
- Most counts occur at low ranks
- Long tail toward higher ranks
✅ 4. Compare Different Parameters
- Smaller `a` → heavier tail (more high-rank values)
- Larger `a` → samples concentrated at low ranks
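The effect of `a` can also be checked numerically. The sketch below assumes three example exponents and a threshold of rank 10 for "the tail"; both are illustrative choices, and `results` is a hypothetical variable name.

```python
import numpy as np

np.random.seed(1)  # arbitrary seed for reproducibility
size = 10_000

results = {}
for a in (1.5, 2.0, 3.0):
    samples = np.random.zipf(a=a, size=size)
    share_rank1 = np.mean(samples == 1)   # fraction of samples equal to rank 1
    share_tail = np.mean(samples > 10)    # fraction of samples beyond rank 10
    results[a] = (share_rank1, share_tail)
    print(f"a={a}: rank 1 = {share_rank1:.1%}, ranks > 10 = {share_tail:.1%}")
```

A smaller exponent should give a lower share at rank 1 and a larger share beyond rank 10.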
✅ 5. Summary Table
| Function / parameter | Arguments / meaning | Description |
|---|---|---|
| `np.random.zipf()` | `a`, `size` | Generates Zipf random integers |
| `a` | Exponent parameter | Controls heaviness of tail (>1) |
| `size` | Number of samples | Output array size |
🎯 Practice Exercises
- Generate 1000 Zipf random numbers with `a=2` and plot a histogram.
- Compare Zipf distributions with `a=1.5` vs `a=3`.
- Analyze the proportion of the top 10 ranks vs. the total data.
