Zipf Distribution
📊 Zipf Distribution in Python
The Zipf distribution is a discrete probability distribution that models ranked data, where the frequency of an item is inversely proportional to a power of its rank, 1/k^a.
It is a power-law distribution that appears in word frequencies (linguistics), social networks, and city populations, and it is closely related to the Pareto principle (the "80/20 rule").
1. Characteristics of Zipf Distribution
Discrete distribution (integers: 1, 2, 3, …)
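Parameter:
a > 1 → exponent characterizing the distribution
Probability Mass Function (PMF), for a Zipf distribution truncated at N ranks:
$$P(X=k) = \frac{1/k^a}{\sum_{n=1}^{N} 1/n^a}, \quad k = 1, 2, \dots, N$$
With no upper limit (N → ∞), the denominator becomes the Riemann zeta function ζ(a) = Σ 1/n^a, which is finite only for a > 1. This unbounded form is the one np.random.zipf() samples from, so any positive integer can be drawn.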
Heavy-tailed distribution: a few top-ranked items occur very frequently, while a long tail of many items each occurs rarely
Applications:
Word frequencies in text (a handful of very common words account for a large share of all words)
City populations
Website traffic distribution
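The truncated PMF can be evaluated directly to see how quickly probability falls off with rank. A minimal sketch, assuming NumPy; zipf_pmf is a hypothetical helper written for illustration, and a = 2, N = 10 are arbitrary example values. The last lines check that, for a = 2, the normalizing sum approaches the known value ζ(2) = π²/6 as N grows:

```python
import numpy as np

def zipf_pmf(k, a, N):
    """Hypothetical helper: truncated Zipf PMF
    P(X=k) = (1/k**a) / sum_{n=1}^{N} 1/n**a, for k = 1..N."""
    k = np.asarray(k, dtype=float)
    normalizer = np.sum(1.0 / np.arange(1, N + 1, dtype=float) ** a)
    return (1.0 / k**a) / normalizer

a, N = 2.0, 10
ranks = np.arange(1, N + 1)
pmf = zipf_pmf(ranks, a, N)

for k, p in zip(ranks, pmf):
    print(f"P(X={k}) = {p:.4f}")

# The probabilities over k = 1..N sum to 1 (up to floating-point rounding)
print("Sum of PMF:", pmf.sum())

# With a very large N, the normalizer for a = 2 approaches zeta(2) = pi**2 / 6
big_sum = np.sum(1.0 / np.arange(1, 1_000_001, dtype=float) ** a)
print("Partial sum, N = 10**6:", big_sum)
print("pi**2 / 6:            ", np.pi**2 / 6)
```

Because P(X=1) / P(X=2) = 2^a, doubling the rank divides the probability by 4 when a = 2.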
2. Generate Zipf Data Using NumPy
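A minimal sketch of sampling with NumPy, assuming a = 2, 1,000 samples, and an arbitrary seed of 42 for reproducibility. It also shows the equivalent call on NumPy's newer Generator API (np.random.default_rng), which NumPy recommends for new code:

```python
import numpy as np

# Legacy API: draw 1,000 samples from a Zipf distribution with exponent a = 2.
# a must be greater than 1; the seed makes the output reproducible.
np.random.seed(42)
data = np.random.zipf(a=2, size=1000)

print(data[:20])           # first 20 samples
print("min:", data.min())  # always >= 1 (ranks start at 1)
print("max:", data.max())  # occasionally very large (heavy tail)

# Generator API: same distribution, recommended for new code
rng = np.random.default_rng(42)
data_rng = rng.zipf(a=2, size=1000)
print(data_rng[:20])
```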
Output (example):
Mostly small integers (ranks 1, 2, 3)
A few very large values → heavy tail
3. Visualize Zipf Distribution
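A minimal plotting sketch, assuming matplotlib is installed. It uses a = 2, 1,000 samples, and a fixed seed; the cut-off at 50 is an arbitrary choice so that a handful of extreme values do not stretch the x-axis. The right panel uses log-log axes, where a power law shows up roughly as a straight line:

```python
import numpy as np
import matplotlib.pyplot as plt

# a = 2, 1,000 samples, fixed seed for reproducibility
np.random.seed(42)
data = np.random.zipf(a=2, size=1000)

# Leave values above 50 out of the plot so the few extreme outliers
# do not stretch the x-axis (they remain in `data`)
shown = data[data <= 50]

# Count how many times each rank (integer value) occurs
ranks, counts = np.unique(shown, return_counts=True)

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(10, 4))

# Linear axes: most counts sit at rank 1, then drop off quickly
ax_lin.bar(ranks, counts)
ax_lin.set_xlabel("Rank (sample value)")
ax_lin.set_ylabel("Count")
ax_lin.set_title("Linear scale")

# Log-log axes: a power law appears roughly as a straight line
ax_log.loglog(ranks, counts, "o")
ax_log.set_xlabel("Rank (log scale)")
ax_log.set_ylabel("Count (log scale)")
ax_log.set_title("Log-log scale")

fig.suptitle("Zipf distribution, a = 2, 1,000 samples")
plt.tight_layout()
plt.show()
```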
Most counts occur at low ranks
Long tail toward higher ranks
4. Compare Different Parameters
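A minimal sketch comparing several exponents, assuming NumPy's Generator API, a fixed seed, and 10,000 samples per setting (all arbitrary choices). For each a it reports the fraction of samples at rank 1, the fraction within the top 10 ranks, and the largest value drawn:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility
n = 10_000                      # samples per exponent

results = {}
for a in (1.5, 2.0, 3.0):
    samples = rng.zipf(a, size=n)
    results[a] = {
        "rank1": np.mean(samples == 1),   # share of samples at rank 1
        "top10": np.mean(samples <= 10),  # share within the top 10 ranks
        "max": samples.max(),             # largest value drawn (tail reach)
    }
    r = results[a]
    print(f"a = {a}: rank-1 share = {r['rank1']:.3f}, "
          f"top-10 share = {r['top10']:.3f}, max = {r['max']}")
```

Expect the rank-1 and top-10 shares to grow with a, while the largest value drawn tends to shrink.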
Smaller a → heavier tail (more high-rank values)
Larger a → values concentrated at low ranks
5. Summary Table
| Name | Type / Role | Description |
|---|---|---|
| np.random.zipf() | Function (parameters: a, size) | Generates Zipf-distributed random integers (≥ 1) |
| a | Exponent parameter (float) | Controls heaviness of the tail; must be > 1 |
| size | Number of samples (int or tuple of ints) | Shape of the output array |
🎯 Practice Exercises
Generate 1000 Zipf random numbers with a = 2 and plot a histogram.
Compare Zipf distributions with a = 1.5 vs a = 3.
Analyze what proportion of the data falls in the top 10 ranks relative to the total.
