Pandas Data Correlations – Complete Guide
Correlation measures the strength and direction of the relationship between two variables: how one tends to change when the other changes.
1. What Is Correlation? (Beginner)
Correlation value range:
+1 → Perfect positive correlation
0 → No linear correlation (a non-linear relationship may still exist)
−1 → Perfect negative correlation
Example:
Sales ↑, Profit ↑ → Positive correlation
Price ↑, Demand ↓ → Negative correlation
2. Setup (Required)
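A minimal setup sketch, assuming pandas and NumPy are installed (for example with pip). The version printout is only a sanity check.

```python
# Install once from a shell if needed (assumes pip is available):
#   pip install pandas numpy
import numpy as np
import pandas as pd

# Confirm that both libraries import correctly
print("pandas:", pd.__version__)
print("numpy:", np.__version__)
```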
3. Create Sample Data (Running Code)
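One possible sample dataset, sketched to mirror the examples above: Profit rises with Sales, and Demand falls as Price rises. The column names and all values are made up for illustration.

```python
import pandas as pd

# Illustrative (made-up) figures for six periods
df = pd.DataFrame({
    "Sales":  [200, 220, 250, 270, 300, 330],
    "Profit": [20, 25, 27, 32, 35, 41],
    "Price":  [12, 11, 11, 10, 9, 8],
    "Demand": [80, 90, 88, 100, 110, 125],
})

print(df)
```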
4. Correlation Between Two Columns
- Returns a single number
- Default method → Pearson
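A short sketch of the two-column case, using made-up Sales and Profit values. `Series.corr` returns one float, and `method` defaults to `"pearson"`.

```python
import pandas as pd

# Made-up sample values
df = pd.DataFrame({
    "Sales":  [200, 220, 250, 270, 300, 330],
    "Profit": [20, 25, 27, 32, 35, 41],
})

# Pearson correlation between the two columns (single number)
r = df["Sales"].corr(df["Profit"])
print(f"Sales vs Profit: r = {r:.3f}")
```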
5. Correlation Matrix (Most Common)
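- Shows pairwise correlation between all numeric columns (in pandas 2.0+, pass numeric_only=True or select the numeric columns first if the DataFrame also holds non-numeric columns)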
- Very common in data analysis & interviews
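A sketch of a correlation matrix on made-up data that includes one text column. Selecting the numeric columns first is one way to keep the call working the same across pandas versions; the `Region` column is a hypothetical addition for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["N", "S", "E", "W", "N", "S"],  # non-numeric, excluded below
    "Sales":  [200, 220, 250, 270, 300, 330],
    "Profit": [20, 25, 27, 32, 35, 41],
    "Price":  [12, 11, 11, 10, 9, 8],
})

# Keep only numeric columns, then compute all pairwise correlations
corr_matrix = df.select_dtypes(include="number").corr()
print(corr_matrix.round(2))
```

The result is a square, symmetric DataFrame with 1.0 on the diagonal.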
6. Correlation Methods
Pearson (default)
Linear relationship
Most commonly used
Spearman (rank-based)
Non-linear but monotonic relationships
Less sensitive to outliers than Pearson
Kendall (ordinal data)
Small datasets
Measures rank agreement (concordant vs. discordant pairs)
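To compare the three methods, here is a small sketch on a deliberately non-linear but monotonic relationship (y = x cubed). The data is synthetic; the point is that the rank-based methods see a perfect monotonic relationship while Pearson does not.

```python
import pandas as pd

# y grows as the cube of x: monotonic, but not linear
df = pd.DataFrame({"x": range(1, 11)})
df["y"] = df["x"] ** 3

# Compute the x-y correlation with each method
results = {
    method: df.corr(method=method).loc["x", "y"]
    for method in ("pearson", "spearman", "kendall")
}

for name, value in results.items():
    print(f"{name:>8}: {value:.3f}")
```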
7. Correlation With Missing Values
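- Pandas excludes NaN values pairwise: each correlation uses only the rows where both values are present
- Use min_periods to require a minimum number of valid pairs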
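A sketch of the NaN behaviour on made-up data: the automatic result matches dropping incomplete rows by hand, and `min_periods` returns NaN when too few complete pairs remain.

```python
import math

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
    "B": [2.0, 4.0, 6.0, np.nan, 10.0, 13.0],
})

# Rows with a NaN in either column are skipped automatically
r_auto = df["A"].corr(df["B"])

# Equivalent manual approach: drop incomplete rows first
clean = df.dropna()
r_manual = clean["A"].corr(clean["B"])
print(math.isclose(r_auto, r_manual))

# Only 4 complete pairs exist, so requiring 5 yields NaN
r_strict = df["A"].corr(df["B"], min_periods=5)
print(r_strict)
```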
8. Visualize Correlation (Heatmap-Style Using Pandas)
- Quick visual comparison
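- No separate plotting code needed (note: the pandas Styler relies on jinja2, and its background_gradient coloring needs matplotlib installed)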
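One way to get a heatmap-style table is the pandas Styler. This is a sketch that assumes a Jupyter-like environment with jinja2 and matplotlib installed; `corr_heatmap` is a hypothetical helper name, and the data is made up.

```python
import pandas as pd

df = pd.DataFrame({
    "Sales":  [200, 220, 250, 270, 300, 330],
    "Profit": [20, 25, 27, 32, 35, 41],
    "Price":  [12, 11, 11, 10, 9, 8],
})
corr_matrix = df.corr()


def corr_heatmap(matrix):
    """Return a Styler that shades each cell by its correlation value."""
    return (
        matrix.style
        .background_gradient(cmap="coolwarm", vmin=-1, vmax=1)  # fixed color scale
        .format("{:.2f}")                                       # two decimals
    )


# In a Jupyter notebook, end a cell with this call to render the colored table:
# corr_heatmap(corr_matrix)
print(corr_matrix.round(2))
```

Fixing `vmin` and `vmax` at -1 and 1 keeps the colors comparable between different matrices.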
9. Scatter Plot to Understand Correlation
- Always pair the correlation value with a scatter plot; a single number can hide outliers or non-linear patterns
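A sketch that puts the correlation value in the title of a scatter plot. It assumes matplotlib is installed for the plotting call; `scatter_with_r` is a hypothetical helper name, and the Price and Demand values are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "Price":  [12, 11, 11, 10, 9, 8],
    "Demand": [80, 90, 88, 100, 110, 125],
})

r = df["Price"].corr(df["Demand"])


def scatter_with_r(data, x, y):
    """Draw a scatter plot titled with the Pearson correlation (needs matplotlib)."""
    value = data[x].corr(data[y])
    return data.plot.scatter(x=x, y=y, title=f"{x} vs {y} (r = {value:.2f})")


# Uncomment to draw the chart:
# ax = scatter_with_r(df, "Price", "Demand")
print(f"Price vs Demand: r = {r:.2f}")
```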
10. Correlation for Selected Columns
- Focused analysis
- Cleaner output
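A sketch of two common ways to narrow the output, on made-up data: a matrix restricted to chosen columns, and a single column of correlations against one target.

```python
import pandas as pd

df = pd.DataFrame({
    "Sales":  [200, 220, 250, 270, 300, 330],
    "Profit": [20, 25, 27, 32, 35, 41],
    "Price":  [12, 11, 11, 10, 9, 8],
})

# Correlation matrix for just the columns of interest
subset_corr = df[["Sales", "Profit"]].corr()
print(subset_corr.round(2))

# Correlation of every other column with one target column, strongest positive first
vs_sales = df.corr()["Sales"].drop("Sales").sort_values(ascending=False)
print(vs_sales.round(2))
```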
11. Time Series Correlation
- Works the same with a time index; two Series are aligned on their shared index labels before the correlation is computed
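A sketch with two daily series whose dates only partly overlap. The names `visits` and `orders` and their values are made up; the point is that `corr` uses only the dates both series share.

```python
import pandas as pd

# Two daily series offset by one day (made-up values)
visits = pd.Series(
    [10, 12, 11, 14, 15, 17],
    index=pd.date_range("2024-01-01", periods=6, freq="D"),
)
orders = pd.Series(
    [20, 19, 23, 24, 27, 26],
    index=pd.date_range("2024-01-02", periods=6, freq="D"),
)

# corr() aligns on the shared dates (Jan 2 to Jan 6) before computing
r = visits.corr(orders)

# Same result when the overlap is selected by hand
overlap = visits.index.intersection(orders.index)
r_manual = visits.loc[overlap].corr(orders.loc[overlap])

print(f"shared dates: {len(overlap)}, r = {r:.3f}")
```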
12. Rolling Correlation (Advanced)
- Shows how correlation changes over time
- Used in finance & analytics
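A sketch of a rolling correlation over a 3-day window on made-up data. Column `A` rises and then falls while `B` keeps rising, so the rolling value starts positive and ends negative. The window size is an arbitrary choice for illustration.

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=8, freq="D")
df = pd.DataFrame(
    {
        "A": [1, 2, 3, 4, 5, 4, 3, 2],      # rises, then falls
        "B": [2, 4, 5, 8, 9, 10, 12, 13],   # rises throughout
    },
    index=idx,
)

# Correlation of A and B within each trailing 3-row window
rolling_r = df["A"].rolling(window=3).corr(df["B"])
print(rolling_r.round(2))
```

The first `window - 1` entries are NaN because a full window is not yet available.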
13. Correlation vs Causation (Very Important)
High correlation ≠ causation
Example:
Ice cream sales ↑
Drownings ↑
Both rise in summer because hot weather drives each of them; neither causes the other.
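The effect of a shared driver can be sketched with simulated data. Everything here is synthetic and the coefficients are arbitrary assumptions: temperature drives both series, which produces a high raw correlation that largely disappears once temperature is accounted for. `residuals` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed for repeatability
n = 500
temperature = rng.uniform(10, 35, n)

# Both series depend on temperature plus independent noise
df = pd.DataFrame({
    "temperature": temperature,
    "ice_cream": 5 * temperature + rng.normal(0, 10, n),
    "drownings": 0.3 * temperature + rng.normal(0, 1, n),
})

raw_r = df["ice_cream"].corr(df["drownings"])


def residuals(y, x):
    """Part of y left over after removing a straight-line fit on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)


# Correlation between what remains once temperature is removed from both
partial_r = residuals(df["ice_cream"], df["temperature"]).corr(
    residuals(df["drownings"], df["temperature"])
)

print(f"raw correlation:            {raw_r:.2f}")
print(f"after removing temperature: {partial_r:.2f}")
```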
14. Common Mistakes
- Using correlation on categorical data
- Ignoring outliers
- Assuming correlation implies cause
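To illustrate the outlier mistake, here is a sketch on made-up data where y equals x except for one extreme final value. Pearson drops well below 1, while Spearman, which only looks at ranks, is unaffected.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "y": [1, 2, 3, 4, 5, 6, 7, 8, 9, 100],  # last value is an outlier
})

pearson = df.corr(method="pearson").loc["x", "y"]
spearman = df.corr(method="spearman").loc["x", "y"]

print(f"pearson:  {pearson:.2f}")
print(f"spearman: {spearman:.2f}")
```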
15. Interview Questions (Must Know)
Q1: Default correlation method in Pandas?
Pearson
Q2: How to find correlation between two columns?
df["A"].corr(df["B"])
Q3: How to handle missing values?
Pandas excludes NaN automatically, using only rows where both values are present
Q4: Which correlation is best for non-linear data?
Spearman (when the relationship is monotonic)
