Pandas DataFrames

A pandas DataFrame is a two-dimensional, labeled data structure that stores data in rows and columns, much like a database table or an Excel spreadsheet. Each column can hold its own data type. It is the central object in pandas, and most analysis workflows are built around it.


Creating a DataFrame

1. From a Dictionary

import pandas as pd

data = {
    "Name": ["Rahul", "Priya", "Aman"],
    "Age": [22, 24, 23],
    "City": ["Delhi", "Mumbai", "Pune"],
}

df = pd.DataFrame(data)
print(df)


2. From a List of Lists

data = [
    ["Rahul", 22, "Delhi"],
    ["Priya", 24, "Mumbai"],
    ["Aman", 23, "Pune"],
]
df = pd.DataFrame(data, columns=["Name", "Age", "City"])


3. From a CSV File

df = pd.read_csv("data.csv")
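
Because data.csv is only a placeholder filename, the following is a minimal sketch that reads the same kind of data from an in-memory buffer instead of a file on disk. The CSV text and the use of io.StringIO are assumptions made for illustration; the rows reuse the values from the earlier examples.

```python
import io
import pandas as pd

# In-memory stand-in for a CSV file (assumption: the first line is a header row)
csv_text = "Name,Age,City\nRahul,22,Delhi\nPriya,24,Mumbai\nAman,23,Pune\n"

# read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)    # (3, 3)
print(df.dtypes)   # column types inferred from the text
```

With a real file, passing its path as in the line above should behave the same way, provided the file has a header row.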

Viewing Data

df.head() # First 5 rows
df.tail() # Last 5 rows
df.shape # (rows, columns)
df.columns # Column names
df.info() # Data types & memory usage
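
As a small runnable sketch of these inspection calls, the example below rebuilds the three-row DataFrame from the dictionary example and captures a few results in variables. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Rahul", "Priya", "Aman"],
    "Age": [22, 24, 23],
    "City": ["Delhi", "Mumbai", "Pune"],
})

first_two = df.head(2)           # head/tail accept a row count; the default is 5
n_rows, n_cols = df.shape        # shape is an attribute, not a method
column_names = list(df.columns)

print(first_two)
print(n_rows, n_cols)            # 3 3
print(column_names)              # ['Name', 'Age', 'City']
```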

Selecting Data

Select Columns

df["Name"]
df[["Name", "City"]]

Select Rows

df.loc[0] # by label
df.iloc[0] # by position
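
With the default index (0, 1, 2, ...), loc and iloc return the same row, which hides the difference between them. The sketch below uses a hypothetical index of 10, 20, 30 to make the distinction visible.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Rahul", "Priya", "Aman"], "Age": [22, 24, 23]},
    index=[10, 20, 30],   # hypothetical non-default row labels
)

by_label = df.loc[10]       # row whose index label is 10
by_position = df.iloc[0]    # first row, whatever its label is

print(by_label["Name"])       # Rahul
print(by_position["Name"])    # Rahul
print(df.iloc[2]["Name"])     # Aman
# df.loc[0] would raise a KeyError here, because no row is labeled 0
```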

Filter Rows

df[df["Age"] > 22]
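
The filter above works in two steps: the comparison produces a boolean Series, and indexing with that Series keeps only the rows marked True. Here is a short sketch of that mechanism, plus one way to combine two conditions. The second condition on City is an illustrative assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Rahul", "Priya", "Aman"],
    "Age": [22, 24, 23],
    "City": ["Delhi", "Mumbai", "Pune"],
})

mask = df["Age"] > 22     # boolean Series, one value per row
older = df[mask]          # keeps rows where the mask is True

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
older_in_pune = df[(df["Age"] > 22) & (df["City"] == "Pune")]

print(mask.tolist())                     # [False, True, True]
print(older["Name"].tolist())            # ['Priya', 'Aman']
print(older_in_pune["Name"].tolist())    # ['Aman']
```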

Adding & Removing Data

Add a New Column

df["Salary"] = [30000, 35000, 32000]

Drop a Column

df.drop("Salary", axis=1, inplace=True) # axis=1 targets columns; inplace=True modifies df directly

Updating Data

df.loc[0, "Age"] = 23
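
The same loc syntax can also update every row that matches a condition. The sketch below shows both forms on the sample data; the new age of 25 for Pune is a made-up value for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Rahul", "Priya", "Aman"],
    "Age": [22, 24, 23],
    "City": ["Delhi", "Mumbai", "Pune"],
})

df.loc[0, "Age"] = 23                       # single cell: row label 0, column "Age"
df.loc[df["City"] == "Pune", "Age"] = 25    # every row where City is Pune

print(df["Age"].tolist())   # [23, 24, 25]
```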

Handling Missing Values

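df.isnull().sum() # Missing values per column
df.fillna(0) # Returns a copy with missing values replaced by 0
df.dropna() # Returns a copy without rows that contain missing values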
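
Note that fillna and dropna return new DataFrames by default and leave the original untouched. A minimal sketch, assuming one hypothetical missing age in the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Rahul", "Priya", "Aman"],
    "Age": [22, float("nan"), 23],   # hypothetical missing value
})

missing_per_column = df.isnull().sum()
filled = df.fillna(0)     # new DataFrame; df itself is unchanged
dropped = df.dropna()     # new DataFrame without the incomplete row

print(missing_per_column["Age"])    # 1
print(len(filled), len(dropped))    # 3 2
print(df["Age"].isnull().sum())     # 1, the original still has the gap
```

To keep the result, assign it back, for example df = df.fillna(0).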

Sorting Data

df.sort_values("Age") # Ascending (default); returns a new DataFrame
df.sort_values("Age", ascending=False) # Descending
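
Sorting returns a new DataFrame whose rows keep their original index labels. The sketch below shows that behavior and one common way to renumber the rows afterwards with reset_index; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Rahul", "Priya", "Aman"], "Age": [22, 24, 23]})

oldest_first = df.sort_values("Age", ascending=False)
renumbered = oldest_first.reset_index(drop=True)   # discard the old labels

print(oldest_first.index.tolist())    # [1, 2, 0]
print(renumbered["Name"].tolist())    # ['Priya', 'Aman', 'Rahul']
print(df["Name"].tolist())            # ['Rahul', 'Priya', 'Aman'], original order kept
```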

Common DataFrame Methods

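df.describe() # Summary statistics for numeric columns
df.count() # Non-null values per column
df.nunique() # Distinct values per column
df.value_counts() # Frequency of each unique row; use df["City"].value_counts() for a single column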
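
In practice, value_counts is most often called on a single column. The following sketch adds a hypothetical fourth row ("Neha") so that one city repeats and the counts are not all 1.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Rahul", "Priya", "Aman", "Neha"],      # "Neha" is a hypothetical extra row
    "City": ["Delhi", "Mumbai", "Pune", "Delhi"],
    "Age": [22, 24, 23, 22],
})

city_counts = df["City"].value_counts()   # rows per city, most frequent first
unique_per_column = df.nunique()          # distinct values in each column
age_summary = df["Age"].describe()        # count, mean, std, min, quartiles, max

print(city_counts["Delhi"])         # 2
print(unique_per_column["City"])    # 3
print(age_summary["mean"])          # 22.75
```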

Real-World Example

students = pd.DataFrame({
    "Name": ["Asha", "Rohit", "Neel"],
    "Marks": [85, 92, 78],
})
topper = students[students["Marks"] > 90]
print(topper)


Key Differences: Series vs DataFrame

Series          | DataFrame
1-D             | 2-D
Single column   | Multiple columns
Index + values  | Rows + columns
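
The difference shows up directly when selecting columns: single brackets return a Series, while double brackets return a DataFrame. A minimal sketch on the sample data:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Rahul", "Priya", "Aman"], "Age": [22, 24, 23]})

one_column = df["Age"]     # single brackets: a 1-D Series
sub_frame = df[["Age"]]    # double brackets: a 2-D DataFrame with one column

print(type(one_column).__name__, one_column.ndim, one_column.shape)   # Series 1 (3,)
print(type(sub_frame).__name__, sub_frame.ndim, sub_frame.shape)      # DataFrame 2 (3, 1)
```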

Conclusion

Pandas DataFrames are the backbone of data analysis in Python. Once you master DataFrames, tasks like data cleaning, transformation, and analysis become much easier.
