Pandas Cleaning Data of Wrong Format

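Wrong-format data is data whose values are stored in an incorrect data type or in an inconsistent representation: numbers stored as strings, invalid dates, or several formats mixed in one column. Cleaning these values is essential because Pandas treats a numeric column stored as text as plain strings, so sorting, arithmetic, and aggregation give wrong results or fail.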


1. Identify Wrong Data Formats

df.dtypes # Check data types
df.info() # Detailed structure
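
The dtype listing tells you that a column holds text, but not which values are the problem. Here is a minimal sketch for listing the entries that would not convert, assuming a hypothetical "Age" column that arrived as strings (the sample data is made up for illustration):

```python
import pandas as pd

# Hypothetical sample data: "Age" arrived as text
df = pd.DataFrame({"Age": ["22", "twenty", "25", None]})

print(df.dtypes)  # Age shows as object (or a string dtype in newer pandas)

# Try a numeric conversion without modifying the column itself
converted = pd.to_numeric(df["Age"], errors="coerce")

# Values that were present but could not be parsed as numbers
bad_values = df.loc[converted.isna() & df["Age"].notna(), "Age"]
print(bad_values.tolist())
```

Checking the offending values first helps you decide whether to fix them by hand, coerce them to NaN, or drop the rows.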

2. Converting Data Types

Convert String to Integer / Float

df["Age"] = df["Age"].astype(int)
df["Salary"] = df["Salary"].astype(float)

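(Use only when data is clean. astype() raises a ValueError if any value cannot be converted, for example a non-numeric string, and astype(int) also fails on missing values.)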


Safe Conversion with Error Handling

df["Age"] = pd.to_numeric(df["Age"], errors="coerce")

(Invalid values become NaN.)
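
One side effect worth knowing: NaN is a floating-point value, so a coerced column ends up as float64 even if every valid entry is a whole number. If you want integers, one option is the nullable "Int64" dtype. A small sketch with made-up sample values:

```python
import pandas as pd

ages = pd.Series(["22", "twenty", "25"])  # hypothetical sample values

# Unparseable entries become NaN, which forces a float64 result
as_float = pd.to_numeric(ages, errors="coerce")
print(as_float.dtype)

# The nullable "Int64" dtype keeps whole numbers and marks gaps as <NA>
as_int = as_float.astype("Int64")
print(as_int)
```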


3. Cleaning Date & Time Formats

Convert to Datetime

df["Date"] = pd.to_datetime(df["Date"])

Handle Invalid Dates

df["Date"] = pd.to_datetime(df["Date"], errors="coerce")
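
When you know the expected layout of the dates, passing an explicit format avoids guessing, and comparing the result with the raw column shows exactly which entries failed. A minimal sketch, assuming ISO-style year-month-day strings (the sample values are hypothetical):

```python
import pandas as pd

raw = pd.Series(["2024-01-10", "invalid", "2024-02-15"])  # hypothetical sample

# An explicit format avoids ambiguous day/month guessing
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

# Entries that were present in the raw data but failed to parse
failed = raw[parsed.isna() & raw.notna()]
print(failed.tolist())
```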

4. Cleaning Text vs Numeric Data

Example: "₹10,000" → 10000

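df["Salary"] = (
    df["Salary"]
    .str.replace("₹", "", regex=False)  # remove the currency symbol
    .str.replace(",", "", regex=False)  # remove thousands separators
    .astype(int)                        # raises if any value is still non-numeric
)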
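
The chain above stops with an error as soon as one value is not a clean amount. A more forgiving variant, sketched here with made-up sample values, strips the unwanted characters in one pass and lets pd.to_numeric turn anything left over into NaN:

```python
import pandas as pd

salary = pd.Series(["₹10,000", " ₹25,500 ", "N/A"])  # hypothetical sample

cleaned = (
    salary
    .str.replace(r"[₹,\s]", "", regex=True)  # drop the symbol, commas, spaces
    .pipe(pd.to_numeric, errors="coerce")     # "N/A" becomes NaN, no exception
)
print(cleaned.tolist())
```

Because of the NaN, the result is a float column; whether that is acceptable depends on how you plan to handle the missing salaries.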

5. Standardizing Text Formats

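df["City"] = df["City"].str.lower()   # option 1: all lowercase
df["City"] = df["City"].str.title()   # option 2: Title Case (pick one convention)
df["Name"] = df["Name"].str.strip()   # remove leading/trailing whitespace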
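
These string methods can be chained so that each value passes through the same steps in a fixed order. A short sketch on hypothetical city names, which also collapses repeated spaces inside a value:

```python
import pandas as pd

city = pd.Series(["  new   delhi", "MUMBAI ", "pune"])  # hypothetical sample

normalized = (
    city
    .str.strip()                            # trim leading/trailing whitespace
    .str.replace(r"\s+", " ", regex=True)   # collapse repeated inner spaces
    .str.title()                            # consistent Title Case
)
print(normalized.tolist())
```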

6. Fixing Mixed Data Types in Columns

df["Marks"] = pd.to_numeric(df["Marks"], errors="coerce")
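# Assign the result back; calling fillna(..., inplace=True) on a selected column is deprecated in recent pandas
df["Marks"] = df["Marks"].fillna(df["Marks"].mean())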
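
Before converting, it can help to see how mixed the column really is. One way, shown as a sketch on a made-up "Marks" column, is to count the Python type of each value and then count how many entries coercion would lose:

```python
import pandas as pd

marks = pd.Series([85, "90", "absent", 72.5])  # hypothetical mixed column

# Count how many values of each Python type the column holds
type_counts = marks.map(lambda v: type(v).__name__).value_counts()
print(type_counts)

# Number of entries that cannot be converted to a number
numeric = pd.to_numeric(marks, errors="coerce")
print(numeric.isna().sum())
```

If many values would be lost, filling them with the mean may distort the data, so it is worth checking this count first.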

7. Cleaning Boolean Data

# map() converts "Yes"/"No" to booleans; any other value becomes NaN
df["Active"] = df["Active"].map({"Yes": True, "No": False})
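
Real columns often contain variants such as "yes", " YES " or unexpected words. A possible approach, sketched below with hypothetical values and a hypothetical `mapping` dictionary, is to normalize the text first and store the result in the nullable "boolean" dtype so that unknown values stay visible as missing:

```python
import pandas as pd

active = pd.Series(["Yes", "no", " YES ", "maybe"])  # hypothetical sample

mapping = {"yes": True, "no": False}  # illustrative; extend as needed

as_bool = (
    active
    .str.strip()
    .str.lower()
    .map(mapping)        # unmapped values such as "maybe" become NaN
    .astype("boolean")   # nullable boolean dtype: True / False / <NA>
)
print(as_bool)
```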

8. Removing Invalid Rows

df = df[df["Age"] > 0]  # keeps positive ages; rows where Age is NaN are dropped too
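
A single lower bound still lets through values that are too large to be real. A minimal sketch using Series.between, where the 1 to 120 range and the sample rows are assumptions chosen only for illustration:

```python
import pandas as pd

# Hypothetical sample with a negative, a missing, and an implausible age
df = pd.DataFrame({
    "Name": ["Amit", "Riya", "Karan", "Neha"],
    "Age": [22, -5, None, 240],
})

# between() is inclusive on both ends; NaN evaluates to False and is dropped
valid = df[df["Age"].between(1, 120)]
print(valid)
```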

9. Real-World Example

import pandas as pd

data = {
    "Name": ["Amit", "Riya", "Karan"],
    "Age": ["22", "twenty", "25"],
    "JoinDate": ["2024-01-10", "invalid", "2024/02/15"]
}

df = pd.DataFrame(data)

df["Age"] = pd.to_numeric(df["Age"], errors="coerce")
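# Note: pandas 2.0+ infers one format from the first value, so "2024/02/15" may also become NaT;
# passing format="mixed" (pandas 2.0+) parses each value individually instead.
df["JoinDate"] = pd.to_datetime(df["JoinDate"], errors="coerce")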

print(df)


10. Best Practices

✔ Always inspect data types
✔ Use errors="coerce" for safety
✔ Clean text before conversion
✔ Validate data after cleaning
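
The last point can be as simple as checking dtypes and counting what coercion turned into missing values. A minimal sketch of such a check, assuming hypothetical "Age" and "JoinDate" columns with made-up values:

```python
import pandas as pd
from pandas.api import types as ptypes

# Hypothetical cleaned frame built from raw strings
df = pd.DataFrame({
    "Age": pd.to_numeric(pd.Series(["22", "twenty", "25"]), errors="coerce"),
    "JoinDate": pd.to_datetime(
        pd.Series(["2024-01-10", "invalid", "2024-02-15"]),
        format="%Y-%m-%d",
        errors="coerce",
    ),
})

# Confirm the conversions produced the expected dtypes
assert ptypes.is_numeric_dtype(df["Age"])
assert ptypes.is_datetime64_any_dtype(df["JoinDate"])

# Report how many values were lost to coercion in each column
missing_per_column = df.isna().sum()
print(missing_per_column)
```

A sudden jump in these counts after a new data load is usually a sign that the source format has changed.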


Conclusion

Cleaning wrong-format data ensures your DataFrame is consistent, usable, and accurate. Pandas provides flexible tools to detect, convert, and fix formatting issues efficiently.
