Python Data Science Pandas Fixing Cleaning Removing Wrong Data
Data Cleaning refers to fixing bad data in your data set.
How to Fix Or Clean Bad Data In Pandas?
Bad data could be:
1. Empty cells.
2. Data in wrong format.
3. Wrong data.
4. Duplicates.
In this lesson we work on the following data for demonstration.
Notice that the above dataset has two rows (12, 13) containing NaN values and one row with a wrong data entry.
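Before fixing anything, it helps to find out where the empty cells are. The sketch below is one way to do that with isna(). The small DataFrame is a hypothetical stand-in for the lesson's data, and its column names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the lesson's data; column names are assumptions.
df = pd.DataFrame({
    "Day": [1, 2, 3, 4],
    "Temp In Celcius": [26.0, np.nan, 29.0, 31.0],
    "Humidity": ["Low", "High", None, "Low"],
})

# Count the empty cells in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Select only the rows that contain at least one empty cell.
rows_with_missing = df[df.isna().any(axis=1)]
print(rows_with_missing)
```

Checking the counts first shows whether dropping rows or filling values is the more reasonable fix for a given column.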
Empty Cells
Empty cells can potentially give you a wrong result when you analyze data.
Remove Rows
One way to deal with empty cells is to remove rows that contain empty cells.
This is usually OK, since data sets can be very big, and removing a few rows will not have a big impact on the result.
Example 1: Remove rows that contain empty cells. Return a new DataFrame with no empty cells.
Code
import pandas as pd
df = pd.read_csv('dummy.csv')
new_df = df.dropna()
print(new_df.to_string())
# Notice in the result that some rows have been removed (rows 13 and 14).
# These rows had cells with empty values.
The output will be:
Note: rows 13 and 14 have been removed.
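Calling dropna() with no arguments removes a row if any of its cells is empty. When that is too aggressive, the subset and how parameters narrow the rule. The following is a minimal sketch on a hypothetical DataFrame (the data and column names are assumptions, not the lesson's CSV).

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "Day": [1, 2, 3, 4],
    "Temp In Celcius": [26.0, np.nan, 29.0, np.nan],
    "Humidity": ["Low", "High", None, None],
})

# Drop a row only if its temperature is missing.
by_column = df.dropna(subset=["Temp In Celcius"])

# Drop a row only if both listed columns are empty in that row.
all_empty = df.dropna(subset=["Temp In Celcius", "Humidity"], how="all")

print(by_column.to_string())
print(all_empty.to_string())
```

Both calls return a new DataFrame and leave df unchanged, just like the plain dropna() call in Example 1.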
How to Replace Empty Values?
One more way of dealing with empty cells is to insert a new value.
This way you do not have to delete entire rows just because of some empty cells.
The fillna() method allows us to replace empty cells with a value.
Example 2: Replace NULL values with the value 'High'.
Code
import pandas as pd
df = pd.read_csv('dummy.csv')
df.fillna('High', inplace = True)
print(df.to_string())
The output will be:
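Filling the whole DataFrame with one value also writes that value into every other column that has empty cells, including numeric ones. fillna() accepts a dictionary that maps column names to replacement values, which keeps the fix local to one column. A minimal sketch, assuming a hypothetical DataFrame with a text column named Humidity:

```python
import numpy as np
import pandas as pd

# Hypothetical data; the "Humidity" column is an assumption for illustration.
df = pd.DataFrame({
    "Temp In Celcius": [26.0, np.nan, 29.0],
    "Humidity": ["Low", None, "High"],
})

# Replace empty cells only in the "Humidity" column.
filled = df.fillna({"Humidity": "High"})

print(filled.to_string())
```

The empty temperature is left as NaN here, so the numeric column keeps its numeric type and can be handled separately.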
Replace Empty Cells Using Mean, Median, or Mode.
Now observe the dataset below.
Note: Row 12 has a NaN value.
A common way to replace empty cells is to calculate the mean, median, or mode value of the column.
Pandas provides the mean(), median(), and mode() methods to calculate the respective values for a specified column.
Example 3: Calculate the MEAN, and replace any empty values in the above dataset with it.
Code
import pandas as pd
df = pd.read_csv('dummy1.csv')
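x = df["Temp In Celcius"].mean()
# Assign the result back to the column; fillna(inplace=True) on a selected
# column may not update df in newer pandas versions.
df["Temp In Celcius"] = df["Temp In Celcius"].fillna(x)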
print(df.to_string())
The output will be:
Note: In row 12, the NaN value has been replaced by the mean value 28.92.
Mean = the average value (the sum of all values divided by the number of values).
Example 4: Calculate the MEDIAN, and replace any empty values in the above dataset with it.
Code
import pandas as pd
df = pd.read_csv('dummy1.csv')
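x = df["Temp In Celcius"].median()
# Assign the result back to the column; fillna(inplace=True) on a selected
# column may not update df in newer pandas versions.
df["Temp In Celcius"] = df["Temp In Celcius"].fillna(x)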
print(df.to_string())
The output will be:
Note: In row 12, the NaN value has been replaced by the median value 29.
Median = the value in the middle, after you have sorted all values in ascending order.
Example 5: Calculate the MODE, and replace any empty values in the above dataset with it.
Code
import pandas as pd
df = pd.read_csv('dummy1.csv')
# mode() returns a Series because several values can tie; [0] takes the first one.
x = df["Temp In Celcius"].mode()[0]
df["Temp In Celcius"] = df["Temp In Celcius"].fillna(x)
print(df.to_string())
The output will be:
Note: In row 12, the NaN value has been replaced by the mode value 26.
Mode = the value that appears most frequently.
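The three definitions above can be compared side by side on a small example. The values below are made up for illustration and are not the lesson's dataset. Note that mean(), median(), and mode() skip NaN by default.

```python
import numpy as np
import pandas as pd

# Made-up temperatures with one empty cell and one unusually high reading.
temps = pd.Series([26, 26, 28, 30, 40, np.nan], name="Temp In Celcius")

mean_value = temps.mean()      # (26 + 26 + 28 + 30 + 40) / 5 = 30.0
median_value = temps.median()  # middle of 26, 26, 28, 30, 40 = 28.0
mode_value = temps.mode()[0]   # 26 appears twice = 26.0

# Fill the empty cell with the median, which the high reading barely affects.
filled = temps.fillna(median_value)

print(mean_value, median_value, mode_value)
print(filled.to_string())
```

In this sample the single reading of 40 pulls the mean up to 30, while the median stays at 28. That is one reason the median is often preferred when a column contains extreme values.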