لَآ إِلَـٰهَ إِلَّا هُوَ
LA ILAHA ILLA HU
Allah, Your Lord There Is No Deity Except Him.

# Python Data Science: Removing Duplicates and Cleaning Wrong Data with Pandas

###### df.drop_duplicates(inplace = True)
How to remove duplicates in Pandas?

Step 1. Check for duplicates. The duplicated() method returns a Boolean value for each row: df.duplicated()

Step 2. Remove all duplicates from the dataset: df.drop_duplicates(inplace = True)

Duplicate rows are rows that have been registered more than once. Examine the dataset below.

Apple, Bananas and Mangoes are values that appear more than once.

By taking a look at our test dataset, we can observe that rows 2, 5 and 8 are duplicates.

In a large enough dataset you may not be able to spot duplicates just by looking at it. To discover them, we can use the duplicated() method.

The duplicated() method returns a Boolean value for each row: True if the row repeats an earlier row, False otherwise.

Example 1: Check for duplicates in the dataset.

Code

import pandas as pd

# Low GI fruit data; rows 2, 5 and 8 repeat rows 0, 3 and 6
LGI = {
    'Low GI Diet Fruits': ["Apple", "Apricots", "Apple",
                           "Bananas", "Grapes", "Bananas",
                           "Mangoes", "Oranges", "Mangoes",
                           "Pineapple"],
    'Weight (Gms)': [120, 60, 120, 120, 120, 120, 120, 120, 120, 120],
    'GI Scores': [40, 32, 40, 47, 43, 47, 51, 48, 51, 51],
}

df = pd.DataFrame(LGI)

# Prints True for every row that duplicates an earlier row
print(df.duplicated())

The output will be:

```
0    False
1    False
2     True
3    False
4    False
5     True
6    False
7    False
8     True
9    False
dtype: bool
```
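The Boolean Series can also be used to count or inspect the duplicated rows directly. The following is only a minimal sketch using the fruit data from Example 1; the variable names dup_mask, dup_count and dup_rows are ours, chosen for illustration.

```python
import pandas as pd

# Fruit data from Example 1, rebuilt here so the snippet runs on its own.
LGI = {
    'Low GI Diet Fruits': ["Apple", "Apricots", "Apple", "Bananas", "Grapes",
                           "Bananas", "Mangoes", "Oranges", "Mangoes", "Pineapple"],
    'Weight (Gms)': [120, 60, 120, 120, 120, 120, 120, 120, 120, 120],
    'GI Scores': [40, 32, 40, 47, 43, 47, 51, 48, 51, 51],
}
df = pd.DataFrame(LGI)

dup_mask = df.duplicated()        # True for every repeat of an earlier row
dup_count = int(dup_mask.sum())   # True counts as 1, so the sum is the number of duplicates
dup_rows = df[dup_mask]           # Boolean indexing keeps only the flagged rows

print("Duplicate rows found:", dup_count)
print(dup_rows)
```

Counting first is a cheap way to see how much data drop_duplicates() would remove before you actually remove it.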
Remove All Duplicates

Example 2: Remove all duplicates from the dataset.

Code

import pandas as pd

# Low GI fruit data; rows 2, 5 and 8 repeat rows 0, 3 and 6
LGI = {
    'Low GI Diet Fruits': ["Apple", "Apricots", "Apple",
                           "Bananas", "Grapes", "Bananas",
                           "Mangoes", "Oranges", "Mangoes",
                           "Pineapple"],
    'Weight (Gms)': [120, 60, 120, 120, 120, 120, 120, 120, 120, 120],
    'GI Scores': [40, 32, 40, 47, 43, 47, 51, 48, 51, 51],
}

df = pd.DataFrame(LGI)

# Remove the duplicate rows from df itself
df.drop_duplicates(inplace = True)

print(df)

The output will be the DataFrame with the seven remaining rows (index 0, 1, 3, 4, 6, 7 and 9).

Note: observe that rows 2, 5 and 8 have been removed.
If your data is in a CSV file, load it with pd.read_csv() in place of df = pd.DataFrame(LGI).
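As a rough sketch of the CSV case: the file name fruits.csv and its contents below are hypothetical, written to a temporary folder only so that the snippet runs on its own. With your own data you would pass the path of your file to pd.read_csv().

```python
import os
import tempfile

import pandas as pd

# Hypothetical CSV content, created here only to keep the example self-contained.
csv_text = (
    "Low GI Diet Fruits,Weight (Gms),GI Scores\n"
    "Apple,120,40\n"
    "Apricots,60,32\n"
    "Apple,120,40\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "fruits.csv")  # hypothetical file name
    with open(csv_path, "w") as f:
        f.write(csv_text)

    # Load the CSV instead of building the DataFrame from a dictionary.
    df_csv = pd.read_csv(csv_path)

# The second "Apple" row repeats the first one and is dropped.
df_csv.drop_duplicates(inplace=True)
print(df_csv)
```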
Point to Remember: inplace = True makes sure that the method does NOT return a new DataFrame. Instead, it removes all duplicates from the original DataFrame and returns None.
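To illustrate the difference, here is a small sketch with made-up data (the names data, df_a, df_b, result and cleaned are ours). With inplace = True the call changes the DataFrame itself and returns None; without it, the original is left untouched and a new DataFrame is returned.

```python
import pandas as pd

# Made-up data: the second row repeats the first.
data = {'Fruit': ["Apple", "Apple", "Grapes"], 'GI Scores': [40, 40, 43]}

# With inplace=True the original DataFrame is modified and nothing is returned.
df_a = pd.DataFrame(data)
result = df_a.drop_duplicates(inplace=True)
print(result)      # None
print(len(df_a))   # 2

# Without inplace the original keeps all its rows and a cleaned copy is returned.
df_b = pd.DataFrame(data)
cleaned = df_b.drop_duplicates()
print(len(df_b), len(cleaned))   # 3 2
```

A common mistake is writing df = df.drop_duplicates(inplace = True), which replaces df with None; use either the assignment or inplace = True, not both.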
Real Life Example: How To Remove Duplicates From Big Data With Pandas
We had a dataset containing 49775 items. The code we worked on is shared below. After applying it, our dataset came down to 46000 items.

Code
import pandas as pd
import numpy as np