لَآ إِلَـٰهَ إِلَّا هُوَ
LA ILAHA ILLA HU
Allah, Your Lord There Is No Deity Except Him.

Python Data Science Machine Learning Lesson 11: Train/Test

The train/test method splits a dataset so that a model can be evaluated on data it was not trained on.

In this method, the data is split into 2 parts:

1. Train data
2. Test data

The train data is used to train the model.

The test data is used to evaluate the model and test its accuracy.
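
As a minimal sketch of these two roles, the example below fits a model on the train data only and then scores it on the test data. The synthetic data, the choice of LinearRegression and the random_state value are assumptions made for illustration; they are not part of the abalone example in this lesson.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption, not the abalone dataset):
# 200 rows, 3 features, and a target that depends linearly on them plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)            # the model only sees the train data
test_r2 = model.score(X_test, y_test)  # R^2 on rows the model never saw
print("R^2 on test data:", round(test_r2, 3))
```

For a regression model, score() returns the R^2 value rather than a classification accuracy, so a value close to 1 means the predictions on the test data are close to the true values.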

There are 3 commonly used train/test splits:

1. 80-20 split: 80% of the data is used for training and 20% is used for testing.
2. 67-33 split: 67% of the data is used for training and 33% is used for testing.
3. 50-50 split: 50% of the data is used for training and 50% is used for testing.
(The 2nd and 3rd splits leave less data for training, so they are mainly used with larger datasets, where the larger test set gives a more reliable estimate of model accuracy and performance.)
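
To see what these ratios mean in rows, the sketch below applies each of them to a placeholder array with 4177 rows, the row count shown later in this lesson. The zero-filled array and the random_state value are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data with 4177 rows; the values themselves do not matter here,
# only the number of rows.
X = np.zeros((4177, 7))
y = np.zeros(4177)

split_sizes = {}
for name, test_size in [("80-20", 0.20), ("67-33", 0.33), ("50-50", 0.50)]:
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=0
    )
    split_sizes[name] = (len(X_train), len(X_test))
    print(name, "-> train rows:", len(X_train), "test rows:", len(X_test))
```

When only test_size is given, train_test_split() uses all remaining rows for training, so the two parts always add up to the full dataset.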

Let us take the abalone dataset that we created in the Multiple Regression chapter.
The dataset looks like this.

Now, load the dataset, separate it into X and y, and then split each of them into train and test parts.
The splitting is done with the train_test_split() function from sklearn's model_selection module.

The Python code for the train/test split is given as

code
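
A minimal sketch of such a split is shown here. It does not load the real abalone file: the DataFrame is filled with random numbers, and the column names, the "Rings" target and the random_state value are assumptions chosen only so that X has 4177 rows and 7 columns.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for the abalone dataset: random numbers with assumed column names.
# With the real data you would load your own file instead (for example with
# pandas.read_csv).
feature_names = ["Length", "Diameter", "Height", "Whole weight",
                 "Shucked weight", "Viscera weight", "Shell weight"]
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((4177, 7)), columns=feature_names)
df["Rings"] = rng.integers(1, 30, size=4177)

# X holds the independent variables, y the dependent variable.
X = df[feature_names]
y = df["Rings"]
print("Before training, shape of X :", X.shape)

# 80-20 split; random_state makes the shuffle repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, test_size=0.2, random_state=0
)

print("Shape of X train is :", X_train.shape)
print("Shape of X test is :", X_test.shape)
print("Shape of y train is :", y_train.shape)
print("Shape of y test is :", y_test.shape)
```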


The output will be
Before training, shape of X : (4177, 7)
Shape of X train is : (3341, 7)
Shape of X test is : (836, 7)
Shape of y train is : (3341,)
Shape of y test is : (836,)

Explanation:

1. The required libraries are first imported.

2. We load the abalone dataset for the train/test split.
A description of this dataset is given in the Multiple Regression lesson.

3. Next, we separate the dataset into X and y.
X contains the independent variables, whereas y contains the dependent variable.
Get the shape of both X and y before splitting.

4. Perform the train/test split. For 80-20, set train_size to 0.8 and test_size to 0.2.
Similarly, for 67-33, set train_size and test_size to 0.67 and 0.33 respectively.

5. Get the shape of both the train and test datasets. Compare them with the shape of the dataset before splitting.
You will notice that the train and test datasets each have fewer rows than the original dataset, and that together they contain all of its rows.
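
The comparison in step 5 can also be done in code. The sketch below uses a 67-33 split on stand-in data; the random values, the column names f0 to f6 and the random_state value are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in data (an assumption): 4177 rows, 7 feature columns named f0..f6.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((4177, 7)), columns=[f"f{i}" for i in range(7)])
y = pd.Series(rng.integers(1, 30, size=4177), name="target")

# 67-33 split, as described in step 4.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.67, test_size=0.33, random_state=0
)

# Every original row lands in exactly one of the two parts.
rows_add_up = len(X_train) + len(X_test) == len(X)
no_overlap = X_train.index.intersection(X_test.index).empty
print("Rows add up:", rows_add_up)
print("Train and test share no rows:", no_overlap)
```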

Note: If you have Python installed on your PC, you can install pandas as follows.
Open Command Prompt from the Start menu.
Inside the command prompt, type
pip install pandas
and press Enter.
This command installs pandas on your computer, after which you can use it in Python.

Note: If you have Python installed on your PC, you can install sklearn as follows.
Open Command Prompt from the Start menu.
Inside the command prompt, type
pip install scikit-learn
and press Enter.
This command installs sklearn on your computer, after which you can use it in Python. (The library is imported as sklearn in code, but its package name for pip is scikit-learn.)
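
One simple way to check that both installations worked is to import the libraries in Python and print their versions. This is only a quick check; the version numbers printed depend on what is installed on your computer.

```python
import pandas as pd
import sklearn

# If both imports succeed, the packages are installed.
print("pandas version:", pd.__version__)
print("scikit-learn version:", sklearn.__version__)
```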