LA ILAHA ILLA HU
Allah, Your Lord There Is No Deity Except Him.

Python Data Science Machine Learning Avito Ad Demand Prediction Deep Learning Machine Learning Model
Table Of Contents
1.Problem Overview.
2.Dataset Source And Description.
3.Metrics To Use.
4.Research Papers (Solutions, Architectures, Kernels).
5.First Cut Approach.
6.Exploratory Data Analysis(EDA).
7.Model Building And Outputs.
8.App Demonstration.
9.Future Work.
10.Project Source Code.
11.References.
Problem Overview
1.In e-commerce, combinations of small, nuanced product details can make a massive difference in a user's interest in purchasing a product or service.
2.The listing details mentioned below can make a big difference in developing interest once the user gets a good look at the product.
3.The above examples tell us how a seller can optimize product listings on an e-commerce website.
4.But what happens if the seller has fully optimized the product listing and it still generates no sales?
5.This leads to the problem of analyzing the demand for the product that the seller wants to sell.
6.This is very important because a seller may invest money in advertising and still find that people do not visit the product.
7.If people visit but are not interested in buying the product, it clearly indicates that there is some sort of problem with the seller's product.
8.E-commerce giants such as Amazon, Flipkart, Myntra, Walmart, eBay, Rakuten and AliExpress spend millions of dollars on advertising.
9.If demand for the products does not exist, this leads to a severe loss for the company, or for a seller who spends a lot of his own money advertising his product.
10.If demand for the product does not exist, the seller is left frustrated, which can turn into a big business problem.
11.In April 2018, Avito, a Russian e-commerce giant, opened a competition on Kaggle based on predicting the demand for the goods being advertised.
12.Avito is a Russian classified advertisements website with sections devoted to general goods for sale, jobs, real estate, personals, cars for sale, and services.
13.Avito is the most popular classifieds site in Russia and is the second biggest classifieds site in the world after Craigslist.
14.The dataset provided for this case study was created by Avito's team itself.
15.It has various features such as item_id, user_id, ad title, ad description and ad image, with deal_probability as the target variable.
16.Here the deal probability is a continuous variable which ranges from 0 to 1.
17.A value of 0 indicates the lowest probability of the item being purchased and 1 indicates the highest probability.
18.So this is a regression problem in machine learning.
Dataset Source And Description
Dataset Source
For this problem statement, we are using ‘train.csv’, ‘test.csv’ and the zipped train and test image data to predict the deal probability of Avito ads.
- train.csv - Train data.
- ‘item_id’ - Ad id.
- ‘user_id’ - User id.
- ‘region’ - Ad region.
- ‘city’ - Ad city.
- ‘parent_category_name’ - Top level ad category as classified by Avito's ad model.
- ‘category_name’ - Fine grain ad category as classified by Avito's ad model.
- ‘param_1’ - Optional parameter from Avito's ad model.
- ‘param_2’ - Optional parameter from Avito's ad model.
- ‘param_3’ - Optional parameter from Avito's ad model.
- ‘title’ - Ad title.
- ‘description’ - Ad description.
- ‘price’ - Ad price.
- ‘item_seq_number’ - Ad sequential number for user.
- ‘activation_date’ - Date ad was placed.
- ‘user_type’ - User type.
- ‘image’ - Id code of image. Ties to a jpg file in train_jpg. Not every ad has an image.
- ‘image_top_1’ - Avito's classification code for the image.
- ‘deal_probability’ - The target variable. This is the likelihood that an ad actually sold something. It's not possible to verify every transaction with certainty, so this column's value can be any float from zero to one.
- test.csv - Test data. Same schema as the train data, except that ‘deal_probability’ is not present in this dataset. We need to predict it.
- train_jpg_[0 to 4].zip - Zip files which contain the images of the train data ads. Size is around 50 GB.
- test_jpg.zip - Zip file which contains the images of the test data ads. Size is around 20 GB.
In the current problem statement, given the numerical, categorical and image data of a product, we predict the demand probability (deal probability) of that product.
This is a regression problem where we need to minimize the Root Mean Squared Error (RMSE).
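To make the metric concrete, RMSE is the square root of the mean of the squared differences between the actual and the predicted deal probabilities. The snippet below is a minimal sketch of that formula in plain Python. The function name and the sample values are our own illustration and are not taken from the competition data.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Squared Error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical deal probabilities, for illustration only.
actual = [0.0, 0.5, 1.0, 0.2]
predicted = [0.1, 0.4, 0.8, 0.2]
score = rmse(actual, predicted)
print(round(score, 4))
```

A lower value means the predictions are closer to the actual deal probabilities, and a perfect model would score 0.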

Research Papers (Solutions, Architectures, Kernels)
1. Approach No 1: Machine Learning Algorithm
- The CatBoost algorithm is used for this approach.
- It is developed by Yandex.
- Documentation Link: https://catboost.ai/en/
- URL of the Kaggle Solution Kernel: Catboost Solution
The list of all categorical features has to be passed in the cat_features parameter, and the rest of the data encoding and modeling is handled by CatBoost.
CatBoost is itself a gradient boosting algorithm, and it is reported to often outperform other gradient boosting implementations on data with many categorical features. It can also use the GPU when the task_type parameter is set accordingly.
This algorithm can be tested on the Avito dataset and its performance validated accordingly.
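As a rough illustration of how cat_features and task_type fit together, here is a minimal sketch. The hyperparameter values and the helper names (catboost_params, fit_catboost) are our own assumptions and are not taken from the Kaggle kernel. The catboost package is imported only inside fit_catboost, so the parameter helper can be inspected without it.

```python
def catboost_params(cat_features, use_gpu=False):
    """Build keyword arguments for CatBoostRegressor (values are illustrative)."""
    return {
        "loss_function": "RMSE",          # matches the competition metric
        "iterations": 500,                # assumed value, tune as needed
        "learning_rate": 0.1,             # assumed value, tune as needed
        "cat_features": list(cat_features),
        "task_type": "GPU" if use_gpu else "CPU",
        "verbose": 100,
    }


def fit_catboost(train_df, target_col, cat_features, use_gpu=False):
    """Fit a CatBoostRegressor on a pandas DataFrame (hypothetical helper)."""
    from catboost import CatBoostRegressor  # imported lazily

    features = train_df.drop(columns=[target_col])
    target = train_df[target_col]
    # Categorical columns should not contain NaN; fill them before fitting.
    model = CatBoostRegressor(**catboost_params(cat_features, use_gpu))
    model.fit(features, target)
    return model


params = catboost_params(["region", "city", "category_name"])
```

With the Avito data, fit_catboost could be called on the imputed train dataframe with target_col="deal_probability".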
2. Approach No 2: Using Neural Networks (Deep Learning Algorithm).
A neural network is a method in artificial intelligence that teaches computers to process data in a way that is inspired by the human brain. It is a type of machine learning process, called deep learning, that uses interconnected nodes or neurons in a layered structure that resembles the human brain.
Artificial Neural Networks (ANNs), also called Simulated Neural Networks (SNNs), are a subset of machine learning and are at the heart of deep learning algorithms.
Their name and structure are inspired by the human brain, mimicking the way that biological neurons signal to one another.
Using algorithms, they can recognize hidden patterns and correlations in raw data, cluster and classify it, and continuously learn and improve.
ANNs are composed of node layers: an input layer, one or more hidden layers, and an output layer.
Each node, or artificial neuron, connects to others and has an associated weight and threshold.
If the output of any individual node is above the specified threshold value, that node is activated, sending data to the next layer of the network.
Otherwise, no data is passed along to the next layer of the network.
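The weight-and-threshold behaviour described above can be sketched in a few lines of plain Python. The function names and the numbers are purely illustrative. Real networks normally use differentiable activation functions, but with a threshold of 0 this rule behaves like the common ReLU activation.

```python
def neuron_output(inputs, weights, threshold):
    """Return the weighted sum if it exceeds the threshold, else 0.0 (node not activated)."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return weighted_sum if weighted_sum > threshold else 0.0


def layer_output(inputs, weight_rows, thresholds):
    """Apply every neuron of one layer to the same inputs."""
    return [neuron_output(inputs, w, t) for w, t in zip(weight_rows, thresholds)]


inputs = [1.0, 0.5]
# Two hidden neurons with made-up weights and a threshold of 0 each.
hidden = layer_output(inputs, [[0.5, 0.5], [-1.0, 0.5]], [0.0, 0.0])
print(hidden)
```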
3. Approach No 3: Using Word Embeddings and RNN (Recurrent Neural Networks)
- In the following Kaggle kernel, word embeddings and an RNN are used to predict the deal_probability directly.
- The embeddings are learned as part of the training process.
- We can even use FastText's pre-trained Russian vectors, which may give better results.
- Pre-trained FastText: Pretrained-Fasttext
- URL of the Kaggle Solution Kernel: Kaggle Kernel
4. Approach No 4: Using Blur Detection with OpenCV
We can identify blurriness in an image using OpenCV and the Laplace operator.
A first idea for finding the blurriness in an image is to perform the steps below:
1.Find the Fast Fourier Transform of the image.
2.Note down the distribution of high frequencies as well as low frequencies.
3.If there is only a low amount of high frequencies, the image is marked as blurred.
The difficulty with this approach is that the frequency distribution varies for different types of object, so what counts as a “low amount of high frequencies” for one object will be different for other objects.
One solution is the variance of the Laplacian method by Pech-Pacheco et al. in their 2000 ICPR paper.
Perform the below steps to get the solution:
a.Consider a single image channel.
b.Convolve it with the 3 x 3 Laplacian kernel.
c.Calculate the variance of the response.
d.If the variance falls below a threshold, the image is considered blurry.
All of the above steps can be performed in OpenCV with a single line of code.
The implementation is given below:
cv2.Laplacian(image, cv2.CV_64F).var()
So, the main idea here is to include the blurriness of the ad images as one of the features used to determine the demand for an ad.
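To show what the OpenCV one-liner computes, the sketch below spells out steps a to d in plain Python on a small grayscale image given as a list of rows. It is a simplified illustration. It only visits interior pixels (OpenCV also handles the image border), the threshold of 100 is an assumed value that would need tuning on real ad images, and the function names are our own.

```python
from statistics import pvariance

# 3 x 3 Laplacian kernel (the one OpenCV applies for its default aperture size).
LAPLACIAN_KERNEL = [[0, 1, 0],
                    [1, -4, 1],
                    [0, 1, 0]]


def laplacian_variance(gray):
    """Variance of the Laplacian response over the interior pixels of a 2-D image.

    The image must be at least 3 x 3, otherwise there are no interior pixels.
    """
    height, width = len(gray), len(gray[0])
    responses = []
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            acc = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    acc += LAPLACIAN_KERNEL[dr + 1][dc + 1] * gray[r + dr][c + dc]
            responses.append(acc)
    return pvariance(responses)


def is_blurry(gray, threshold=100.0):
    """Flag the image as blurry when the Laplacian variance is below the threshold."""
    return laplacian_variance(gray) < threshold


# A flat image has no edges at all; a checkerboard is made of sharp edges.
flat = [[128] * 6 for _ in range(6)]
checker = [[255 if (r + c) % 2 == 0 else 0 for c in range(6)] for r in range(6)]
print(is_blurry(flat), is_blurry(checker))
```

For ads without an image, the blur value cannot be computed and has to be imputed separately.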
First Cut Approach
The problem can be solved by the following approach:
1.Perform EDA to check how each feature is distributed and whether it follows a specific pattern.
2.Find the missing values in the dataset and impute them where possible.
3.Perform feature engineering by extracting additional features from the given features and use them as input for the prediction.
4.Perform train-validation splitting and encode all the numerical as well as categorical features.
5.Train all the candidate deep neural network models and select the one that gives the best performance.
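Step 4 above can be sketched without any library as follows. This is a dependency-free illustration of the arithmetic behind a train-validation split, label encoding and min-max scaling. In practice scikit-learn's LabelEncoder and MinMaxScaler (used later in this case study) would do the same job. The helper names, the sample rows and the choice of reserving code 0 for unseen categories are our own assumptions.

```python
import random


def train_val_split(rows, val_fraction=0.2, seed=42):
    """Shuffle the rows reproducibly and split them into train and validation parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def fit_label_encoder(values):
    """Map each distinct category to an integer; 0 is reserved for unseen categories."""
    return {v: i for i, v in enumerate(sorted(set(values)), start=1)}


def fit_min_max(values):
    """Return the minimum and maximum seen in the training values."""
    return min(values), max(values)


def min_max_scale(value, lo, hi):
    """Scale to [0, 1] relative to the training range (validation values may fall outside)."""
    return 0.0 if hi == lo else (value - lo) / (hi - lo)


# Hypothetical rows, for illustration only.
rows = [
    {"region": "Moscow", "price": 1000.0},
    {"region": "Kazan", "price": 250.0},
    {"region": "Moscow", "price": 4000.0},
    {"region": "Sochi", "price": 500.0},
    {"region": "Kazan", "price": 750.0},
]
train, val = train_val_split(rows)

# Fit the encoders on the training part only, then apply them to the validation part.
region_codes = fit_label_encoder(r["region"] for r in train)
lo, hi = fit_min_max([r["price"] for r in train])
encoded_val = [
    (region_codes.get(r["region"], 0), min_max_scale(r["price"], lo, hi))
    for r in val
]
```

Fitting on the training part only keeps information from the validation rows out of the encoders.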
Exploratory Data Analysis(EDA)
- In the first step, Venn diagrams are plotted to check the unique and common values of features that are present in both the train and the test data.
- The above snapshot is a Venn diagram of the User ID feature of the train and test data.
- The figure below is a Venn diagram of the City feature of the train and test data.
- The Venn diagram of the Description feature of the train and test data is given below.
- Similarly, Venn diagrams can also be plotted for other features to find out the unique and common values.
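The three numbers behind each Venn diagram (values only in train, values in both, values only in test) come from simple set operations. The sketch below shows this for a hypothetical list of user ids. The function name and the sample ids are illustrative, and the resulting counts can be handed to any Venn plotting library.

```python
def venn_counts(train_values, test_values):
    """Count the distinct values that are train-only, common to both, and test-only."""
    train_set, test_set = set(train_values), set(test_values)
    return {
        "train_only": len(train_set - test_set),
        "common": len(train_set & test_set),
        "test_only": len(test_set - train_set),
    }


# Hypothetical user ids, for illustration only.
train_users = ["u1", "u2", "u3", "u2"]
test_users = ["u3", "u4"]
counts = venn_counts(train_users, test_users)
print(counts)
```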
- We have also plotted a bar chart of the unique values each feature has in the train data.
- With the same methodology, we can also plot it for the test data.
- We have also plotted probability plots of the numerical ad price feature and the deal probability target variable. Refer to the snapshots below.
- Since the data contains some very high ad prices, we have applied a log transformation to the ad price feature. (Snapshot: Probability Plot of Deal Probability After Log Transformation.)
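A log transformation compresses very large prices while keeping their order. The sketch below uses log(1 + price) so that a price of 0 remains valid. Whether the original notebook used log or log(1 + x) is not stated, so treat this choice as an assumption. The sample prices are made up.

```python
import math


def log_transform(prices):
    """Apply log(1 + price) so that a price of 0 stays 0 and large prices are compressed."""
    return [math.log1p(p) for p in prices]


def inverse_log_transform(values):
    """Undo log_transform, returning prices on the original scale."""
    return [math.expm1(v) for v in values]


prices = [0.0, 99.0, 999999.0]
transformed = log_transform(prices)
print([round(t, 3) for t in transformed])
```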
- Since Avito is a Russian e-commerce giant, the values of some features are in the Russian language.
- For this we have used two translation API libraries:
1. Googletrans from Google
2. Translator from Microsoft
- Below is a snapshot of the top 10 most popular regions for Avito ads.
- A snapshot of the top 10 most popular cities for Avito ads is given below.
- For the top 10 most popular ad categories:
- Similarly, the top 10 most popular values can be found for any other categorical feature.
- Below is a pie chart showing the distribution of ads based on region.
- Pie chart showing the distribution of ads based on parent category name.
- Bar chart of users that have posted ads, based on region.
- Bar chart of users that have posted ads, based on parent ad categories.
- We have also done EDA on time-based data.
- Distribution of Avito ads by week.
- Distribution of Avito ads by day of the week.
- Distribution of Avito ads by day of the month.
- WordClouds are also plotted for the text features:
- For Title
- For Description
- Similarly, we can plot WordClouds for other text-based features such as city, region, ad parameter 1, ad parameter 2, etc.
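The time-based distributions mentioned above (by week, by day of the week and by day of the month) can all be derived from the activation date. The sketch below assumes the date is stored as an ISO string such as 2017-03-15. The helper name and the sample dates are illustrative.

```python
from collections import Counter
from datetime import date

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]


def date_parts(activation_date):
    """Split an ISO date string (YYYY-MM-DD) into ISO week, weekday name and day of month."""
    d = date.fromisoformat(activation_date)
    return {
        "week": d.isocalendar()[1],
        "weekday": WEEKDAY_NAMES[d.weekday()],
        "month_day": d.day,
    }


# Hypothetical activation dates, for illustration only.
dates = ["2017-03-15", "2017-03-16", "2017-03-22"]
parts = [date_parts(s) for s in dates]

# Counting the parts gives the numbers behind each distribution plot.
ads_per_weekday = Counter(p["weekday"] for p in parts)
ads_per_week = Counter(p["week"] for p in parts)
```

The same three parts can also serve as extra model features derived from the activation date.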


Model Building And Outputs
1.First of all, the csv dataset is loaded as a pandas dataframe.
2.The features present in the train dataset have the missing values shown in the snapshot below.

The price feature has missing values. Some images and their classification codes are missing because some ads were posted without an image.
Also, some ad descriptions are missing.
To see how the missing values are imputed, refer to the snapshot below.
- For price, we impute the mean price of the corresponding ad category.
- For the classification code, missing values are imputed as -1, which indicates that there is no classification code for the image.
- For text features such as description, the ad parameters and possibly title (if it has any missing values), missing values are imputed as the text ‘nil’.
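The three imputation rules above can be sketched in plain Python on rows stored as dictionaries, with None marking a missing value. This is only an illustration of the logic. On a pandas dataframe the price rule would typically be written with groupby("category_name")["price"].transform("mean") followed by fillna. The function name and the sample rows are our own.

```python
from collections import defaultdict

TEXT_COLUMNS = ("title", "description", "param_1", "param_2", "param_3")


def impute_rows(rows):
    """Return copies of the rows with missing (None) values filled in."""
    # Mean price per category, computed from the rows where the price is known.
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        if row["price"] is not None:
            totals[row["category_name"]][0] += row["price"]
            totals[row["category_name"]][1] += 1
    mean_price = {cat: s / n for cat, (s, n) in totals.items()}

    filled = []
    for row in rows:
        new_row = dict(row)
        if new_row["price"] is None:
            # Stays None if no ad in this category has a known price.
            new_row["price"] = mean_price.get(new_row["category_name"])
        if new_row["image_top_1"] is None:
            new_row["image_top_1"] = -1
        for col in TEXT_COLUMNS:
            if new_row.get(col) is None:
                new_row[col] = "nil"
        filled.append(new_row)
    return filled


# Hypothetical rows, for illustration only.
rows = [
    {"category_name": "Phones", "price": 100.0, "image_top_1": 5.0,
     "title": "a", "description": "x", "param_1": "p", "param_2": None, "param_3": None},
    {"category_name": "Phones", "price": 300.0, "image_top_1": 7.0,
     "title": "b", "description": "y", "param_1": "p", "param_2": "q", "param_3": None},
    {"category_name": "Phones", "price": None, "image_top_1": None,
     "title": "c", "description": None, "param_1": None, "param_2": None, "param_3": None},
]
filled = impute_rows(rows)
```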
- We also perform feature engineering by extracting some additional features from the given features:
1. Activation Date
2. Description
3. Title
4. Ad Parameters
5. Price
Refer to the snapshots below for the feature engineering.
- Now, after performing all imputations and adding the extra features, it is time to perform model building.
1.ANN with no Image Features.
- In this model, a simple ANN is built (without extracting features from the image data and including them).
- First of all, the dataset is preprocessed so that the model can consume it.
- For categorical features, LabelEncoder is applied, and for numerical features, MinMaxScaler is applied.
- After preprocessing, the neural network model is built. Refer to the snapshot below for the architecture.
- The model takes two inputs: one input receives the categorical features and the other receives the numerical features.
- We use MSE (Mean Squared Error) as the loss function and a custom RMSE (Root Mean Squared Error) as the metric.
- After training the model with hyperparameter tuning and callbacks, the loss and metric of the model are as shown in the snapshot.
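The two-input design with an MSE loss and a custom RMSE metric could look roughly like the Keras sketch below. The layer sizes, the sigmoid output and the function name are our own assumptions, since the actual architecture is only shown in the snapshot. TensorFlow is imported inside the function, so the module loads even where it is not installed.

```python
def build_two_input_model(n_categorical, n_numeric):
    """Build a small two-input regression network (layer sizes are illustrative)."""
    import tensorflow as tf  # imported lazily
    from tensorflow.keras import Model, layers

    def rmse(y_true, y_pred):
        # Custom metric: square root of the mean squared error of the batch.
        return tf.sqrt(tf.reduce_mean(tf.square(y_pred - y_true)))

    # One input for the label-encoded categorical features, one for the scaled numeric ones.
    cat_in = layers.Input(shape=(n_categorical,), name="categorical")
    num_in = layers.Input(shape=(n_numeric,), name="numerical")

    cat_branch = layers.Dense(64, activation="relu")(cat_in)
    num_branch = layers.Dense(32, activation="relu")(num_in)

    merged = layers.concatenate([cat_branch, num_branch])
    hidden = layers.Dense(64, activation="relu")(merged)
    # Sigmoid keeps the prediction in [0, 1], like deal_probability.
    out = layers.Dense(1, activation="sigmoid", name="deal_probability")(hidden)

    model = Model(inputs=[cat_in, num_in], outputs=out)
    model.compile(optimizer="adam", loss="mse", metrics=[rmse])
    return model
```

Training would then pass the two feature arrays as a list, for example model.fit([x_cat, x_num], y), where x_cat, x_num and y are placeholder names for the prepared arrays.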


2.ANN with Image Features.
- In this model, an ANN is built, this time considering the image features.
- We consider the image blur value and the average red, green and blue values. For ads without an image, these values are imputed as -1. Refer to the snapshots below.
- The dataset is preprocessed so that the model can consume it.
- For categorical features, LabelEncoder is applied, and for numerical features, MinMaxScaler is applied.
- After preprocessing, the neural network model is built. Refer to the snapshot below for the architecture.
- After training the model with hyperparameter tuning and callbacks, the loss and metric of the model are as shown in the snapshot.
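The average colour features and the -1 imputation for missing images can be sketched as below, with an image represented as a flat list of (r, g, b) tuples. The function name and the tiny sample image are illustrative. Note that images read with OpenCV's cv2.imread come in BGR channel order, so the channels would need to be reordered first.

```python
def average_rgb(pixels):
    """Mean red, green and blue value of an image given as a list of (r, g, b) tuples.

    Returns (-1, -1, -1) when the ad has no image (pixels is None or empty).
    """
    if not pixels:
        return (-1, -1, -1)
    n = len(pixels)
    return tuple(sum(p[ch] for p in pixels) / n for ch in range(3))


# A made-up four-pixel image: red, blue, white and black.
image = [(255, 0, 0), (0, 0, 255), (255, 255, 255), (0, 0, 0)]
features = average_rgb(image)
missing = average_rgb(None)
```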


3.ANN with Image Features with GRU Embedding.
- In this model, an ANN is built (again considering the image features) that also uses GRU embeddings for the text features.
- The dataset is preprocessed so that the model can consume it.
- To preprocess the text features, we remove symbols, stopwords, and HTML and lxml tags, and then apply lemmatization.
- After lemmatizing each sentence, a one-hot representation is computed and the sequences are padded to the same length (max_length=7 for title and max_length=250 for description).
- For categorical features, LabelEncoder is applied, and for numerical features, MinMaxScaler is applied.
- After preprocessing, the neural network model is built. Refer to the snapshot below for the architecture.
- After training the model with hyperparameter tuning and callbacks, the loss and metric of the model are as shown in the snapshot.
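The one-hot representation and padding steps can be illustrated without Keras. In the sketch below each word is hashed to an integer id and the id list is left-padded with zeros, which to our understanding mirrors the default behaviour of Keras's one_hot and pad_sequences helpers. The vocabulary size of 10000, the md5 hash and the function names are assumptions made for the example.

```python
import hashlib
import re


def one_hot_encode(text, vocab_size):
    """Map each word to an integer in [1, vocab_size - 1] via a stable hash (0 is kept for padding)."""
    words = re.findall(r"\w+", text.lower())
    return [
        int(hashlib.md5(w.encode("utf-8")).hexdigest(), 16) % (vocab_size - 1) + 1
        for w in words
    ]


def pad_sequence(seq, max_length, pad_value=0):
    """Left-pad with pad_value, or keep only the last max_length items if the sequence is too long."""
    if len(seq) >= max_length:
        return seq[-max_length:]
    return [pad_value] * (max_length - len(seq)) + seq


# A two-word Russian title padded to the title length used in this case study.
title_ids = pad_sequence(one_hot_encode("продам велосипед", 10000), 7)
```

Because hashing can map two different words to the same id, a larger vocabulary size reduces collisions.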


4.ANN with Image Features with LSTM Embedding.
- In this model, an ANN is built (again considering the image features) that also uses LSTM embeddings for the text features.
- The dataset is preprocessed so that the model can consume it.
- To preprocess the text features, we remove symbols, stopwords, and HTML and lxml tags, and then apply lemmatization.
- After lemmatizing each sentence, a one-hot representation is computed and the sequences are padded to the same length (max_length=7 for title and max_length=250 for description).
- For categorical features, LabelEncoder is applied, and for numerical features, MinMaxScaler is applied.
- After preprocessing, the neural network model is built. Refer to the snapshot below for the architecture.
- After training the model with hyperparameter tuning and callbacks, the loss and metric of the model are as shown in the snapshot.
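The GRU and LSTM variants differ only in the recurrent layer that reads the embedded title and description, so both can be sketched with one builder. The embedding size, the number of units, the tabular input width and the function name below are our own assumptions, since the real architectures are only shown in the snapshots. As before, TensorFlow is imported inside the function.

```python
def build_text_model(cell="gru", vocab_size=10000, n_tabular=20, units=32):
    """Text + tabular regression network; cell is "gru" or "lstm". Sizes are illustrative."""
    from tensorflow.keras import Model, layers  # imported lazily

    recurrent = {"gru": layers.GRU, "lstm": layers.LSTM}[cell]

    # Padded word-id sequences (lengths as in the text) plus the other encoded features.
    title_in = layers.Input(shape=(7,), name="title")
    desc_in = layers.Input(shape=(250,), name="description")
    tab_in = layers.Input(shape=(n_tabular,), name="tabular")

    # Embeddings are learned during training; the recurrent layer summarises each sequence.
    title_vec = recurrent(units)(layers.Embedding(vocab_size, 32)(title_in))
    desc_vec = recurrent(units)(layers.Embedding(vocab_size, 32)(desc_in))

    merged = layers.concatenate([title_vec, desc_vec, tab_in])
    hidden = layers.Dense(64, activation="relu")(merged)
    # Sigmoid keeps the prediction in [0, 1], like deal_probability.
    out = layers.Dense(1, activation="sigmoid")(hidden)

    model = Model(inputs=[title_in, desc_in, tab_in], outputs=out)
    model.compile(optimizer="adam", loss="mse")
    return model
```

Calling build_text_model("gru") and build_text_model("lstm") gives two models that can be trained on the same inputs and compared on validation RMSE.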


Final Observations
After training all the models, we get the best losses and metrics, as given in the snapshot below.
App Demonstration
You can view the app demonstration by clicking here.
Future Work
- The model performance may be improved further by tuning hyperparameters such as the activation function type (tanh, elu, etc.) and the number of GRU and LSTM units.
- Image features can be leveraged further by extracting color-related values, for instance dullness, sharpness and whiteness, or the maximum blue, red or green values.
Project Source Code
You can access the complete source code at
GitHub Repo
References
Analytics Vidhya
Kaggle Notebook
GeeksForGeeks
Kaggle Notebook