لَآ إِلَـٰهَ إِلَّا هُوَ
LA ILAHA ILLA HU
Allah, Your Lord There Is No Deity Except Him.




Table Of Contents
1. Problem Overview
2. Dataset Source And Description
3. Metrics To Use
4. Research Papers (Solutions, Architectures, Kernels)
5. First Cut Approach
6. Exploratory Data Analysis (EDA)
7. Model Building And Outputs
8. App Demonstration
9. Future Work
10. Project Source Code
11. References

Problem Overview
1. In e-commerce, a combination of tiny, nuanced details about a product can make a massive difference in a user's interest in buying a product or service.
2. The details shown below can make a big difference in building interest when the user gets a good look at the product.
3. The above examples show how a seller can optimize product listings on an e-commerce website.
4. But what happens if the seller has fully optimized the product listing and it still generates no sales?
5. This raises the problem of analyzing the demand for the product that the seller wants to sell.
6. This is very important because the seller may invest money in advertising and people may still not visit the product.
7. Or, even after visiting, they may not be interested in buying it, which clearly indicates that there is some problem with the seller's product.
8. E-commerce giants such as Amazon, Flipkart, Myntra, Walmart, eBay, Rakuten and AliExpress spend millions of dollars on advertising.
9. If there is no demand for the products, this leads to a huge loss for the company, or for an individual seller who spends a lot of his own money advertising his product.
10. If there is no demand for his product, the seller is simply left frustrated, and this can turn into a big business problem.
11. In April 2018, Avito, a Russian e-commerce giant, opened a competition on Kaggle based on predicting how much demand an advertised item has.
12.Avito is a Russian classified advertisements website with sections devoted to general goods for sale, jobs, real estate, personals, cars for sale, and services.
13.Avito is the most popular classifieds site in Russia and is the second biggest classifieds site in the world after Craigslist.
14. The dataset provided for this case study was created by Avito's team itself.
15. It has various categorical, text and image features, such as item_id (the ad ID), user_id, the ad title, the ad description and the ad image, with deal_probability as the target variable.
16. deal_probability is a continuous variable ranging from 0 to 1.
17. A value of 0 means the item is least likely to be sold, and 1 means it is most likely to be sold.
18. This makes it a regression problem in machine learning.

Dataset Source And Description
Dataset Source
For this problem, we use ‘train.csv’, ‘test.csv’ and the zipped train and test image data to predict the deal probability of Avito ads.

Metrics To Use
Given the numerical, categorical and image data of a product, we predict the demand (deal) probability of that product.
This is a regression problem, and the goal is to minimize the Root Mean Square Error (RMSE).
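For reference, RMSE is the square root of the mean squared difference between the true and predicted values: RMSE = sqrt((1/n) * Σ (y_i - ŷ_i)²). The minimal NumPy sketch below computes it; the sample values are invented for illustration. Because deal_probability lies between 0 and 1, clipping predictions to that range before scoring is a cheap safeguard, since it can only move a prediction closer to the true value.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Square Error between true and predicted deal probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Invented example values; 1.2 is out of range and gets clipped to 1.0.
y_true = [0.0, 0.5, 1.0, 0.2]
y_pred = np.clip([0.1, 0.4, 1.2, 0.2], 0.0, 1.0)
print(rmse(y_true, y_pred))  # ≈ 0.0707
```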
Research Papers (Solutions, Architectures, Kernels)


1. Approach No 1: Machine Learning Algorithm
CatBoost is an open-source machine learning library and algorithm that uses gradient boosting on decision trees.
The list of all categorical features is passed in the cat_features parameter, and CatBoost handles the encoding of that data and the modeling itself.
It is reported to outperform other gradient boosting implementations on many benchmarks, and it can also use the GPU through the task_type parameter.
This algorithm can be trained on the Avito dataset and its performance validated on a held-out validation set.

2. Approach No 2: Using Neural Networks (Deep Learning Algorithm)
A neural network is a method in artificial intelligence that teaches computers to process data in a way inspired by the human brain. It is a type of machine learning process, called deep learning, that uses interconnected nodes, or neurons, in a layered structure that resembles the human brain.

Artificial neural networks (ANNs), also called simulated neural networks (SNNs), are a subset of machine learning and are at the heart of deep learning algorithms. Their name and structure are inspired by the human brain, mimicking the way that biological neurons signal to one another. Using algorithms, they can recognize hidden patterns and correlations in raw data, cluster and classify it, and continuously learn and improve.

Artificial neural networks (ANNs) are composed of node layers: an input layer, one or more hidden layers, and an output layer. Each node, or artificial neuron, connects to others and has an associated weight and threshold.

If the output of any individual node is above the specified threshold value, that node is activated, sending data to the next layer of the network.
Otherwise, no data is passed along to the next layer of the network.
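To make the weight-and-threshold description concrete, here is a toy NumPy forward pass with hand-picked weights (purely illustrative, not trained and not the author's model): each neuron takes a weighted sum of its inputs and passes on 1 only if that sum exceeds its threshold, otherwise 0.

```python
import numpy as np

def layer_forward(inputs, weights, thresholds):
    """One layer of threshold neurons: fire (1.0) only if the weighted sum exceeds the threshold."""
    weighted_sums = inputs @ weights  # one weighted sum per neuron
    return (weighted_sums > thresholds).astype(float)

# Input layer (3 features) -> hidden layer (2 neurons) -> output layer (1 neuron).
x = np.array([0.5, 1.0, -0.2])
w_hidden = np.array([[0.4, -0.6],
                     [0.3, 0.8],
                     [0.9, 0.1]])
t_hidden = np.array([0.2, 0.5])
w_out = np.array([[1.0],
                  [1.0]])
t_out = np.array([0.5])

hidden = layer_forward(x, w_hidden, t_hidden)  # sums 0.32 and 0.48 -> only neuron 1 fires
output = layer_forward(hidden, w_out, t_out)
print(hidden, output)  # [1. 0.] [1.]
```

In practice the hard threshold is replaced by a differentiable activation such as ReLU or sigmoid so that the weights can be learned with gradient descent; a sigmoid output unit is a natural fit for this problem because it keeps the predicted deal probability between 0 and 1.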
3. Approach No 3: Using Word Embeddings and RNNs (Recurrent Neural Networks)


4. Approach No 4: Using Blur Detection with OpenCV
We can identify blurriness in an image using OpenCV and the Laplacian operator.
One way to find the blurriness in an image is to examine its frequencies with the steps below:

1. Compute the Fast Fourier Transform (FFT) of the image.
2. Note the distribution of high and low frequencies.
3. If there are few high frequencies, mark the image as blurred.
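The three FFT steps can be sketched with NumPy alone. Here "low frequency" is assumed to mean a disc of radius 10 around the centre of the shifted spectrum, and the 0.5 cut-off for "few high frequencies" is likewise an illustrative assumption; a random texture and a box-blurred copy of it stand in for real ad images.

```python
import numpy as np

def high_frequency_ratio(gray, radius=10):
    """Share of spectral energy lying outside a low-frequency disc of the given radius."""
    gray = np.asarray(gray, dtype=float)
    gray = gray - gray.mean()  # drop the DC term so overall brightness does not dominate
    # Step 1: 2-D FFT, with the zero frequency shifted to the centre.
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    energy = np.abs(spectrum) ** 2
    total = energy.sum()
    if total == 0:  # perfectly flat image: no frequency content at all
        return 0.0
    # Step 2: split energy into low (inside the disc) and high (outside) frequencies.
    h, w = gray.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low = energy[dist <= radius].sum()
    return float(1.0 - low / total)

def box_blur(img, k=5):
    """k x k mean filter with edge padding, used here only to fake a blurry photo."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

rng = np.random.default_rng(0)
sharp = rng.random((64, 64))    # noise texture: plenty of high frequencies
blurred = box_blur(sharp, k=5)  # the same texture after a 5 x 5 blur

THRESHOLD = 0.5  # Step 3: assumed cut-off for "few high frequencies"
for name, img in [("sharp", sharp), ("blurred", blurred)]:
    ratio = high_frequency_ratio(img)
    print(f"{name}: ratio={ratio:.3f} -> {'blurred' if ratio < THRESHOLD else 'not blurred'}")
```

The sharp texture keeps most of its energy outside the disc while the blurred copy does not. A radius and threshold that separate these two images will not necessarily work for other kinds of content.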

The problem with this approach is that the frequency distribution varies between different types of objects.
What counts as a “low amount of high frequencies” for one object is different for another, so a single cut-off is hard to choose.
One solution is the variance of the Laplacian method, proposed by Pech-Pacheco et al. in their 2000 ICPR paper.
It consists of the following steps:

a. Take a single channel of the image (for example, its grayscale version).
b. Convolve it with a 3 × 3 Laplacian kernel.
c. Calculate the variance of the result.
d. If the variance falls below a threshold, the image is considered blurry.

Steps b and c can be performed in OpenCV with a single line of code, where image is a single-channel image:
cv2.Laplacian(image, cv2.CV_64F).var()

So, the main idea here is to include the blurriness of each ad's image as one of the features for predicting the demand of the ad.
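A NumPy-only sketch of the variance of the Laplacian, together with one way the score could be attached to each ad as a feature. It applies the same 3 × 3 kernel that cv2.Laplacian uses with its default ksize=1, but it skips border pixels, so values can differ slightly from OpenCV's. The two synthetic images, the images lookup and the blur_feature helper are hypothetical stand-ins for decoded ad photos and real image loading.

```python
import numpy as np
import pandas as pd

def laplacian_variance(gray):
    """Variance of the Laplacian on a single-channel image (interior pixels only).

    Uses the kernel [[0, 1, 0], [1, -4, 1], [0, 1, 0]] written as array shifts.
    """
    g = np.asarray(gray, dtype=np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())

# Hypothetical decoded images (single channel, 0-255).
checker = (np.indices((64, 64)).sum(axis=0) % 2) * 255.0  # sharp edges everywhere
ramp = np.tile(np.linspace(0.0, 255.0, 64), (64, 1))       # smooth, edge-free image
images = {"img_sharp": checker, "img_smooth": ramp}

# Toy ads table; in train.csv the image column is empty for ads without a photo.
ads = pd.DataFrame({"item_id": [1, 2, 3],
                    "image": ["img_sharp", "img_smooth", None]})

def blur_feature(image_id):
    """Hypothetical helper: blur score for one ad, NaN when there is no image."""
    if not isinstance(image_id, str) or image_id not in images:
        return np.nan
    return laplacian_variance(images[image_id])

ads["blur_score"] = ads["image"].map(blur_feature)
print(ads)
```

Higher scores mean sharper images. Instead of hard-coding the blur threshold from step d, the raw score can be fed to the model so that it learns an appropriate cut-off; ads without a photo get NaN, which can be imputed later.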

First Cut Approach
The problem can be solved by the following approach:
1. Perform EDA to check how each feature is distributed and whether it follows a specific pattern.
2. Find the missing values in the dataset and impute them where possible.
3. Perform feature engineering by extracting additional features from the given features, and use them as inputs for the prediction.
4. Split the data into training and validation sets, and encode all the numerical and categorical features.
5. Train all candidate deep neural network models and select the one that gives the best performance.
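Steps 2 to 4 could look roughly like the pandas sketch below. The toy rows use a subset of the competition's column names, while the imputation rules, the engineered features (title length, description word count, log price) and the 80/20 split are illustrative assumptions, not the author's exact choices.

```python
import numpy as np
import pandas as pd

# Toy rows using a subset of train.csv's columns (values invented).
df = pd.DataFrame({
    "region": ["Moscow", "Kazan", "Moscow", "Omsk", "Kazan", "Omsk"],
    "category_name": ["cars", "flats", "cars", "jobs", "jobs", "flats"],
    "title": ["BMW X5", "2-room flat", "Lada", "Driver", "Cook wanted", "Studio"],
    "description": ["Good condition", None, "Urgent sale", "Full time", None, "Near metro"],
    "price": [1500000.0, np.nan, 90000.0, 40000.0, np.nan, 2500000.0],
    "deal_probability": [0.8, 0.1, 0.5, 0.0, 0.2, 0.3],
})

# Step 2: impute missing values (assumed strategies).
df["description"] = df["description"].fillna("")
df["price"] = df["price"].fillna(df["price"].median())

# Step 3: engineered features derived from existing columns.
df["title_len"] = df["title"].str.len()
df["desc_word_count"] = df["description"].str.split().str.len()
df["log_price"] = np.log1p(df["price"])

# Step 4a: integer codes for categoricals (e.g. as inputs to embedding layers).
for col in ["region", "category_name"]:
    df[col + "_code"] = df[col].astype("category").cat.codes

# Step 4b: random 80/20 train-validation split.
train = df.sample(frac=0.8, random_state=42).copy()
valid = df.drop(train.index)

# Step 4c: standardise numeric features with training-set statistics only.
num_cols = ["title_len", "desc_word_count", "log_price"]
mean, std = train[num_cols].mean(), train[num_cols].std().replace(0, 1)
train[num_cols] = (train[num_cols] - mean) / std
valid[num_cols] = (valid[num_cols] - mean) / std
print(train.shape, valid.shape)
```

Computing the scaling statistics on the training split alone keeps validation information from leaking into training.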

Exploratory Data Analysis(EDA)

Generating WordCloud
Model Building And Outputs
1. First, the CSV dataset is loaded as a pandas DataFrame.
2. The features in the training dataset have the missing values shown in the snapshot below.
As the snapshot shows, ad parameters 1, 2 and 3 (param_1, param_2 and param_3) have missing values.
The price feature also has missing values. Some images and their classification codes are missing because some ads were posted without images.
Also, some ad descriptions are missing.
The missing values are imputed as shown in the snapshot below.
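A sketch of how the missing-value table and one possible imputation could be produced with pandas. The tiny frame only mimics the affected columns (param_1 to param_3, price, image, image_top_1, description); the fill values ("missing", an empty string, -1 for image_top_1 and the median price) are assumptions, since the author's actual strategy is only shown in the snapshot.

```python
import numpy as np
import pandas as pd

# Toy frame mimicking the columns reported as incomplete (values invented).
train = pd.DataFrame({
    "param_1": ["a", None, "b", None],
    "param_2": [None, None, "x", "y"],
    "param_3": [None, None, None, "z"],
    "price": [100.0, np.nan, 250.0, 400.0],
    "image": ["img1", None, "img3", None],
    "image_top_1": [12.0, np.nan, 3020.0, np.nan],
    "description": ["ok", "fine", None, "new"],
})

# Count and percentage of missing values per column.
missing = train.isna().sum()
print(pd.DataFrame({"missing": missing, "percent": 100 * missing / len(train)}))

# Assumed imputation choices, not necessarily the author's.
for col in ["param_1", "param_2", "param_3"]:
    train[col] = train[col].fillna("missing")             # "absent" becomes its own category
train["description"] = train["description"].fillna("")
train["has_image"] = train["image"].notna().astype(int)   # remember that no photo exists
train["image_top_1"] = train["image_top_1"].fillna(-1)    # sentinel code for "no image"
train["price"] = train["price"].fillna(train["price"].median())
```

A common refinement is to fill price with a per-category median, for example train.groupby("category_name")["price"].transform("median") on the real data.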

1. ANN with no Image Features


2. ANN with Image Features


3. ANN with Image Features and GRU Embedding


4. ANN with Image Features and LSTM Embedding


Final Observations
After training all the models, we obtained the best losses and metrics shown in the snapshot below.

App Demonstration
You can view a demonstration of the app by clicking here.

Future Work


Project Source Code
You can access the complete source code at GitHub Repo

References
Analytics Vidhya Kaggle Notebook GeeksForGeeks Kaggle Notebook

Deep Learning Machine Learning Model: Avito Ad Demand Prediction Case Study 2
