Is it possible to distinguish a group of users similar to the current customers of an online store using only public data from a social network? Advertising only to loyal audience helps reduce the cost per click and, in effect, the cost per customer.
We chose the largest Russian social network vk.com as a source of data for the experiment. Customers are buyers of an online store of one of the MLM companies. To simplify the task, users were selected only from one region.
- Training set contains 27K of user profiles, 2K of them are customers of the online store.
- Test set contains 30K of random user profiles.
For this experiment we decided not to use the following features:
- Relationship between users.
- Publicly available personal data of users.
The algorithm uses only the following features:
- The text of the public posts of the user in the past two years.
- Counts of likes and shares of user’s records.
TF-IDF (TF-term frequency, IDF-inverse document frequency) transformation was applied to convert the text of records to machine readable form. Well-proven XGBoost was chosen as an machine learning algorithm.
Proof of concept
Using the algorithm on the test set 1.5K user profiles were selected as similar to the customers of the online store.
To compare the quality of the algorithm the same size 1.5К random user profiles were randomly selected from the test set.
To test the hypothesis, the method of placing an ad with a cost per thousand impressions was used. Two identical ads with the same cost per thousand impressions were created and each of ads was targeting its audience.
|CTR||CPC (Russian rubles)|
|similar to the customers||0.046%||32|
|random user profiles||0.014%||107|
Algorithm based on machine learning is able to select an audience more loyal to the online store using only publicly available posts. Targeting ads to the audience selected by the algorithm we got 3 times more efficient spending of the advertising budget due to 3 times less cost per click.