A Machine-Learning Item Recommendation
System for Video Games
Paul Bertens,
Anna Guitart,
Pei Pei Chen and
África Periáñez
Yokozuna Data, Silicon Studio
1-21-3 Ebisu Shibuya-ku, Tokyo, Japan
{paul, anna, peipei, africa}@yokozunadata.com
Abstract—Video-game players generate huge amounts of data,
as everything they do within a game is recorded. In particular,
among all the stored actions and behaviors, there is information
on the in-game purchases of virtual products. Such information is
of critical importance in modern free-to-play titles, where gamers
can select or buy a profusion of items during the game in order
to progress and fully enjoy their experience. To try to maximize
these kinds of purchases, one can use a recommendation system
so as to present players with items that might be interesting for
them. Such systems can better achieve their goal by employing
machine learning algorithms that are able to predict the rating of
an item or product by a particular user. In this paper we evaluate
and compare two of these algorithms, an ensemble-based model
(extremely randomized trees) and a deep neural network, both
of which are promising candidates for operational video-game
recommender engines. Item recommenders can help developers
improve the game. But, more importantly, it should be possible
to integrate them into the game, so that users automatically
get personalized recommendations while playing. The presented
models are not only able to meet this challenge, providing
accurate predictions of the items that a particular player will
find attractive, but also sufficiently fast and robust to be used in
operational settings.
Index Terms—recommender systems, ensemble methods, deep
learning, online games, user behavior
I. INTRODUCTION
The aim of a recommender system is to provide suggestions
to a set of users on items that might be interesting for them.
Recommendation systems are commonly found in e-commerce
[20], [18] (where users purchase goods like books, clothes or
games online), usually implemented through collaborative fil-
tering methods [5]. These work by comparing similar items or
similar users based on user ratings. If two users like the same
items they are likely similar, and if two items are liked by the
same users, those items are probably similar as well. However,
as this method does not take into account the contents, new
items cannot be recommended. Content-based recommenders
can be used to overcome some of these issues by looking
at the item in question and finding similarity between items
based on inherent properties [24]. A hybrid approach can also
be taken, to combine e.g. collaborative information, content
features and demographics [11]. A more detailed study into the
current limitations and possible extensions of recommendation
systems can be found in [1].
These two authors contributed equally to the work.
The integration of recommendation systems into video
games is a relatively new area of research. Previous work
has mostly focused on game recommendation engines, which
present players with suggestions on alternative titles based
on the games they have already played [2], [22]. But it
is also possible to use recommendation systems to increase
player engagement in a game. In modern free-to-play games,
users can buy a wide range of virtual items with real money
(in-app purchases, IAPs). However, sometimes they can be
overwhelmed by the number of items offered and the diversity
of playstyles, and this can lead to an increase in the churn
rate—as players start to find the contents too difficult and
are unable to progress within the game. Item recommendation
systems can help prevent this problem by offering players a
more direct route to the items that could be appealing or useful
for them, thereby improving their purchasing and general in-
game experience. This may ultimately result in increased
revenue [17] by increasing player retention, IAPs and the
conversion rate from free to paying users.
To achieve these goals, it is essential to recommend each
player the right item—one that fits both their current state
and their playing behavior—at the right time. And this is
possible because (in contrast to other applications where very
limited information is available) every action performed by a
player within the game gets recorded. This offers a unique
opportunity not only to obtain accurate predictions on the
player’s in-game behaviour (for example on when and at what
level they will leave the game, see [19] and [4]) but also to
offer them personalized recommendations of items that are
likely relevant to them.
There are previous papers related to item recommenda-
tion systems. Sifa et al. [23] introduce a recommendation system for
the massively multiplayer online first-person shooter game
Destiny, where players get suggestions on those items that
best fit their play style and might improve their performance.
They apply similarity measures to global descriptors like total
kill count or kill/death ratio. Clusters for the player “base”
and “cooldown” stats were derived through k-means clus-
tering, whereas archetypal analysis [7], [21] (which clusters
by extreme values rather than centroids [3]) was used to
find distinct playstyles. Similar analyses were done for the
massively multiplayer online role-playing game Tera and the
multiplayer strategy game Battlefield 2: Bad Company 2 [9]
or the game Tomb Raider: Underworld [8]. In all these cases,
players were clustered by their playing behaviour; although
no recommendation system was built, behavioral profiling via
clustering may be very useful in offering recommendations
based on similarity between users.
arXiv:1806.04900v2 [stat.ML] 14 Aug 2018
However, unsupervised clustering methods remain a chal-
lenge. In particular, a significant amount of game-specific
knowledge is required to find adequate features that can
separate players into the right number of clusters.
A. Aim
While there are several approaches to the problem of
developing recommendation systems, here we will explore a
different avenue: our aim is to provide a method that predicts
the next items a player will purchase, and use this information
to recommend them other items. This approach differs from
traditional methods as we explicitly use a predictive model.
Such a model allows us to predict, both for new and existing
users, the items they are likely to find most appealing based
on their playing behaviour. Additionally, it must be robust
enough for operational deployment, so that it can recommend
game products automatically across a variety of game genres,
i.e. across different game data distributions.
II. BACKGROUND
A. Extremely Randomized Trees
Extremely randomized trees (ERTs) [10] extend the ran-
domization of the original random forest algorithms [13], [6] by
choosing the splitting points randomly instead of computing
the ones that are most correlated with the output (a choice
that makes random forests easily biased). ERTs are
computationally efficient, reduce the variance of the model
and help prevent overfitting. However, the bias can also become
larger with this method when the randomization is increased
above the optimal level, in exchange for the decrease in variance.
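The random choice of splitting points described above can be illustrated with a minimal sketch. It follows the general idea of [10] (for each of a few randomly chosen features, draw one cut-point uniformly between the feature's minimum and maximum, then keep the best-scoring candidate). The function names `gini` and `random_split`, the use of Gini impurity as the score, and the parameter `n_candidates` are assumptions made for illustration, not details taken from this paper.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def random_split(X, y, n_candidates, rng):
    """Choose a split in the extremely randomized way.

    For each of `n_candidates` randomly selected features, ONE cut-point is
    drawn uniformly between the feature's min and max (no search over all
    possible thresholds). The candidate with the lowest weighted Gini
    impurity is returned as (feature_index, threshold), or None if every
    selected feature is constant.
    """
    n_features = len(X[0])
    features = rng.sample(range(n_features), min(n_candidates, n_features))
    best, best_score = None, float("inf")
    for f in features:
        values = [row[f] for row in X]
        lo, hi = min(values), max(values)
        if lo == hi:
            continue  # constant feature: nothing to split on
        threshold = rng.uniform(lo, hi)
        left = [label for row, label in zip(X, y) if row[f] < threshold]
        right = [label for row, label in zip(X, y) if row[f] >= threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best, best_score = (f, threshold), score
    return best
```

A full tree would apply `random_split` recursively to the two resulting partitions; that part is omitted here to keep the sketch short.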
Breiman's implementation of random forest builds an en-
semble of decision trees, each of which is fit on a random
subset of features [6]. This randomization in the feature
selection, combined with the bagging of multiple decision
trees, reduces the correlation between trees and increases the
overall accuracy of the ensemble.
One of the main advantages of ensemble models is that they
are trivially parallelizable, either using multicore processors
(as each tree could potentially be trained on a single core) or
across multiple machines. This makes them more practical in
operational settings, where training and inference have to be
completed in a relatively short time, and thus better suited for
developing a commercial recommendation system.
B. Deep Neural Networks
Deep neural networks (DNNs) [16] are artificial neural
networks with multiple hidden layers. By using nonlinear
activation functions (the functions that transform the output
at each layer before passing it to the next), DNNs are able
to learn highly nonlinear dynamics. Multiple iterations, i.e.
epochs, are run to optimize the DNN during the learning
process. Rectified linear units (ReLU) are among the most
commonly used activation functions nowadays. DNNs that
combine ReLU with dropout—a strategy consisting in ran-
domly dropping out some of the units at each layer—have
been shown to provide state-of-the-art accuracy in domains
such as image classification [15] or speech recognition [12].
Additionally, for sequential data, recurrent neural networks
(RNN) or long short-term memory (LSTM) networks [14]
have achieved similarly high accuracies in sequence prediction
and language modeling.
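As a small illustration of the two building blocks mentioned above, the following sketch implements ReLU and dropout on plain Python lists. It uses the "inverted" dropout variant (surviving units are rescaled during training so that nothing needs to change at inference time), which is an assumption for illustration; the function names are hypothetical and no claim is made about the implementation used in this work.

```python
import random


def relu(values):
    """Rectified linear unit: negative activations are set to zero."""
    return [max(0.0, v) for v in values]


def dropout(values, p_drop, rng, training=True):
    """Inverted dropout.

    During training each unit is zeroed with probability `p_drop`, and the
    surviving units are divided by (1 - p_drop) so that the expected
    activation is unchanged. At inference the input is returned as is.
    """
    if not training or p_drop == 0.0:
        return list(values)
    keep = 1.0 - p_drop
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def hidden_layer(inputs, weights, biases, p_drop, rng, training=True):
    """One fully connected layer followed by ReLU and dropout."""
    pre_activation = [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]
    return dropout(relu(pre_activation), p_drop, rng, training)
```

Calling `hidden_layer(..., training=False)` disables the random masking, which is the behavior wanted when producing predictions.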
III. ITEM RECOMMENDATION MODEL
While RNNs and LSTM networks are able to learn tem-
poral dependencies and eliminate the need for manual feature
engineering, they also slow down the training significantly, as
they have to learn the relevant features of the time series that
lead to an increase in prediction accuracy. On the other hand,
by manually calculating general statistics of the time-series
data together with other descriptors one can efficiently create
a single vector describing the player’s behavior and use it in
nontemporal models like DNNs or ensemble-based methods
such as ERTs.
These are the main challenges related to our approach:
• The model should be able to train and provide inference
in production environments scaling to millions of users.
• It should be trainable on mini-batches so that it fits in
the memory (ensemble models usually work on the full
dataset).
• The time-series data needs to be converted into a single
feature vector that accurately represents the player’s be-
havioural patterns (as commented above, tree ensembles
and DNNs use static feature vectors, not time series).
• As players make multiple purchases over their lifetime in
the game, we must extract their next purchase from mul-
tiple time points. Thus the training dataset may become
huge if e.g. players remain in the game for several years.
The following sections elaborate on the dataset used and on
the way the model was constructed to solve these challenges.
A. Dataset
The data used in our analysis comes from the Japanese
card-game Age of Ishtaria, developed by Silicon Studio, and
contains daily time-series data for each paying user within
the period from 2014-09-24 to 2017-05-08 (totaling 33,488
players). It contains information on the number of purchases
per item and total sales per item for each user. Players can
purchase in-game currency with real money and use it to buy
different card-packs (known as gacha) containing a random set
of cards that can be employed in the game. The data contains
8 different types of items and also has information on e.g. the
player’s daily level progression, playtime and lifetime.
B. Feature calculation
To convert our time-series into a single static vector we
calculate general statistics over the full time-series data for
each of the temporal features (e.g. daily playtime or sales).
The process is as follows: First we compute the derivative
of the time series in order to get its variations (for in-
stance, if we are tracking total level, the derivative gives
us the number of level-ups per day). Then we calculate the
mean/variance/skew/kurtosis/maximum over the time series
for each of the temporal features. Additionally, to capture
behavioral changes of the player between the beginning and
end of their lifetime, we also compute the distance for all
temporal features over the first and last days in which they
logged in. Finally, all these features get concatenated into
our final feature vector. By using such a method, the feature
calculation can be generalized to any type of temporal data.
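The steps above can be sketched as follows. Several details are not specified in the text and are assumptions here: the statistics are computed on the derivative of each series, the variance and higher moments are population (biased) estimates, kurtosis is reported as excess kurtosis, and the "distance" between the first and last days is taken as the absolute difference of the raw values. All function names are hypothetical.

```python
def derivative(series):
    """Day-to-day differences of a time series (length n gives n - 1 values)."""
    return [b - a for a, b in zip(series, series[1:])]


def summary_stats(values):
    """Mean, variance, skewness, excess kurtosis and maximum of a sequence."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    m2 = sum(c ** 2 for c in centered) / n
    if m2 == 0.0:
        skew, kurt = 0.0, 0.0  # constant series: higher moments set to zero
    else:
        skew = (sum(c ** 3 for c in centered) / n) / m2 ** 1.5
        kurt = (sum(c ** 4 for c in centered) / n) / m2 ** 2 - 3.0
    return [mean, m2, skew, kurt, max(values)]


def player_features(series_by_name):
    """Concatenate per-feature statistics into one static vector.

    `series_by_name` maps a temporal feature name (e.g. "level") to its
    daily values; every series needs at least two days of data.
    """
    features = []
    for name in sorted(series_by_name):  # fixed order keeps vectors aligned
        series = series_by_name[name]
        features.extend(summary_stats(derivative(series)))
        features.append(abs(series[-1] - series[0]))  # first-vs-last distance
    return features
```

For example, `player_features({"level": [1, 2, 4, 4, 7]})` yields six values: five statistics of the daily level-ups `[1, 2, 0, 3]` plus the first-to-last distance.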
C. Sampling to handle multi-label outputs
Players usually make multiple purchases, which means we
can have multiple prediction targets (multiple labels) per user.
One way of dealing with this is to take a subsample up to
time t from each player’s time series and then find their next
purchase after t. This results in a single label we can train on,
and allows us to take multiple subsamples to enlarge our train-
ing set. Since players could be playing for several years and
have hundreds or even thousands of days of playing activity, by
using subsampling we can generate different training samples
for each player, increasing our effective training dataset and
reducing overfitting.
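A minimal sketch of this subsampling step is given below. It assumes a hypothetical data layout in which each player is represented by a list with one entry per day, each entry being the list of item identifiers bought on that day (empty when nothing was bought). The cut-off day t is drawn uniformly among the days that still have a later purchase; how t was actually sampled is not stated in the text.

```python
import random


def sample_training_example(daily_purchases, rng):
    """Draw one (t, history, next_items) training example for a player.

    Returns None when no cut-off day is followed by a purchase.
    `history` holds the data up to and including day t, and `next_items`
    are the items bought on the first purchase day after t.
    """
    purchase_days = [d for d, items in enumerate(daily_purchases) if items]
    if not purchase_days or purchase_days[-1] == 0:
        return None
    t = rng.randrange(purchase_days[-1])  # 0 <= t < last purchase day
    next_day = next(d for d in purchase_days if d > t)
    history = daily_purchases[: t + 1]
    return t, history, daily_purchases[next_day]
```

Calling the function repeatedly with the same player produces different cut-off days, which is how several training samples can be obtained from a single time series.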
D. Scalability using minibatches
Additionally, the model should be able to scale to millions
of players; however, if we generate very large feature vectors
(with thousands of features) and sample multiple labels per
user, we could end up with datasets with over a billion samples
(a thousand samples per user). An efficient way of coping
with such huge data sets is to train an ensemble model on
subsamples of the total set. Hence, we can train a small subset
of trees (20) on a small sample of a few thousand users and
generate the labels directly during training, so that we do not
need to store all samples. The final ensemble is formed by
combining many such subsets of trees, where each tree was
trained on different features, different samples, and different
target labels, producing an extremely robust model.
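The overall training loop can be outlined as below. This is only a structural sketch: `FrequencyModel` is a deliberately trivial placeholder standing in for a single randomized tree (it ignores the features), users are drawn at random for every iteration, and `make_example` is a hypothetical callback that builds a `(features, label)` pair on the fly. None of these choices is taken from the original implementation.

```python
import random
from collections import Counter


class FrequencyModel:
    """Placeholder base learner: predicts the label frequencies of its batch."""

    def __init__(self, labels):
        total = len(labels)
        self.proba = {item: c / total for item, c in Counter(labels).items()}

    def predict_proba(self, features):
        return self.proba  # a real tree would route `features` to a leaf


def train_in_batches(user_ids, make_example, n_iterations, batch_size,
                     models_per_batch, rng):
    """Grow an ensemble from small groups of models fit on user subsamples."""
    ensemble = []
    for _ in range(n_iterations):
        batch = rng.sample(user_ids, min(batch_size, len(user_ids)))
        for _ in range(models_per_batch):
            # Examples are generated here and discarded afterwards, so the
            # full set of samples never has to be stored.
            examples = [make_example(user, rng) for user in batch]
            labels = [label for _, label in examples]
            ensemble.append(FrequencyModel(labels))
    return ensemble


def ensemble_proba(ensemble, features, items):
    """Average the per-item probabilities over all ensemble members."""
    return {
        item: sum(m.predict_proba(features).get(item, 0.0) for m in ensemble)
        / len(ensemble)
        for item in items
    }
```

Because every group of models only sees its own batch, memory use is bounded by the batch size rather than by the total number of generated samples.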
E. Model Specification
1) Output: For each player and item, we generate the
probability that they will buy that item on their next purchase
day. As the model is trained over all players, once players are
in a similar state the model can learn to predict and recommend
the right item at the right time for each individual player.
2) Input: We take the full time-series patterns for each
user to convert them into a single vector that represents their
playing behavior. This conversion is done for all users in a
single mini-batch. Multiple mini-batches are generated per
epoch (one epoch goes over the entire dataset), and the model
is trained on each of these batches.
3) Parameters: The ERT model was trained on subsets of
20 trees for 30 iterations, resulting in a total ensemble size of
600 trees. Each iteration was performed on a subset of 10k
users, which means that a full single epoch was completed
after 3 iterations (as the total set has 33,488 players); we
therefore had 10 epochs.
For the DNN model, we used two hidden layers of 2048
units and set a dropout probability of 0.5. Additionally, as
there were many correlated features, dropout was also applied
to the input layer. By randomly dropping some inputs, we
reduce overfitting on single features, thereby increasing the
robustness of the model. (Recall this was achieved by random
subsampling of features in the ERT model.) The network was
trained for 30 iterations as well, but each iteration was repeated
5 times, resulting in a total of 50 epochs. Both DNN and ERT
are trained on the same data.
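The figures quoted in this subsection can be checked with a few lines of arithmetic. The sketch takes "10k" as exactly 10,000 users, which is an assumption; with 33,488 players an epoch then corresponds to roughly (not exactly) 3 iterations.

```python
n_players = 33_488
users_per_iteration = 10_000      # "10k" in the text, assumed exact here
trees_per_iteration = 20
n_iterations = 30
dnn_repeats_per_iteration = 5

# ERT: 20 trees are added in each of the 30 iterations.
total_trees = trees_per_iteration * n_iterations

# 33,488 / 10,000 = 3.3488, i.e. about 3 iterations per pass over the data.
iterations_per_epoch = round(n_players / users_per_iteration)

ert_epochs = n_iterations // iterations_per_epoch
dnn_epochs = ert_epochs * dnn_repeats_per_iteration

print(total_trees, iterations_per_epoch, ert_epochs, dnn_epochs)
```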
IV. MODEL EVALUATION
In order to evaluate the effectiveness of the proposed model,
we study the prediction accuracy within an upcoming time
window. Predictions are made at a time point t and evaluated
at time t + 50 (where t is measured in days). The training was
performed using data up to 2017-03-19, and predictions were
verified in the window from 2017-03-20 to 2017-05-08.
Several measures are calculated:
1) isOnNextPurchaseDate: Checks whether the predicted
item was actually acquired by the player throughout their next
purchase day (our training objective).
2) isNextPurchase: Checks whether the item that was
predicted to be purchased by a certain player was actually
acquired by the player on their very next purchase.
3) isWithinWindow: Checks whether the predicted item was
actually acquired by the player at some point within the time
window considered (between t + 1 and t + 50).
For all three measures, the accuracy for the top (predict-
edMax), top 2 (withinTop2) and top 3 (withinTop3) predicted
items is calculated, i.e. we check whether the player actually
purchased the item that had the highest probability, any of the
two items with the two highest probabilities or any of the three
items with the three highest probabilities, as per the prediction.
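The top-k accuracies described above can be computed along the following lines. The data layout is hypothetical: `predictions` maps each player to a dictionary of per-item probabilities, and `purchased` maps each player to the set of items actually bought under the measure in question (next purchase day, next purchase, or evaluation window). Dividing by the number of players with predictions is also an assumption.

```python
def top_k_items(probabilities, k):
    """Items with the k highest predicted probabilities (ties by item id)."""
    ranked = sorted(probabilities, key=lambda item: (-probabilities[item], item))
    return ranked[:k]


def top_k_accuracy(predictions, purchased, k):
    """Fraction of players who bought at least one of their top-k items."""
    hits = 0
    for player, probabilities in predictions.items():
        bought = purchased.get(player, set())
        if bought.intersection(top_k_items(probabilities, k)):
            hits += 1
    return hits / len(predictions)
```

With k = 1, 2 and 3 the function corresponds to the predictedMax, withinTop2 and withinTop3 columns respectively; the three measures differ only in how the `purchased` sets are built.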
V. RESULTS
Fig. 1. Predicted probability, for a sample of players and a series of items,
that the item will be bought by the player on their next purchase, using
the DNN (left) and ERT models (right). (Darker colors correspond to higher
probabilities.)
Figure 1 shows the predictions for a subset of users. The
DNN (left panel) and ERT (right panel) results exhibit similar
patterns (with only slight variations). We see that different
users have different purchase probabilities for each item, which
shows that the models are capable of providing personalized
predictions for each player based on their playing behaviour.
The accuracy results for both models can be found in Table
I. When considering the top 2 and top 3 predictions, both
models present similar accuracies, but the ERT is slightly
better at identifying the item with the highest probability of
being acquired on the next purchase, for all three measures.
TABLE I
ACCURACY RESULTS FOR THE NEXT-PURCHASE PREDICTION IN THE DNN AND ERT MODELS

Model  Measure               predictedMax  withinTop2  withinTop3
DNN    isOnNextPurchaseDate  44%           68%         81%
DNN    isNextPurchase        34%           59%         74%
DNN    isWithinWindow        69%           85%         90%
ERT    isOnNextPurchaseDate  47%           68%         81%
ERT    isNextPurchase        37%           59%         74%
ERT    isWithinWindow        71%           85%         91%
VI. DISCUSSION
An item recommendation system for games is essential
to provide players with individual rewards or incentives to
increase engagement, to maximize in-app purchases and to
increase cross-selling and up-selling. We have presented two
models to predict which items players will be more attracted to
buy in their next purchases. The results show that the predictive
performance of the DNN and ERT is similar. However, the
ERT model yields slightly better results (as shown in Table I)
and also scales up more easily in a production environment.
While predictions were made only for a small set of items,
the model is trivially extendable to run on hundreds of items,
and can be used both for items purchased with real money and
for in-game virtual purchases. Future work in this direction
will include an evaluation of the recommendation system in
terms of total game sales for live video-games.
ACKNOWLEDGEMENTS
We thank Javier Grande for his careful review of the
manuscript and Ana Fernández for her support.
REFERENCES
[1] G. Adomavicius and A. Tuzhilin. Toward the next generation of
recommender systems: A survey of the state-of-the-art and possible
extensions. IEEE Transactions on Knowledge and Data Engineering,
17(6):734–749, 2005.
[2] S. M. Anwar, T. Shahzad, Z. Sattar, R. Khan, and M. Majid. A
game recommender system using collaborative filtering (GAMBIT). In
2017 14th International Bhurban Conference on Applied Sciences and
Technology (IBCAST), pages 328–332. IEEE, 2017.
[3] C. Bauckhage and R. Sifa. k-maxoids clustering. In Proceedings of the
LWA 2015 Workshops: KDML, FGWM, IR, and FGDB, pages 133–144,
2015.
[4] P. Bertens, A. Guitart, and Á. Periáñez. Games and big data: A scalable
multi-dimensional churn prediction model. In 2017 IEEE Conference
on Computational Intelligence and Games (CIG), pages 33–36. IEEE,
2017.
[5] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of
predictive algorithms for collaborative filtering. In Proceedings of the
Fourteenth Conference on Uncertainty in Artificial Intelligence, pages
43–52, San Francisco, 1998. Morgan Kaufmann.
[6] L. Breiman. Random forests. Machine learning, 45(1):5–32, 2001.
[7] A. Cutler and L. Breiman. Archetypal analysis. Technometrics,
36(4):338–347, 1994.
[8] A. Drachen, A. Canossa, and G. N. Yannakakis. Player modeling
using self-organization in Tomb Raider: Underworld. In 2009 IEEE
Symposium on Computational Intelligence and Games (CIG), pages 1–
8. IEEE, 2009.
[9] A. Drachen, R. Sifa, C. Bauckhage, and C. Thurau. Guns, swords and
data: Clustering of player behavior in computer games in the wild. In
2012 IEEE Conference on Computational Intelligence and Games (CIG),
pages 163–170. IEEE, 2012.
[10] P. Geurts, D. Ernst, and L. Wehenkel. Extremely randomized trees.
Machine learning, 63(1):3–42, 2006.
[11] C. A. Gomez-Uribe and N. Hunt. The Netflix recommender system:
Algorithms, business value, and innovation. ACM Transactions on
Management Information Systems, 6(4):13, 2016.
[12] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly,
A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury.
Deep neural networks for acoustic modeling in speech recognition: The
shared views of four research groups. IEEE Signal Processing Magazine,
29(6):82–97, 2012.
[13] T. K. Ho. The random subspace method for constructing decision
forests. IEEE Transactions on Pattern Analysis and Machine Intelli-
gence, 20(8):832–844, 1998.
[14] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural
computation, 9(8):1735–1780, 1997.
[15] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification
with deep convolutional neural networks. In Advances in Neural
Information Processing Systems 25 (NIPS 2012), pages 1097–1105,
2012.
[16] Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature,
521(7553):436, 2015.
[17] V. Lehdonvirta. Virtual item sales as a revenue model: identifying
attributes that drive purchase decisions. Electronic Commerce Research,
9(1–2):97–113, 2009.
[18] G. Linden, B. Smith, and J. York. Amazon.com recommendations: Item-
to-item collaborative filtering. IEEE Internet Computing, 7(1):76–80,
2003.
[19] Á. Periáñez, A. Saas, A. Guitart, and C. Magne. Churn prediction
in mobile social games: towards a complete assessment using survival
ensembles. In 2016 IEEE International Conference on Data Science
and Advanced Analytics (DSAA), pages 564–573. IEEE, 2016.
[20] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Analysis of recom-
mendation algorithms for e-commerce. In Proceedings of the 2nd ACM
Conference on Electronic Commerce, pages 158–167. ACM, 2000.
[21] R. Sifa, C. Bauckhage, and A. Drachen. Archetypal game recommender
systems. In Proceedings of the 16th LWA Workshops: KDML, IR and
FGWM, pages 45–56, 2014.
[22] R. Sifa, A. Drachen, and C. Bauckhage. Large-scale cross-game player
behavior analysis on steam. In Proceedings of the Eleventh AAAI Con-
ference on Artificial Intelligence and Interactive Digital Entertainment
(AIIDE-15), pages 198–204, 2015.
[23] R. Sifa, E. Pawlakos, K. Zhai, S. Haran, R. Jha, D. Klabjan, and
A. Drachen. Controlling the crucible: A novel PvP recommender
systems framework for Destiny. In Proceedings of the Australasian
Computer Science Week Multiconference, ACSW 2018, 2018.
[24] A. Van den Oord, S. Dieleman, and B. Schrauwen. Deep content-based
music recommendation. In Advances in neural information processing
systems 26 (NIPS 2013), pages 2643–2651, 2013.