A Machine-Learning Item Recommendation System for Video Games


Paul Bertens,† Anna Guitart,† Pei Pei Chen and África Periáñez

Yokozuna Data, Silicon Studio

1-21-3 Ebisu Shibuya-ku, Tokyo, Japan

{paul, anna, peipei, africa}@yokozunadata.com

Abstract—Video-game players generate huge amounts of data,

as everything they do within a game is recorded. In particular,

among all the stored actions and behaviors, there is information

on the in-game purchases of virtual products. Such information is

of critical importance in modern free-to-play titles, where gamers

can select or buy a profusion of items during the game in order

to progress and fully enjoy their experience. To try to maximize

these kinds of purchases, one can use a recommendation system

so as to present players with items that might be interesting for

them. Such systems can better achieve their goal by employing

machine learning algorithms that are able to predict the rating of

an item or product by a particular user. In this paper we evaluate

and compare two of these algorithms, an ensemble-based model

(extremely randomized trees) and a deep neural network, both

of which are promising candidates for operational video-game

recommender engines. Item recommenders can help developers

improve the game. But, more importantly, it should be possible

to integrate them into the game, so that users automatically

get personalized recommendations while playing. The presented

models are not only able to meet this challenge, providing

accurate predictions of the items that a particular player will

ﬁnd attractive, but also sufﬁciently fast and robust to be used in

operational settings.

Index Terms—recommender systems, ensemble methods, deep

learning, online games, user behavior

I. INTRODUCTION

The aim of a recommender system is to provide suggestions

to a set of users on items that might be interesting for them.

Recommendation systems are commonly found in e-commerce

[20], [18] (where users purchase goods like books, clothes or

games online), usually implemented through collaborative ﬁl-

tering methods [5]. These work by comparing similar items or

similar users based on user ratings. If two users like the same

items they are likely similar, and if two items are liked by the

same users, those items are probably similar as well. However,

as this method does not take into account the contents, new

items cannot be recommended. Content-based recommenders

can be used to overcome some of these issues by looking

at the item in question and finding similarity between items

based on inherent properties [24]. A hybrid approach can also

be taken, to combine e.g. collaborative information, content

features and demographics [11]. A more detailed study into the

current limitations and possible extensions of recommendation

systems can be found in [1].

†These two authors contributed equally to the work.

The integration of recommendation systems into video

games is a relatively new area of research. Previous work

has mostly focused on game recommendation engines, which

present players with suggestions on alternative titles based

on the games they have already played [2], [22]. But it

is also possible to use recommendation systems to increase

player engagement in a game. In modern free-to-play games,

users can buy a wide range of virtual items with real money

(in-app purchases, IAPs). However, sometimes they can be

overwhelmed by the number of items offered and the diversity

of playstyles, and this can lead to an increase in the churn

rate—as players start to ﬁnd the contents too difﬁcult and

are unable to progress within the game. Item recommendation

systems can help prevent this problem by offering players a

more direct route to the items that could be appealing or useful

for them, thereby improving their purchasing and general in-game

experience. This may ultimately result in increased

revenue [17] by increasing player retention, IAPs and the

conversion rate from free to paying users.

To achieve these goals, it is essential to recommend each

player the right item—one that ﬁts both their current state

and their playing behavior—at the right time. And this is

possible because (in contrast to other applications where very

limited information is available) every action performed by a

player within the game gets recorded. This offers a unique

opportunity not only to obtain accurate predictions on the

player’s in-game behaviour (for example on when and at what

level they will leave the game, see [19] and [4]) but also to

offer them personalized recommendations of items that are

likely relevant to them.

Some previous work addresses item recommendation systems.

Sifa et al. [23] introduce a recommendation system for

the massively multiplayer online ﬁrst-person shooter game

Destiny, where players get suggestions on those items that

best ﬁt their play style and might improve their performance.

They apply similarity measures to global descriptors like total

kill count or kill/death ratio. Clusters for the player “base”

and “cooldown” stats were derived through k-means clus-

tering, whereas archetypal analysis [7], [21] (which clusters

by extreme values rather than centroids [3]) was used to

ﬁnd distinct playstyles. Similar analyses were done for the

massively multiplayer online role-playing game Tera and the

multiplayer strategy game Battleﬁeld 2: Bad Company 2 [9]

or the game Tomb Raider: Underworld [8]. In all these cases,

arXiv:1806.04900v2 [stat.ML] 14 Aug 2018

players were clustered by their playing behaviour; although

no recommendation system was built, behavioral proﬁling via

clustering may be very useful in offering recommendations

based on similarity between users.

However, unsupervised clustering methods remain a challenge.

In particular, a significant amount of game-specific

knowledge is required to find adequate features that can

separate players into the right number of clusters.

A. Aim

While there are several approaches to the problem of

developing recommendation systems, here we will explore a

different avenue: our aim is to provide a method that predicts

the next items a player will purchase, and use this information

to recommend them other items. This approach differs from

traditional methods as we explicitly use a predictive model.

Such a model allows us to predict, both for new and existing

users, the items they are likely to ﬁnd most appealing based

on their playing behaviour. Additionally, the model must be robust

enough for operational use, so that it can recommend game

products automatically across a variety of game genres, i.e.

across different game data distributions.

II. BACKGROUND

A. Extremely Randomized Trees

Extremely randomized trees (ERTs) [10] extend the randomization

of the original random forest algorithms [13], [6] by

choosing the splitting points randomly instead of computing

the ones that are most correlated with the output (which

makes random forest an easily biased approach). ERTs are

computationally efficient, and the extra randomization reduces the

variance of the model, which helps prevent overfitting. However, the

bias can also be larger with this method when the randomization is

increased above the optimal level: the decrease in variance then

comes at the cost of additional bias.
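The difference between the two ways of choosing a splitting point can be illustrated with a minimal sketch for a single feature. It is a simplification under stated assumptions, not the implementation used in this work: the helper names (`best_threshold`, `random_threshold`), the Gini criterion and the toy data are all illustrative choices.

```python
import random

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def split_impurity(values, labels, threshold):
    """Weighted Gini impurity of the two children after splitting at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_threshold(values, labels):
    """Optimized split: search the midpoints for the lowest impurity."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return min(candidates, key=lambda t: split_impurity(values, labels, t))

def random_threshold(values, rng):
    """ERT-style split: draw the cut-point uniformly between min and max."""
    return rng.uniform(min(values), max(values))

# Toy data (hypothetical): one feature and a binary purchase label.
playtime = [0.5, 1.0, 1.5, 4.0, 5.0, 6.0]
bought = [0, 0, 0, 1, 1, 1]

rng = random.Random(0)
t_best = best_threshold(playtime, bought)   # searched over all candidates
t_rand = random_threshold(playtime, rng)    # no search at all
```

The random cut-point needs no search over candidates, which is where the computational savings come from; the price is that an individual split is usually less pure than the optimized one.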

Breiman's implementation of random forests builds an ensemble

of decision trees, each of which is fit on a random

subset of features [6]. This randomization in the feature

selection, combined with the bagging of multiple decision

trees, reduces the correlation between trees and increases the

overall accuracy of the ensemble.

One of the main advantages of ensemble models is that they

are trivially parallelizable, either using multicore processors

(as each tree could potentially be trained on a single core) or

across multiple machines. This makes them more practical in

operational settings, where training and inference have to be

completed in a relatively short time, and thus better suited for

developing a commercial recommendation system.

B. Deep Neural Networks

Deep neural networks (DNNs) [16] are artiﬁcial neural

networks with multiple hidden layers. By using nonlinear

activation functions (the functions that transform the output

at each layer before passing it to the next), DNNs are able

to learn highly nonlinear dynamics. Multiple iterations, i.e.

epochs, are run to optimize the DNN during the learning

process. Rectiﬁed linear units (ReLU) are among the most

commonly used activation functions nowadays. DNNs that

combine ReLU with dropout (a strategy consisting of randomly

dropping out some of the units at each layer) have

been shown to provide state-of-the-art accuracy in domains

such as image classiﬁcation [15] or speech recognition [12].

Additionally, for sequential data, recurrent neural networks

(RNN) or long short-term memory (LSTM) networks [14]

have achieved similarly high accuracies in sequence prediction

and language modeling.

III. ITEM RECOMMENDATION MODEL

While RNNs and LSTM networks are able to learn tem-

poral dependencies and eliminate the need for manual feature

engineering, they also slow down the training signiﬁcantly, as

they have to learn the relevant features of the time series that

lead to an increase in prediction accuracy. On the other hand,

by manually calculating general statistics of the time-series

data together with other descriptors one can efﬁciently create

a single vector describing the player’s behavior and use it in

nontemporal models like DNNs or ensemble-based methods

such as ERTs.

These are the main challenges related to our approach:

• The model should be able to train and provide inference

in production environments scaling to millions of users.

• It should be trainable on mini-batches so that it ﬁts in

the memory (ensemble models usually work on the full

dataset).

• The time-series data needs to be converted into a single

feature vector that accurately represents the player’s be-

havioural patterns (as commented above, tree ensembles

and DNNs use static feature vectors, not time series).

• As players make multiple purchases over their lifetime in

the game, we must extract their next purchase from mul-

tiple time points. Thus the training dataset may become

huge if e.g. players remain in the game for several years.

The following sections elaborate on the dataset used and on

the way the model was constructed to solve these challenges.

A. Dataset

The data used in our analysis comes from the Japanese

card game Age of Ishtaria, developed by Silicon Studio, and

contains daily time-series data for each paying user within

the period from 2014-09-24 to 2017-05-08 (totaling 33,488

players). It contains information on the number of purchases

per item and total sales per item for each user. Players can

purchase in-game currency with real money and use it to buy

different card-packs (known as gacha) containing a random set

of cards that can be employed in the game. The data contains

8 different types of items and also has information on e.g. the

player’s daily level progression, playtime and lifetime.

B. Feature calculation

To convert our time-series into a single static vector we

calculate general statistics over the full time-series data for

each of the temporal features (e.g. daily playtime or sales).

The process is as follows: First we compute the derivative

of the time series in order to get its variations (for in-

stance, if we are tracking total level, the derivative gives

us the number of level-ups per day). Then we calculate the

mean/variance/skew/kurtosis/maximum over the time series

for each of the temporal features. Additionally, to capture

behavioral changes of the player between the beginning and

end of their lifetime, we also compute the distance for all

temporal features over the ﬁrst and last days in which they

logged in. Finally, all these features get concatenated into

our ﬁnal feature vector. By using such a method, the feature

calculation can be generalized to any type of temporal data.
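The steps above can be sketched as follows. This is a minimal illustration under several assumptions that the text does not fix: the inputs are cumulative daily series, the statistics are taken over the daily differences, and the "distance" between the beginning and end of the lifetime is the absolute difference between the mean of the first and last `n_edge_days` days. The function and feature names are hypothetical.

```python
import numpy as np

def series_stats(x):
    """Mean, variance, skewness, excess kurtosis and maximum of a 1-D array."""
    mean, var = x.mean(), x.var()
    if var > 0:
        z = (x - mean) / np.sqrt(var)
        skew, kurt = (z ** 3).mean(), (z ** 4).mean() - 3.0
    else:
        # Constant series: higher moments are set to 0 by convention (assumption).
        skew, kurt = 0.0, 0.0
    return [mean, var, skew, kurt, x.max()]

def player_features(daily, n_edge_days=7):
    """daily: dict mapping a feature name to its cumulative daily series."""
    features = []
    for name in sorted(daily):               # fixed order -> stable vector layout
        series = np.asarray(daily[name], dtype=float)
        delta = np.diff(series)              # day-to-day variation (the derivative)
        features.extend(series_stats(delta))
        # Behavioral change between the start and end of the lifetime (assumed form).
        k = min(n_edge_days, len(delta))
        features.append(abs(delta[-k:].mean() - delta[:k].mean()))
    return np.array(features)

# Toy player with two temporal features (hypothetical values).
daily = {"level": [1, 2, 2, 4, 5], "playtime_total": [10, 30, 35, 60, 100]}
vec = player_features(daily, n_edge_days=2)  # 6 values per temporal feature
```

Because every temporal feature is reduced to the same six numbers, the vector length depends only on the number of tracked features, not on how long the player has been active.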

C. Sampling to handle multi-label outputs

Players usually make multiple purchases, which means we

can have multiple prediction targets (multiple labels) per user.

One way of dealing with this is to take a subsample up to

time t from each player’s time series and then find their next

purchase after t. This results in a single label we can train on,

and allows us to take multiple subsamples to enlarge our train-

ing set. Since players could be playing for several years and

have hundreds or even thousands of days of playing activity, by

using subsampling we can generate different training samples

for each player, increasing our effective training dataset and

reducing overﬁtting.
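A minimal sketch of this subsampling, assuming each player's purchases are available as a day-sorted list of (day, item) pairs; the function name, the item names and the uniform choice of the cut-off t are illustrative assumptions rather than details given in the text.

```python
import random

def sample_training_pairs(purchases, n_samples, rng):
    """Draw (t, label) pairs for one player.

    purchases: list of (day, item) tuples sorted by day.
    The history up to and including day t would be used to build the
    feature vector; the label is the first item bought after t.
    """
    last_purchase_day = purchases[-1][0]
    if last_purchase_day < 1:
        return []  # no room for a cut-off before the last purchase
    pairs = []
    for _ in range(n_samples):
        # Cut-off chosen so that at least one purchase lies after t.
        t = rng.randrange(0, last_purchase_day)
        label = next(item for day, item in purchases if day > t)
        pairs.append((t, label))
    return pairs

# Toy purchase history (hypothetical item names).
purchases = [(3, "gacha_A"), (10, "gacha_B"), (25, "gacha_A")]
pairs = sample_training_pairs(purchases, n_samples=5, rng=random.Random(42))
```

Each pair yields one single-label training example, so drawing several cut-offs per player multiplies the number of examples without storing the player's data more than once.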

D. Scalability using minibatches

Additionally, the model should be able to scale to millions

of players; however, if we generate very large feature vectors

(with thousands of features) and sample multiple labels per

user, we could end up with datasets with over a billion samples

(a thousand samples per user). An efﬁcient way of coping

with such huge data sets is to train an ensemble model on

subsamples of the total set. Hence, we can train a small subset

of trees (∼20) on a small sample of a few thousand users and

generate the labels directly during training, so that we do not

need to store all samples. The ﬁnal ensemble is formed by

combining many such subsets of trees, where each tree was

trained on different features, different samples, and different

target labels, producing an extremely robust model.
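One way to realize this scheme is sketched below with scikit-learn's `ExtraTreesClassifier`. It is an illustration under assumptions, not the code used in this work: the helper names, the synthetic mini-batch generator and the averaging of class probabilities across sub-ensembles are choices made for the example.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

def train_sub_ensembles(batches, trees_per_batch=20, seed=0):
    """Fit one small ERT ensemble per mini-batch and keep all of them."""
    models = []
    for i, (X, y) in enumerate(batches):
        model = ExtraTreesClassifier(
            n_estimators=trees_per_batch,
            max_features="sqrt",      # random feature subset at each split
            random_state=seed + i,
        )
        models.append(model.fit(X, y))
    return models

def predict_proba(models, X, n_classes):
    """Average the class probabilities of all sub-ensembles."""
    total = np.zeros((X.shape[0], n_classes))
    for model in models:
        # A batch may not contain every item, so align columns via classes_.
        total[:, model.classes_] += model.predict_proba(X)
    return total / len(models)

def make_batches(n_batches, batch_size, n_features, n_classes, seed=0):
    """Synthetic stand-in for generating features and labels on the fly."""
    rng = np.random.default_rng(seed)
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, n_features))
        y = rng.integers(0, n_classes, size=batch_size)
        yield X, y                    # nothing is stored between batches

models = train_sub_ensembles(make_batches(3, 200, 10, 8))
probs = predict_proba(models, np.zeros((5, 10)), n_classes=8)
```

Since the batches come from a generator, only one mini-batch is held in memory at a time, and the sub-ensembles are independent of each other, so they could also be trained in parallel.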

E. Model Speciﬁcation

1) Output: For each player and item, we generate the

probability that they will buy that item on their next purchase

day. As the model is trained over all players, once players are

in a similar state the model can learn to predict and recommend

the right item at the right time for each individual player.

2) Input: We take the full time-series patterns for each

user to convert them into a single vector that represents their

playing behavior. This conversion is done for all users in a

single mini-batch. Multiple mini-batches are generated per

epoch (one epoch goes over the entire dataset), and the model

is trained on each of these batches.

3) Parameters: The ERT model was trained on subsets of

20 trees for 30 iterations, resulting in a total ensemble size of

600 trees. Each iteration was performed on a subset of ∼10k

users, so a full epoch was completed every 3 iterations (the

total set has 33,488 players) and the 30 iterations amount to

10 epochs.

For the DNN model, we used two hidden layers of 2048

units and set a dropout probability of 0.5. Additionally, as

there were many correlating features, dropout was also applied

to the input layer. By randomly dropping some inputs, we

reduce overﬁtting on single features, thereby increasing the

robustness of the model. (Recall this was achieved by random

subsampling of features in the ERT model.) The network was

trained for 30 iterations as well, but each iteration was repeated

5 times; at 3 iterations per epoch, these 150 passes amount to a

total of 50 epochs. Both the DNN and the ERT were trained on

the same data.
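The network described above can be sketched as a forward pass in NumPy. Only the two hidden layers of 2048 units, the ReLU activations, the hidden dropout probability of 0.5 and the use of input dropout come from the text; the input dropout rate (0.2 here), the softmax output, the He initialization and all function names are assumptions made for illustration, and the training loop is omitted.

```python
import numpy as np

def init_params(n_features, n_items, hidden=2048, seed=0):
    """Weights and biases for two hidden layers plus an output layer."""
    rng = np.random.default_rng(seed)
    sizes = [n_features, hidden, hidden, n_items]
    # He initialization, a common choice for ReLU layers (assumption).
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes, sizes[1:])]

def dropout(x, rate, rng):
    """Inverted dropout: zero units with probability `rate`, rescale the rest."""
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

def forward(params, x, rng=None, p_hidden=0.5, p_input=0.2):
    """Return per-item probabilities; dropout is active only if `rng` is given."""
    h = dropout(x, p_input, rng) if rng is not None else x
    for W, b in params[:-1]:
        h = np.maximum(0.0, h @ W + b)               # ReLU
        if rng is not None:
            h = dropout(h, p_hidden, rng)
    W, b = params[-1]
    logits = h @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)          # softmax (assumed output)

params = init_params(n_features=30, n_items=8)
x = np.random.default_rng(1).normal(size=(4, 30))    # 4 toy feature vectors
p_eval = forward(params, x)                          # inference: no dropout
p_train = forward(params, x, rng=np.random.default_rng(2))  # training mode
```

Dropping inputs as well as hidden units means the network cannot rely on any single one of a group of correlated features, which plays a role similar to the random feature subsets used by the tree ensemble.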

IV. MODEL EVALUATION

In order to evaluate the effectiveness of the proposed model,

we study the prediction accuracy within an upcoming time

window. Predictions are made at a time point t and evaluated

at time t + 50 (where t is measured in days). The training was

performed using data up to 2017-03-19, and predictions were

veriﬁed in the window from 2017-03-20 to 2017-05-08.

Several measures are calculated:

1) isOnNextPurchaseDate: Checks whether the predicted

item was actually acquired by the player throughout their next

purchase day (our training objective).

2) isNextPurchase: Checks whether the item that was

predicted to be purchased by a certain player was actually

acquired by the player on their very next purchase.

3) isWithinWindow: Checks whether the predicted item was

actually acquired by the player at some point within the time

window considered (between t + 1 and t + 50).

For all three measures, the accuracy for the top (predictedMax),

top 2 (withinTop2) and top 3 (withinTop3) predicted

items is calculated, i.e. we check whether the player actually

purchased the item that had the highest probability, any of the

two items with the two highest probabilities or any of the three

items with the three highest probabilities, as per the prediction.
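The top-k check can be written compactly. In this sketch (hypothetical helper names and toy numbers), the three measures differ only in which set of items is passed as `purchased`: the items bought on the next purchase day, the single next purchased item, or all items bought within the 50-day window.

```python
def top_k_hit(probs, purchased, k):
    """True if any of the k most probable items is in the `purchased` set."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return any(i in purchased for i in ranked[:k])

def accuracy(all_probs, all_purchased, k):
    """Fraction of players for whom the top-k prediction was a hit."""
    hits = [top_k_hit(p, s, k) for p, s in zip(all_probs, all_purchased)]
    return sum(hits) / len(hits)

# Toy example: predicted probabilities for 3 players over 3 items,
# and the set of items each player actually purchased.
all_probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7]]
all_purchased = [{0}, {2}, {0}]

predicted_max = accuracy(all_probs, all_purchased, k=1)  # 1 of 3 players
within_top2 = accuracy(all_probs, all_purchased, k=2)    # 2 of 3 players
within_top3 = accuracy(all_probs, all_purchased, k=3)    # 3 of 3 players
```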

V. RESULTS

Fig. 1. Predicted probability, for a sample of players and a series of items,

that the item will be bought by the player on their next purchase, using

the DNN (left) and ERT models (right). (Darker colors correspond to higher

probabilities.)

Figure 1 shows the predictions for a subset of users. The

DNN (left panel) and ERT (right panel) results exhibit similar

patterns (with only slight variations). We see that different

users have different purchase probabilities for each item, which

shows that the models are capable of providing personalized

predictions for each player based on their playing behaviour.

The accuracy results for both models can be found in Table

I. When considering the top 2 and top 3 predictions, both

models present similar accuracies, but the ERT is slightly

better at identifying the item with the highest probability of

being acquired on the next purchase, for all three measures.

TABLE I

ACCURACY RESULTS FOR THE NEXT-PURCHASE PREDICTION IN THE DNN AND ERT MODELS

Model  Measure               predictedMax  withinTop2  withinTop3
DNN    isOnNextPurchaseDate  44%           68%         81%
DNN    isNextPurchase        34%           59%         74%
DNN    isWithinWindow        69%           85%         90%
ERT    isOnNextPurchaseDate  47%           68%         81%
ERT    isNextPurchase        37%           59%         74%
ERT    isWithinWindow        71%           85%         91%

VI. DISCUSSION

An item recommendation system for games is essential

to provide players with individual rewards or incentives to

increase engagement, to maximize in-app purchases and to

increase cross-selling and up-selling. We have presented two

models to predict which items players will be more attracted to

buy in their next purchases. The results show that the predict-

ing performance of the DNN and ERT is similar. However the

ERT model yields slightly better results (as shown in Table I)

and also scales up more easily in a production environment.

While predictions were made only for a small set of items,

the model is trivially extendable to run on hundreds of items,

and can be used both for items purchased with real money and

for in-game virtual purchases. Future work in this direction

will include an evaluation of the recommendation system in

terms of total game sales for live video-games.

ACKNOWLEDGEMENTS

We thank Javier Grande for his careful review of the

manuscript and Ana Fernández for her support.

REFERENCES

[1] G. Adomavicius and A. Tuzhilin. Toward the next generation of

recommender systems: A survey of the state-of-the-art and possible

extensions. IEEE Transactions on Knowledge and Data Engineering,

17(6):734–749, 2005.

[2] S. M. Anwar, T. Shahzad, Z. Sattar, R. Khan, and M. Majid. A

game recommender system using collaborative ﬁltering (GAMBIT). In

2017 14th International Bhurban Conference on Applied Sciences and

Technology (IBCAST), pages 328–332. IEEE, 2017.

[3] C. Bauckhage and R. Sifa. k-maxoids clustering. In Proceedings of the

LWA 2015 Workshops: KDML, FGWM, IR, and FGDB, pages 133–144,

2015.

[4] P. Bertens, A. Guitart, and Á. Periáñez. Games and big data: A scalable

multi-dimensional churn prediction model. In 2017 IEEE Conference

on Computational Intelligence and Games (CIG), pages 33–36. IEEE,

2017.

[5] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of

predictive algorithms for collaborative ﬁltering. In Proceedings of the

Fourteenth Conference on Uncertainty in Artiﬁcial Intelligence, pages

43–52, San Francisco, 1998. Morgan Kaufmann.

[6] L. Breiman. Random forests. Machine learning, 45(1):5–32, 2001.

[7] A. Cutler and L. Breiman. Archetypal analysis. Technometrics,

36(4):338–347, 1994.

[8] A. Drachen, A. Canossa, and G. N. Yannakakis. Player modeling

using self-organization in Tomb Raider: Underworld. In 2009 IEEE

Symposium on Computational Intelligence and Games (CIG), pages 1–8. IEEE, 2009.

[9] A. Drachen, R. Sifa, C. Bauckhage, and C. Thurau. Guns, swords and

data: Clustering of player behavior in computer games in the wild. In

2012 IEEE Conference on Computational Intelligence and Games (CIG),

pages 163–170. IEEE, 2012.

[10] P. Geurts, D. Ernst, and L. Wehenkel. Extremely randomized trees.

Machine learning, 63(1):3–42, 2006.

[11] C. A. Gomez-Uribe and N. Hunt. The Netﬂix recommender system:

Algorithms, business value, and innovation. ACM Transactions on

Management Information Systems, 6(4):13, 2016.

[12] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly,

A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury.

Deep neural networks for acoustic modeling in speech recognition: The

shared views of four research groups. IEEE Signal Processing Magazine,

29(6):82–97, 2012.

[13] T. K. Ho. The random subspace method for constructing decision

forests. IEEE Transactions on Pattern Analysis and Machine Intelli-

gence, 20(8):832–844, 1998.

[14] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural

computation, 9(8):1735–1780, 1997.

[15] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classiﬁcation

with deep convolutional neural networks. In Advances in Neural

Information Processing Systems 25 (NIPS 2012), pages 1097–1105,

2012.

[16] Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature,

521(7553):436, 2015.

[17] V. Lehdonvirta. Virtual item sales as a revenue model: identifying

attributes that drive purchase decisions. Electronic Commerce Research,

9(1–2):97–113, 2009.

[18] G. Linden, B. Smith, and J. York. Amazon.com recommendations: Item-

to-item collaborative ﬁltering. IEEE Internet Computing, 7(1):76–80,

2003.

[19] Á. Periáñez, A. Saas, A. Guitart, and C. Magne. Churn prediction

in mobile social games: towards a complete assessment using survival

ensembles. In 2016 IEEE International Conference on Data Science

and Advanced Analytics (DSAA), pages 564–573. IEEE, 2016.

[20] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Analysis of recom-

mendation algorithms for e-commerce. In Proceedings of the 2nd ACM

Conference on Electronic Commerce, pages 158–167. ACM, 2000.

[21] R. Sifa, C. Bauckhage, and A. Drachen. Archetypal game recommender

systems. In Proceedings of the 16th LWA Workshops: KDML, IR and

FGWM, pages 45–56, 2014.

[22] R. Sifa, A. Drachen, and C. Bauckhage. Large-scale cross-game player

behavior analysis on Steam. In Proceedings of the Eleventh AAAI Conference

on Artificial Intelligence and Interactive Digital Entertainment

(AIIDE-15), pages 198–204, 2015.

[23] R. Sifa, E. Pawlakos, K. Zhai, S. Haran, R. Jha, D. Klabjan, and

A. Drachen. Controlling the crucible: A novel PvP recommender

systems framework for Destiny. In Proceedings of the Australasian

Computer Science Week Multiconference, ACSW 2018, 2018.

[24] A. Van den Oord, S. Dieleman, and B. Schrauwen. Deep content-based

music recommendation. In Advances in neural information processing

systems 26 (NIPS 2013), pages 2643–2651, 2013.