This repo is a demo of a classical Data Science and Machine Learning workflow applied to gaming data, a domain I've never worked with before. The data comes from two games: Counter-Strike: Global Offensive (CSGO) and League of Legends (LoL).
For CSGO, the approach is analytical at first, producing statistical and probabilistic analyses of the games played. Later on, the coordinate data (longitude, latitude) is used for movement prediction with an LSTM trained on the GPU.
For LoL, the approach is a bit different because of the composition of the dataset: an initial analysis leads to a binary classification challenge, where the goal is to predict which team wins.
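For the CSGO movement prediction, a sequence model such as an LSTM is typically trained on fixed-length windows of past positions, with the next position as the target. The sketch below shows only that windowing step. The function name `make_windows`, the window length, and the toy track are assumptions for illustration, not the repo's actual preprocessing.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def make_windows(track: List[Point], window: int) -> Tuple[List[List[Point]], List[Point]]:
    """Split one player's track into (past positions, next position) pairs."""
    inputs, targets = [], []
    for start in range(len(track) - window):
        inputs.append(track[start:start + window])  # the `window` most recent positions
        targets.append(track[start + window])       # the position to predict
    return inputs, targets


# Toy track, invented for illustration only (not taken from the dataset).
track = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0), (3.0, 1.5), (4.0, 2.0)]
X, y = make_windows(track, window=3)
```

Each element of `X` would be one input sequence for the model and the matching element of `y` its training target.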
We first look at the equipment value after buy time for each match, split by which side won.
- Terrorists might be saving depending on the round
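A minimal sketch of that split, assuming each round is a record with the winning side and both teams' equipment values. The field names (`winner`, `ct_eq_val`, `t_eq_val`) and the numbers are hypothetical and only show the shape of the computation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical round records: field names and values are invented for illustration.
rounds = [
    {"winner": "CT", "ct_eq_val": 24000, "t_eq_val": 18000},
    {"winner": "CT", "ct_eq_val": 26000, "t_eq_val": 4000},
    {"winner": "T", "ct_eq_val": 5000, "t_eq_val": 22000},
    {"winner": "T", "ct_eq_val": 21000, "t_eq_val": 25000},
]


def mean_equipment_by_winner(rounds):
    """Average equipment value of each side, grouped by the side that won the round."""
    grouped = defaultdict(lambda: {"ct": [], "t": []})
    for r in rounds:
        grouped[r["winner"]]["ct"].append(r["ct_eq_val"])
        grouped[r["winner"]]["t"].append(r["t_eq_val"])
    return {
        winner: {side: mean(values) for side, values in sides.items()}
        for winner, sides in grouped.items()
    }


summary = mean_equipment_by_winner(rounds)
```

A very low value on one side of a group (like the 4000 round above) is the kind of pattern a saving round would produce.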
Of all the matches analyzed, we find that Counter-Terrorists have a higher propensity to be the first attacker.
Terrorists have a tendency to spread around bomb site while Counter-Terrorists focus more on bomb site B and the center.
We test the difference in time between attacks for CT and T with a non-parametric statistical test, repeated 500 times, to see whether a difference this large would happen by chance less than 1% of the time. The majority of our tests generate values that are compatible with our data, meaning we don't find a statistically significant difference.
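The text above does not name the exact test, so the following is only one possible reading: a permutation test on the difference of means, with 500 resamples and a 1% threshold. The function name `permutation_test` and the sample values are assumptions made for illustration.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=500, seed=0):
    """Two-sided permutation test on the absolute difference of means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign the CT / T labels
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)


# Invented time gaps between attacks, in seconds (not taken from the dataset).
ct_gaps = [4.1, 6.3, 5.2, 7.8, 5.9, 6.4]
t_gaps = [5.0, 6.1, 4.7, 7.2, 6.6, 5.5]

p_value = permutation_test(ct_gaps, t_gaps)
significant = p_value < 0.01  # 1% threshold, as in the text
```

If shuffling the labels often produces a gap at least as large as the observed one, the p-value stays high and the two sides cannot be told apart on this measure.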
For LoL, at first glance we find the following:
- More wards placed by the team also comes with a higher team champion level
If a team has an extremely high kill rate, it tends to look very different in the first 10 minutes of the game:
- higher kill assists
- higher kills
- more gold per minute
- higher total experience
- higher champion level
We've got all the features for modeling.
The best model is a logistic regression trained on our augmented dataset, which includes the PCA embeddings. The second-best model is the Random Forest, a tree-based model.
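To make the winning model family concrete, here is a minimal logistic regression fitted by plain gradient descent on a toy binary "which team wins" problem. It is a sketch under stated assumptions: the two features, their values, and the helper names are invented, and the PCA augmentation step is not reproduced here.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 if the predicted win probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Invented, already-scaled features per team: (gold difference, kill difference).
X = [[1.0, 0.5], [0.8, 1.2], [1.5, 0.3], [-1.0, -0.4], [-0.7, -1.1], [-1.3, -0.2]]
y = [1, 1, 1, 0, 0, 0]  # 1 = team won, 0 = team lost

w, b = fit_logistic(X, y)
train_predictions = [predict(w, b, x) for x in X]
```

In practice a library implementation would replace the hand-written loop; the point here is only the shape of the model: a weighted sum of features passed through a sigmoid and thresholded at 0.5.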