Player Unknown's Battle Grounds (PUBG) Game Data Analysis

An analysis of Player Unknown's Battle Grounds (PUBG) game data using Hive, Impala and Spark. The slides that accompany the report can be found in the project presentation.

General info

The goal of the study was to learn data analysis using various big data tools. I am a big fan of the PUBG mobile game, and the game developer had recently released the dataset on Kaggle, so I picked it up for the project. That way I also get an analytical edge when playing the game.

Screenshots

Technologies and Tools

Hive
Spark
Impala

Setup

The data for the analysis was sourced from Kaggle. All the code used in the analysis can be accessed here and can be used to reproduce the results. A detailed explanation of the various operations and the interpretation of the outputs can be found in the project report.

Code Examples

-- Correlation in Hive
-- Print column headers with the query results
set hive.cli.print.header=true;
-- Pearson correlation between weapons acquired and win placement, per match type
select corr(weaponsacquired,winplaceperc) from pubg_new where match_type1='solo';
select corr(weaponsacquired,winplaceperc) from pubg_new where match_type1='Duo';
select corr(weaponsacquired,winplaceperc) from pubg_new where match_type1='Squad';
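As a rough illustration of what the Hive corr() calls above compute, here is a minimal pure-Python sketch of the Pearson correlation coefficient. The function name pearson_corr and the toy values are assumptions made for this example; they are not part of the project code or the Kaggle dataset.

```python
from math import sqrt


def pearson_corr(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy rows standing in for weaponsacquired and winplaceperc
weapons_acquired = [1, 2, 3, 4, 5]
win_place_perc = [0.10, 0.25, 0.30, 0.55, 0.60]
r = pearson_corr(weapons_acquired, win_place_perc)
print(round(r, 3))
```

A value near 1 would suggest that players who pick up more weapons tend to place higher, a value near 0 would suggest no linear relationship, and a negative value would suggest the opposite trend.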

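# Linear Regression in PySpark
from pyspark.sql import SparkSession

# Create (or reuse) a Hive-enabled session so the script also runs outside the pyspark shell;
# this replaces the deprecated HiveContext(sc), which needs a pre-existing SparkContext `sc`
spark = SparkSession.builder.enableHiveSupport().getOrCreate()
pubg = spark.table("pubg_new")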

pubg=pubg.select('boosts', 'damagedealt', 'dbnos', 'headshotkills', 'heals', 'killplace', 'killpoints', 'kills', 'killstreaks', 'longestkill', 'maxplace', 'numgroups', 'revives', 'ridedistance', 'roadkills', 'swimdistance', 'teamkills', 'vehicledestroys', 'walkdistance', 'weaponsacquired', 'winpoints', 'winplaceperc')
pubg.show(10) 
pubg.printSchema()
pubg.cache()

from pyspark.ml.feature import VectorAssembler
# Assemble the predictors only; the label 'winplaceperc' is left out of the features to avoid target leakage
vectorAssembler = VectorAssembler(inputCols = ['boosts', 'damagedealt', 'dbnos', 'headshotkills', 'heals', 'killplace', 'killpoints', 'kills', 'killstreaks', 'longestkill', 'maxplace', 'numgroups', 'revives', 'ridedistance', 'roadkills', 'swimdistance', 'teamkills', 'vehicledestroys', 'walkdistance', 'weaponsacquired', 'winpoints'], outputCol = 'features')
pubg_df=vectorAssembler.transform(pubg)
pubg_df = pubg_df.select(['features', 'winplaceperc'])
pubg_df.show(3)

# 70/30 train/test split
train_df, test_df = pubg_df.randomSplit([0.7, 0.3])

from pyspark.ml.regression import LinearRegression
lr = LinearRegression(featuresCol = 'features', labelCol='winplaceperc', maxIter=10, regParam=0.3, elasticNetParam=0.8)
lr_model = lr.fit(train_df)

print("Coefficients: " + str(lr_model.coefficients))
print("Intercept: " + str(lr_model.intercept))
trainingSummary = lr_model.summary
print("RMSE: %f" % trainingSummary.rootMeanSquaredError)
print("r2: %f" % trainingSummary.r2)

pubg_df.describe().show()
lr_predictions = lr_model.transform(test_df)
lr_predictions.select("prediction","winplaceperc","features").show(10)

from pyspark.ml.evaluation import RegressionEvaluator
lr_evaluator = RegressionEvaluator(predictionCol="prediction",
                                   labelCol="winplaceperc", metricName="r2")
print("R Squared (R2) on test data = %g" % lr_evaluator.evaluate(lr_predictions))

test_result = lr_model.evaluate(test_df)
print("Root Mean Squared Error (RMSE) on test data = %g" % test_result.rootMeanSquaredError)
print("numIterations: %d" % trainingSummary.totalIterations)
print("objectiveHistory: %s" % str(trainingSummary.objectiveHistory))
trainingSummary.residuals.show()

predictions = lr_model.transform(test_df)
predictions.select("prediction","winplaceperc","features").show()
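To make the two metrics reported above more concrete, here is a minimal pure-Python sketch of how RMSE and R2 are conventionally defined. The helper names rmse and r2 and the toy numbers are assumptions for illustration only; the project itself relies on Spark's RegressionEvaluator and the model summary.

```python
from math import sqrt


def rmse(labels, predictions):
    """Root mean squared error between true labels and predictions."""
    n = len(labels)
    return sqrt(sum((y - p) ** 2 for y, p in zip(labels, predictions)) / n)


def r2(labels, predictions):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(labels) / len(labels)
    ss_res = sum((y - p) ** 2 for y, p in zip(labels, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in labels)
    return 1.0 - ss_res / ss_tot


# Hypothetical winplaceperc labels and model predictions
labels = [0.0, 0.5, 1.0]
predictions = [0.1, 0.5, 0.9]
print("RMSE:", rmse(labels, predictions))
print("R2:", r2(labels, predictions))
```

A lower RMSE means predictions sit closer to the true win placement on average, while an R2 close to 1 means the model explains most of the variance in winplaceperc.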

Features

The analysis focuses on answering the questions below:

Does killing more people increase the chance of winning the game?
Can we predict the finishing position of a player in the game?
How do we catch the cheaters in the game?

The answers to the questions can be found in the report.

Contact

If you loved what you read here and feel we could collaborate on something exciting, or if you just want to ask a question, please feel free to connect with me by email or on LinkedIn.

ashish1993utd/PUBG-Game-Data-Analysis
