In any competitive environment with public-facing statistics, you will find individuals who seek to manipulate their statistics.
I’ve begun analyzing the Destiny Trials of Osiris leaderboards to see if I can locate cheaters (Denial of Service attackers, stat farmers) from a bird’s eye view. The dataset I’m using is the guardian.gg leaderboards.
The tools I’m using are Jupyter, Pandas, and Plotly. Ultimately I intend to make this project open-source.
Background
Since the Trials of Osiris took place for the first time on May 22, 2015, it has been the event that Destiny’s PvP players look forward to. Trials pulls Destiny up in streaming directories by attracting large numbers of both viewers and streamers. As far as I know, Bungie has not released Destiny’s daily active user counts, but I would not be surprised to learn that they show similar peaks during Trials. Trials has fueled the rise of many partnered streamers, enabling some to transform gaming from a hobby into a career. And it has provided a healthy revenue stream for gamers who perform paid carries and account recoveries. There are legitimate companies that have grown around this! I’m curious to see what I can extract from the data before Trials of Osiris comes to an end on August 14, 2017.
My background
I’ve played since version 1.1. I am a tryhard. And for over half a year, I livestreamed my Destiny gameplay daily.
I’ve always been interested in computers, and have considered myself a power user. Data science and machine learning are both hot right now, and pursuing knowledge in this field would enable me to make an impact that goes far beyond gaming.
Inception
It all started when one of my subscribers brought to my attention an incident from Trials of Osiris.
The perpetrators approached my viewer’s fireteam and offered to let them win the match if, in exchange, the fireteam let the perpetrators farm them for kills. I imagine my viewer’s fireteam was pretty excited by this deal – they’d have to suffer a few blows, but in exchange, they would earn a healthy Elo boost. For over twenty minutes, they allowed the perpetrators to kill them repeatedly. Did it ever cross their minds that they might not get their reward? These are naïve gamers who didn’t pick up on the con. I imagine their stunned silence after the perpetrators wiped them out for the win.
Note the features that are provided to us through this individual match data: abnormally high kill counts, round wins by Bravo team in spite of zero kills, and a very long match timer.
Using this match as a starting point, I dove into the perpetrators’ match history to locate other offenders. For a brief moment, I was obsessed, but I stopped myself upon realizing that performing this type of analysis without furthering my skill set would take significant man-hours. Additionally, I was gaming regularly at that time, and my efforts were already spread thin. I’d grown neglectful of self-care while grinding the game, falling out of touch with real-world contacts.
Motivation
I declared on May 31 that I was quitting gaming to focus on grinding real life.
Making this decision was easy in light of the need to build up cashflow to the point where I can comfortably engage in side activities. In my lifetime, I want to assure myself a prosperous existence.
I’m interested in identifying cheaters, and reporting them. DoS attackers are not only a nuisance: their actions have resulted in real losses for professional streamers. Stat farmers, like the ones that my subscriber brought to my attention, prey on gullible gamers.
Research Problem
Can we identify cheaters?
Dataset(s)
I used guardian.gg’s leaderboard to obtain my data. Additional sources include Trials Report and Bungie.net.
- PlayStation http://guardian.gg/en/leaderboard/2/14
- Xbox http://guardian.gg/en/leaderboard/1/14
Many websites, guardian.gg included, have APIs that make it easy to pull data with which you can create your own views.
- PlayStation http://api.guardian.gg/leaderboard/2/14/#
- Xbox http://api.guardian.gg/leaderboard/1/14/#
where # is the page number.
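A minimal sketch of how paging through that endpoint might look, using only the standard library. The URL pattern comes from the list above. Everything else is an assumption for illustration: that pages are numbered from 0, that each page decodes to JSON, and that an empty payload marks the end of the leaderboard. The helper names (leaderboard_url, fetch_json, fetch_pages) are hypothetical, not part of guardian.gg.

```python
import json
from urllib.request import urlopen

# URL pattern taken from the list above: platform 1 = Xbox, 2 = PlayStation.
# Reading 14 as the Trials of Osiris mode is inferred from those URLs.
API_BASE = "http://api.guardian.gg/leaderboard"


def leaderboard_url(platform, mode, page):
    """Build the URL for one leaderboard page."""
    return f"{API_BASE}/{platform}/{mode}/{page}"


def fetch_json(url):
    """Download one URL and decode the body as JSON."""
    with urlopen(url, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def fetch_pages(platform, mode, first_page=0, max_pages=10, fetch=fetch_json):
    """Collect consecutive pages until max_pages is reached or a page is empty.

    Assumptions: pages start at first_page (0 here is a guess) and an empty
    payload means we have run past the last page.
    """
    pages = []
    for page in range(first_page, first_page + max_pages):
        payload = fetch(leaderboard_url(platform, mode, page))
        if not payload:
            break
        pages.append(payload)
    return pages
```

Calling fetch_pages(1, 14) would request the first ten Xbox pages. The fetch argument is there so the loop can be tried against canned data without touching the network, and a real crawl should add a polite delay between requests.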
This returned a JSON object, which is a little difficult to read unless you’re a machine.
From here, I created a Pandas DataFrame.
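Roughly, that step could look like the sketch below. The sample payload is made up, and its field names (name, gamesPlayed, wins, kills, deaths, assists) are hypothetical stand-ins; the real guardian.gg response may be shaped differently, so treat this as an illustration of the idea and not as the actual schema.

```python
import json

import pandas as pd

# Made-up stand-in for one page of API output. The field names are
# hypothetical; inspect a real response and rename accordingly.
SAMPLE_PAGE = """
[
  {"name": "player_a", "gamesPlayed": 120, "wins": 84,
   "kills": 1500, "deaths": 900, "assists": 400},
  {"name": "player_b", "gamesPlayed": 40, "wins": 12,
   "kills": 310, "deaths": 420, "assists": 150}
]
"""


def pages_to_frame(pages):
    """Stack parsed pages (each assumed to be a list of flat dicts) into one DataFrame."""
    records = [record for page in pages for record in page]
    return pd.DataFrame(records)


leaderboard = pages_to_frame([json.loads(SAMPLE_PAGE)])
print(leaderboard.head())
```

If the real records turn out to be nested, pandas.json_normalize is one way to flatten them before building the frame.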
This dataset provides us with a bird’s eye view of the entire Destiny player population that has been entered into guardian.gg. I generated a dataset on July 13, 2017 from the Trials of Osiris leaderboards for Xbox, and retrieved just over 434,000 records.
Data Transformation
Before charting the data, I transformed it to get a better understanding of each individual player’s average performance per match. I divided Kills, Deaths, and Assists by games played. Likewise, I divided the number of games won by the total number of games played to arrive at win percentage.
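As a rough sketch, the per-game transformation might be written like this in Pandas. The column names are hypothetical (they are not confirmed guardian.gg fields), the sample rows are invented, and guarding against players with zero games is my own assumption about how to handle that edge case.

```python
import pandas as pd


def add_per_game_stats(df):
    """Return a copy of df with per-game averages and a win percentage added."""
    out = df.copy()
    # Treat zero games as missing so the divisions yield NaN instead of inf.
    games = out["gamesPlayed"].where(out["gamesPlayed"] > 0)
    for col in ("kills", "deaths", "assists"):
        out[f"{col}_per_game"] = out[col] / games
    out["win_pct"] = 100 * out["wins"] / games
    return out


# Invented rows, only to show the shape of the result.
raw = pd.DataFrame(
    {
        "name": ["player_a", "player_b"],
        "gamesPlayed": [10, 0],
        "wins": [7, 0],
        "kills": [25, 0],
        "deaths": [10, 0],
        "assists": [5, 0],
    }
)
stats = add_per_game_stats(raw)
print(stats[["name", "kills_per_game", "win_pct"]])
```

Working on a copy keeps the raw totals intact, which is handy when a later chart needs games played as a size or filter variable.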
Exploratory Charting
What does a cheater look like?
One major limitation of high-level data is that it’s difficult to pick out the good guys from the bad guys. This is where the significance of outliers comes in.
Vlad of Trials Report advised me of one Xbox Live user who’d been performing Denial of Service attacks in order to win matchups – I’ve highlighted that user. Notice how this user’s win percentage is far higher than we would expect. I’ve also highlighted a known stat farmer.
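One simple way to surface candidates like these is a z-score on win percentage, sketched below. This is not necessarily how the highlighted users were found; it is a generic outlier check swapped in for illustration. The column names, the minimum-games cutoff, and the threshold of 3 standard deviations are all assumptions, and the sample population is invented. A flag here is a prompt for a closer look, not proof of cheating.

```python
import pandas as pd


def flag_win_pct_outliers(df, min_games=50, z_threshold=3.0):
    """Return players whose win percentage sits far above the population mean.

    Players with fewer than min_games are dropped first, since a perfect
    record over a handful of matches says very little.
    """
    eligible = df[df["gamesPlayed"] >= min_games].copy()
    mean = eligible["win_pct"].mean()
    std = eligible["win_pct"].std(ddof=0)  # population standard deviation
    eligible["win_pct_z"] = (eligible["win_pct"] - mean) / std
    flagged = eligible[eligible["win_pct_z"] > z_threshold]
    return flagged.sort_values("win_pct_z", ascending=False)


# Invented population: twenty ordinary players, one suspiciously successful
# player, and one newcomer with a perfect record over only three games.
players = pd.DataFrame(
    {
        "name": [f"p{i}" for i in range(20)] + ["suspect", "newcomer"],
        "gamesPlayed": [100] * 20 + [100, 3],
        "win_pct": [50.0] * 20 + [99.0, 100.0],
    }
)
flagged = flag_win_pct_outliers(players)
print(flagged[["name", "win_pct", "win_pct_z"]])
```

The minimum-games filter matters as much as the threshold: without it, the newcomer's three-game streak would distort the population statistics.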
I’ll perform deeper dives into the data. I’m hoping that this initial effort will attract the attention of some of you who have more ideas and knowledge of how to gain useful insights from this dataset.
Have you been approached by individuals seeking to farm your fireteam?
Report their messages and let’s stamp them out, together.