Analyzing HQ Trivia Data

HQ Trivia Analysis

HQ Trivia is a live trivia game app that was released in August 2017 for Apple devices, with an Android version following later that year. It became very popular near the end of 2017, in part because people heard about the large cash prizes the game was offering. Prizes of up to $25,000 were sometimes given away for answering 12 questions correctly.

In October 2017, my friend Justin started manually recording the game's questions and answers in a spreadsheet, along with stats on who was winning and how much they won. Shortly afterward, he asked me to help scale up the data collection. We both love working with data and thought this would be a fun way to do some real-world analysis on an interesting phone app. We wanted to answer questions like: How many people, on average, answer each question correctly? Which categories are the most difficult for people to answer? Do people stick around to watch the game after losing? What are people talking about most in the chat?

We set out to build a database that stored every aspect of the game so we could answer these questions. At the time, we were the only ones with this kind of data for the game. When we started, HQ was still a small app without any of its later viral popularity. That changed very quickly: in November 2017, the Washington Post contacted us because they were writing a story on the new trivia app and needed our data and analysis. That is when we realized this was bigger than we had initially thought.

In this post I share the story of how we collected the data and present some previously unpublished findings. To my knowledge, this is the only analysis of raw HQ Trivia data you will find on the internet. If you know of other analyses using raw HQ Trivia data, let me know in the comments!

Washington Post Story

Washington Post Article and Money

Once our data was featured in the Washington Post, we received all sorts of requests to use it. We partnered with Money to provide data and analysis for three additional stories.

Justin and I met in the Georgia Tech Master of Science in Analytics program. Once our data was featured in the news, Georgia Tech wanted to write about our work as well, and even used our story in student recruitment efforts! Our work seemed to have paid off.

Now that you have an idea of what we accomplished, I'll get into the details of how we collected our data, along with some analysis you might find interesting.

Data Collection

Manual Collection:

As I mentioned earlier, Justin started by manually recording all of the HQ Trivia game data in a spreadsheet. He often tracked down recordings of past games that others had posted on YouTube so he could pause the video and write things down. Needless to say, this was very time-consuming and not much fun.

When Justin first mentioned this idea to me, I suggested he look into Amazon Mechanical Turk to enlist other people to capture the data. Justin created an HTML form and instructed the workers to capture the required data from the YouTube videos he supplied. This system worked great until we saw the bill: it simply was not sustainable for a hobby project. This is where I really got involved. Since I was already working as a data scientist and had a lot of experience with this type of work, Justin asked me to help automate the data collection.

What our data looked like initially

Machine Learning and Optical Character Recognition:

The first idea that came to mind was to run optical character recognition (OCR) on screenshots of the game to parse out the data we were interested in. I used the Tesseract OCR library to build a working prototype.

My first prototype worked roughly as follows:

  • Download the YouTube videos
  • Convert the videos into a series of images
  • Use a classification algorithm to find the images we want to parse text from
  • Run OCR on the selected images
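The last step, turning raw OCR text into a question and its answer choices, is where much of the fiddly work lives. Below is a minimal Python sketch of that parsing step (not the original prototype), assuming a hypothetical screen layout in which the question text comes first and the three answer choices occupy the last three non-empty lines. The names `ParsedQuestion` and `parse_ocr_text` are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ParsedQuestion:
    question: str
    answers: List[str]


def parse_ocr_text(raw: str) -> ParsedQuestion:
    """Split OCR output from a question screen into a question and three answers.

    Assumes a hypothetical layout: one or more lines of question text,
    followed by exactly three non-empty answer lines.
    """
    # Drop blank lines and surrounding whitespace, which OCR output often contains.
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    if len(lines) < 4:
        raise ValueError(f"expected a question plus 3 answers, got {len(lines)} lines")
    # Long questions wrap across several lines on a phone screen, so rejoin them.
    question = " ".join(lines[:-3])
    answers = lines[-3:]
    return ParsedQuestion(question=question, answers=answers)


sample = "Which planet is known\nas the Red Planet?\n\nVenus\nMars\nJupiter\n"
parsed = parse_ocr_text(sample)
print(parsed.question)  # Which planet is known as the Red Planet?
print(parsed.answers)   # ['Venus', 'Mars', 'Jupiter']
```

A single misread, merged, or dropped line breaks a positional heuristic like this, which illustrates how fragile OCR-based parsing can be.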

The script worked, but it was plagued with inaccuracies that were simply unacceptable for what we wanted to achieve. Around this time, I noticed that people had started cheating at the game by building programs that googled the answers in real time. Clearly someone had figured out how to parse the questions very quickly, so I started looking at other people's code to find out how. This led to our final technique.

Undocumented API

The scripts I found were hooking into the application's undocumented API, which sent the questions and answers to the phone app. Using the cheating scripts as inspiration, I quickly wrote a script that instead took the data HQ Trivia sent out and inserted it into a database. I set up a server to run this script and capture the afternoon and evening games as they were broadcast.
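Here is a minimal Python sketch of the storage half of such a script, using SQLite from the standard library. Because the API was undocumented, the JSON field names (`type`, `questionNumber`, `question`, `answers[].text`), the table layout, and the game ID format are all assumptions for illustration; the network connection to the API is left out entirely.

```python
import json
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Create a hypothetical table for question messages if it does not exist."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS questions (
               game_id TEXT,
               question_number INTEGER,
               question TEXT,
               answer_1 TEXT,
               answer_2 TEXT,
               answer_3 TEXT,
               PRIMARY KEY (game_id, question_number)
           )"""
    )


def store_message(conn: sqlite3.Connection, game_id: str, raw_message: str) -> bool:
    """Insert a question message; return False for other message types."""
    msg = json.loads(raw_message)
    if msg.get("type") != "question":
        return False  # chat and stats messages would go to their own tables
    answers = [a["text"] for a in msg["answers"]]
    if len(answers) != 3:
        raise ValueError(f"expected 3 answers, got {len(answers)}")
    # INSERT OR REPLACE keeps a re-received message from creating a duplicate row.
    conn.execute(
        "INSERT OR REPLACE INTO questions VALUES (?, ?, ?, ?, ?, ?)",
        (game_id, msg["questionNumber"], msg["question"], *answers),
    )
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
init_db(conn)
example = json.dumps({
    "type": "question",
    "questionNumber": 1,
    "question": "Which planet is known as the Red Planet?",
    "answers": [{"text": "Venus"}, {"text": "Mars"}, {"text": "Jupiter"}],
})
store_message(conn, "2018-03-01-pm", example)
print(conn.execute("SELECT question_number, answer_2 FROM questions").fetchall())
# [(1, 'Mars')]
```

Keying rows on the game and question number means a reconnect that replays messages does not inflate the question count.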

HQ Insiders Database

We compiled a database of game data from October 2017 through August 2018. It contains roughly 12,000 unique trivia questions and answers, over 300,000 chat messages, player payouts, and broadcast metrics for 364 unique games.


Below, I detail some of the analysis we did to answer the questions that originally motivated our efforts, and then show how the game surged in popularity before eventually going into decline.

A typical HQ Trivia game lasted around 15 to 18 minutes and consisted of 12 to 15 trivia questions, each with three answers to choose from. HQ Trivia was a game show and tried to entertain its users while they played. There were two main daily games: an afternoon game at 3:00 PM ET and an evening game at 9:00 PM ET. One of the first things we wanted to know was how many people watched the game without actively playing. I will call these people viewers.

Viewership by Game Minute

A viewer is someone who is watching but not actively playing: they might have been eliminated, or they never attempted to answer the first question. Each line in the charts represents a unique game.

These charts show that most people stick around for the first 5 to 10 minutes of the game, after which viewership drops off quickly. The evening games that show up as outliers were games with either a special guest host or a very large payout, so people stuck around to see who would win.
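Given per-minute counts of everyone watching and of players still in the game, viewers are simply the difference. A minimal Python sketch, assuming hypothetical inputs `watching` and `playing` (one value per game minute, taken from the broadcast metrics); scaling each game to its own peak is my own choice for comparing games of very different sizes on one chart, not necessarily how the original charts were built.

```python
from typing import List


def viewers_by_minute(watching: List[int], playing: List[int]) -> List[int]:
    """Viewers are people watching the broadcast who are not actively playing."""
    if len(watching) != len(playing):
        raise ValueError("both series need one value per game minute")
    return [w - p for w, p in zip(watching, playing)]


def relative_to_peak(series: List[int]) -> List[float]:
    """Scale a game's series by its own peak so games of different sizes are comparable."""
    peak = max(series)
    return [round(x / peak, 3) for x in series]


# Toy numbers for one game, one value per minute (not real HQ data).
watching = [1000, 1200, 1100, 900, 500]
playing = [900, 600, 300, 100, 20]
viewers = viewers_by_minute(watching, playing)
print(viewers)                    # [100, 600, 800, 800, 480]
print(relative_to_peak(viewers))  # [0.125, 0.75, 1.0, 1.0, 0.6]
```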

App Downloads and Key Events

We obtained app download estimates for HQ Trivia on both the Apple App Store and Google Play. We plotted this data against some key events to see how they affected downloads, annotating large cash prizes and the announcement of the game sponsored by Ready Player One. The first large prize that HQ Trivia offered clearly helped keep downloads high, and the next big catalyst was the Ready Player One sponsorship announcement.

Distribution of Winners Per Game

When someone wins a game of HQ Trivia, the prize money is split among all the winners, so the highest payouts occur in games with the fewest winners. Unfortunately, it was rare to be in a game with very few winners. We present a density plot showing the distribution of winners across all the games we recorded. One limitation: the API we used to collect data cut off at 750 winners per game, so we could not accurately record games with more winners than that.
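The even split and the 750-winner cutoff can be expressed in a few lines. A minimal Python sketch; the function names and the prize amounts in the example are illustrative, and rounding to cents is an assumption for display only.

```python
API_WINNER_CAP = 750  # the API stopped reporting winners beyond this count


def payout_per_winner(prize: float, winners: int) -> float:
    """Split the prize evenly among all winners, rounded to cents for display."""
    if winners <= 0:
        raise ValueError("need at least one winner to split the prize")
    return round(prize / winners, 2)


def winner_count_is_censored(reported_winners: int, cap: int = API_WINNER_CAP) -> bool:
    """True when the reported count reached the cap, so the real count may be higher.

    A game with exactly `cap` winners is indistinguishable from one with more,
    so it is flagged too.
    """
    return reported_winners >= cap


print(payout_per_winner(25000, 4))    # 6250.0
print(payout_per_winner(2500, 750))   # 3.33
print(winner_count_is_censored(750))  # True
print(winner_count_is_censored(120))  # False
```

For a censored game, dividing the prize by the reported 750 overstates each winner's share, so such games are better treated as upper bounds (or excluded) than plotted at face value.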

Did the Game Become Easier to Win?

The short answer is yes. While I do not think the difficulty of the questions changed much, cheating became very prevalent as the game grew in popularity. There were live chat groups on Telegram where people crowdsourced answers, and more and more people figured out how to pull the questions from the API and quickly look up the answers on Google. The chart below shows that the average percentage of players answering correctly increased over the period we collected data.

The gap in the data corresponds to the period when we were transitioning between data collection methods.
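One way to compute a trend like this is to take each question's percentage of correct answers and average those percentages by month. A minimal Python sketch, assuming hypothetical rows of `(date, players_answering, players_correct)`; the monthly grouping is my choice for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def pct_correct_by_month(rows: Iterable[Tuple[str, int, int]]) -> Dict[str, float]:
    """Average, per month, of each question's percentage of correct answers.

    Each row is (date as 'YYYY-MM-DD', players_answering, players_correct).
    """
    by_month = defaultdict(list)
    for date, answering, correct in rows:
        if answering == 0:
            continue  # nobody answered, so the percentage is undefined
        by_month[date[:7]].append(100 * correct / answering)
    return {month: round(mean(vals), 1) for month, vals in sorted(by_month.items())}


rows = [
    ("2017-11-05", 1000, 600),
    ("2017-11-05", 600, 300),
    ("2018-06-10", 2000, 1600),
]
print(pct_correct_by_month(rows))  # {'2017-11': 55.0, '2018-06': 80.0}
```

Averaging per-question percentages weights every question equally; pooling instead (total correct divided by total answering) would let the early questions, which have far more players, dominate the result.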

Rise and Decline of HQ Trivia

I believe that running a trivia app on a phone comes with problems that simply cannot be overcome. The main issue is that once you offer a monetary incentive to win a trivia game, players naturally maximize their efforts to win. This can be as simple as quickly googling questions while playing, or crowdsourcing answers with a living room full of friends or a chat room of thousands of players; at the extreme, some players built sophisticated cheating scripts that looked up answers automatically. It is fun to win $10 playing a game, and absolutely exhilarating to win $25,000! But when players start winning 10 cents, the game loses its appeal fast. This is exactly what started to happen.

If you follow the blue average line, you can see a steady decrease in payouts. After May 2018, it became very common to win just a few cents or a couple of dollars. Of course, special events and games would boost that amount, but the winnings just weren't as good as they used to be.

If you won in November 2017, the average payout was $63.38. Contrast that with July 2018, when the average payout was only $5.72, less than a tenth as much.

The maximum number of players per game steadily decreased after peaking around March or April 2018. We did not collect data past September, but from the news and social media posts I have read, the game has continued to decline.
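Both trends in this section, the average payout and the maximum number of players, are monthly aggregates over games. A minimal Python sketch, assuming hypothetical per-game records with `date`, `payout_per_winner`, and `players` keys; the example numbers are toy values, not our data. It takes an unweighted mean over games, whereas averaging over individual winners would weight games with many winners more heavily and could give a different figure.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List


def monthly_summary(games: Iterable[dict]) -> Dict[str, dict]:
    """Per month: mean payout per winner and the largest player count in any game.

    Each game is a dict with hypothetical keys 'date' ('YYYY-MM-DD'),
    'payout_per_winner', and 'players'.
    """
    payouts: Dict[str, List[float]] = defaultdict(list)
    max_players: Dict[str, int] = {}
    for g in games:
        month = g["date"][:7]
        payouts[month].append(g["payout_per_winner"])
        max_players[month] = max(max_players.get(month, 0), g["players"])
    return {
        m: {"avg_payout": round(mean(payouts[m]), 2), "max_players": max_players[m]}
        for m in sorted(payouts)
    }


games = [
    {"date": "2017-11-01", "payout_per_winner": 10.0, "players": 500},
    {"date": "2017-11-02", "payout_per_winner": 30.0, "players": 800},
    {"date": "2018-07-01", "payout_per_winner": 2.5, "players": 300},
]
print(monthly_summary(games))
# {'2017-11': {'avg_payout': 20.0, 'max_players': 800}, '2018-07': {'avg_payout': 2.5, 'max_players': 300}}
```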

What Were People Saying in the Chat?

HQ Trivia let players chat during the game. There were so many players at any given time that messages came in too fast to read, and since the chat was distracting while playing, many people chose to hide it. I wanted to look at the chats we collected to see exactly what was being said. Mostly, people talked about their birthday, Scott the host, some glitch they experienced in the game, or something they loved. In fact, these were the four most common words in the chat.
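Finding the most common words comes down to tokenizing messages, dropping filler words, and counting. A minimal Python sketch using `collections.Counter`; the tiny stop-word list and the sample messages are illustrative assumptions, not the lists or data behind the results above.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# A tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "i", "in", "is", "it", "my", "of", "the", "to", "you"}


def top_words(messages: Iterable[str], n: int = 4) -> List[Tuple[str, int]]:
    """Count words across chat messages, ignoring case and common stop words."""
    counts: Counter = Counter()
    for msg in messages:
        # Lowercase first so "Scott" and "scott" are counted as the same word.
        tokens = re.findall(r"[a-z']+", msg.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)


chat = [
    "Happy birthday to me!",
    "I love Scott",
    "glitch again",
    "birthday today",
    "Scott is the best",
    "love this",
]
print(top_words(chat, n=3))  # [('birthday', 2), ('love', 2), ('scott', 2)]
```

The same counts, restricted to the messages from a single game, are what a word cloud visualizes.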

Some memes also came up frequently in the chat.

The red "dab" line above reflects the game Elmo hosted: players asked Elmo to dab, and shortly after, Elmo delivered.

I also generated word clouds for certain games. Below is a word cloud from the special game featuring The Voice.

Code to Reproduce

You can find the R code that I used to produce these graphs and analysis here.