CMPS-3160 Final Project: How Do Trends Change on Steam?

Authors: Bo Zhang, Tian Xie

https://dekuwang.github.io/

Background Information:

Both Tian Xie and Bo Zhang are loyal Steam users, so we wanted to analyze how games have evolved over time and how players' preferences have changed. We selected a dataset of Steam games from Data World, current as of 2018, that contains information on all of Steam's games. It includes game release dates, game genres, site names, developers, and recommendations. Much of this data has little bearing on the goals of our analysis, so we narrowed it down to game release dates, game genres, and whether multiplayer is supported. We believe these fields have a direct or indirect effect on a game's popularity. In addition, even within a single genre there are gameplay innovations, and these also affect the genre's popularity. Therefore, if the popularity of a genre changes significantly in some year, we will also try to explain that change in terms of how the genre itself changed that year.

With this dataset, the first question we would like to ask is: what are the trends for each type of game? A given genre may become mainstream and the most popular with users in a given era, but as technology improves and the way games are played changes, which genre is mainstream will change in the long run as well. Our current idea is to add up the number of recommendations for all games of the same genre released in the same year, and then use a line graph to show the popularity of each genre over time, with the number of recommendations on the Y-axis and the year on the X-axis. If a genre's popularity changes drastically compared to the previous year, we take this as a sign that a new game model or new technology has emerged. With the same dataset, we can also determine the most popular games of each year. By grouping games by release year, we can find the most recommended games and then look at their genres. This ties back to the line graphs of genre popularity, because it is likely that the most popular game of a year drove the popularity of the genre it belonged to. We also want to speculate on which external factors influenced increases or decreases in the number of games.
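The aggregation described above (summing recommendations per genre and year) could be sketched roughly as follows. This is only an illustration on made-up rows: the column names `year`, `genre`, and `recommendations` are assumptions, and the frame is assumed to hold one row per game and genre pair.

```python
import pandas as pd

# Toy rows, one per (game, genre) pair. Names and values are hypothetical.
games = pd.DataFrame({
    "year": [2016, 2016, 2016, 2017, 2017],
    "genre": ["Action", "Action", "RPG", "Action", "RPG"],
    "recommendations": [100, 50, 30, 200, 80],
})

# Rows become years, columns become genres, cells hold summed recommendations.
popularity = games.pivot_table(
    index="year",
    columns="genre",
    values="recommendations",
    aggfunc="sum",
    fill_value=0,
)
print(popularity)
```

With matplotlib installed, calling `popularity.plot()` on a table shaped like this draws one line per genre with the year on the X-axis.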

Game makers naturally want their games to be loved by mainstream players, so the choice of genre is extremely important in the early development of a game. Although there have been surprise hits in every genre on Steam and across the gaming industry over the past decade or so, a few genres are consistently more popular with the player base. For example, the best games of the year in recent years have been action games, which makes us wonder: is it because action games are generally more popular with players that game makers are more inclined to make them? Based on this idea, we wanted to build a predictive model from the data we already had. By setting the genre flags to True or False, we could predict which combinations of genres tend to receive higher or lower ratings. We think this could help set the basic direction of game production, since developers could choose more promising combinations to improve the chances of higher ratings for upcoming games.

How we collaborated: We created a repository on GitHub that contains the datasets we use and the content we developed. Initially, we agreed to hold Google meetings two to three times a week. The discussions focused on dataset selection and what kinds of questions we wanted to explore with the data. After each discussion, we worked individually on the parts we had talked about and pushed them to GitHub, but we found that this way of collaborating wasn't efficient. So we switched to the Live Share plugin for VS Code, which lets two people edit the same file and see each other's changes quickly. The initial dataset was set up by Bo Zhang, who proposed analyzing the performance of each game genre in different years and creating a line graph of genre popularity over time, while Tian Xie was responsible for adding more detail to the later ideas, for example the change in the number of games during a genre's popular period and whether a change in the number of players affected the number of recommendations for each genre.

$\color{#39C5BB}{\text{Part 1:}}$

$\color{#39C5BB}{\text{Extraction, Transform, and Load}}$

From Kaggle, we found a dataset that includes information on all games from 2004 to 2018. Let's load that dataset first.

Great! This dataset includes far more information than I thought. However, many of the columns are categorical. Categorical data is hard to analyze directly, so let's modify some columns.

Firstly, we can change release_date from object type to datetime type.
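A minimal sketch of this conversion, assuming the raw dates look like `Nov 1, 2000` (the sample values and the date format are assumptions, not taken from the dataset):

```python
import pandas as pd

# Hypothetical sample; the "Mon D, YYYY" format is an assumption about the raw data.
games = pd.DataFrame({"release_date": ["Nov 1, 2000", "Mar 15, 2018", "Coming soon"]})
print(games["release_date"].dtype)  # object

# errors="coerce" turns entries that do not match the format into NaT
# instead of raising an exception.
games["release_date"] = pd.to_datetime(
    games["release_date"], format="%b %d, %Y", errors="coerce"
)
print(games["release_date"].dtype)  # datetime64[ns]
```

Rows that end up as `NaT` can then be inspected or dropped before any per-year analysis.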

Secondly, the price data is currently stored as strings, and we want to convert it to int. My solution is simple: write a for loop; if the game is free, set the price to 0, and for the rest simply remove the leading $ character and convert the remainder to int. Trust me, this is the most efficient way, because you can never imagine how many ways game developers describe their game as free to play.
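The loop described above might look something like the sketch below. The helper name `parse_price` and the sample labels are hypothetical. It parses to float so that cents are not lost, which is an assumption on our part; the result can be rounded or cast to int afterwards if whole numbers are enough.

```python
def parse_price(raw):
    """Convert a raw price label to a number, or None if it is not recognized."""
    text = str(raw).strip()
    # Any label mentioning "free" is treated as a price of 0.
    if "free" in text.lower():
        return 0.0
    # Otherwise expect a leading dollar sign followed by the amount.
    if text.startswith("$"):
        try:
            return float(text[1:].replace(",", ""))
        except ValueError:
            return None
    return None


# Hypothetical labels illustrating how varied "free" can look.
samples = ["Free to Play", "$19.99", "Play for Free!", " $4.99", "N/A"]
prices = [parse_price(s) for s in samples]
print(prices)
```

Returning `None` for unrecognized labels keeps them visible, so they can be reviewed by hand rather than silently turned into 0.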

We chose to build our rating prediction model with K Nearest Neighbors. Because this method does not accept non-quantitative data, we need to transform the game genres into boolean columns.

We see there are 34 types of games on Steam. However, many of them do not really represent a game genre, so we only need a few of them.
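One way to build boolean genre columns for only the genres we keep is sketched below. It assumes the genres are stored as a comma-separated string in a column named `genre`; the column name, the sample rows, and the `KEPT_GENRES` list are all hypothetical.

```python
import pandas as pd

# Hypothetical sample; real rows may list more tags or use another separator.
games = pd.DataFrame({
    "genre": ["Action,Indie", "Education,Utilities", None, "Action,RPG,Adventure"],
})

# Illustrative subset of tags that we treat as real game genres.
KEPT_GENRES = ["Action", "Adventure", "RPG", "Education"]

# Split each string into a clean list of tags (missing values become []).
tag_lists = games["genre"].fillna("").apply(
    lambda s: [t.strip() for t in s.split(",") if t.strip()]
)

# One boolean column per kept genre: True if the game carries that tag.
for g in KEPT_GENRES:
    games[g] = tag_lists.apply(lambda tags, g=g: g in tags)

print(games)
```

Checking membership in the split list, rather than searching the raw string, avoids accidental matches when one tag name is contained in another.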

Also, the game reviews are given in words, so I need to convert them into numbers for analysis.
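As a rough sketch, the conversion can be done with a lookup table. The labels below are Steam's review summary labels, but the numbers assigned to them, the column names, and the assumption that the column holds only the short label are ours.

```python
import pandas as pd

# Assumed ordinal scale: higher means better. The numeric values are an
# illustrative choice, not something defined by Steam or the dataset.
REVIEW_SCORE = {
    "Overwhelmingly Negative": 1,
    "Very Negative": 2,
    "Negative": 3,
    "Mostly Negative": 4,
    "Mixed": 5,
    "Mostly Positive": 6,
    "Positive": 7,
    "Very Positive": 8,
    "Overwhelmingly Positive": 9,
}

# Hypothetical sample rows.
games = pd.DataFrame({"review": ["Very Positive", "Mixed", "No user reviews"]})

# Labels missing from the table become NaN so they can be handled separately.
games["review_score"] = games["review"].map(REVIEW_SCORE)
print(games)
```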

Great! Now we have converted the date and price data to types that are easier for us to analyze. However, we now have too many columns, so in the next step I will extract the useful data from the original dataframe into a new one.

Now our dataset is much cleaner. It's time for the next step!

$\color{#39C5BB}{\text{Part 2:}}$

$\color{#39C5BB}{\text{Exploratory Data Analysis}}$

I have always had a question: how many games are published on Steam? With the data we have, we can answer this easily.

Wow, as of 2018 there are 40833 games on Steam. Now I'm curious: how many games are published per year? To answer this question, we can use the same method, and we can also generate a graph for it. For this part, I will only use the data from 2004 to 2018.
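Counting releases per year within that window could look roughly like this. The dates are toy values and the `release_date` column is assumed to be of datetime type already.

```python
import pandas as pd

# Toy dates; two of them fall outside the 2004-2018 window.
games = pd.DataFrame({"release_date": pd.to_datetime([
    "2003-05-01", "2004-02-10", "2017-07-04",
    "2017-11-30", "2018-01-15", "2019-03-03",
])})

print(len(games))  # total number of games in the table

# Keep only release years between 2004 and 2018 (inclusive), then count.
years = games["release_date"].dt.year
in_range = years[years.between(2004, 2018)]
games_per_year = in_range.value_counts().sort_index()
print(games_per_year)
```

With matplotlib installed, `games_per_year.plot(kind="bar")` turns the counts into a bar chart.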

Great! Now we have a graph showing the number of games released on Steam per year. From this graph I noticed a trend, which is our first feature:

$\color{#66ccff}{\text{Feature 1: More and more new games are published on Steam every year.}}$

Then let's see what the most popular game genre on Steam is:

From the graph above, I noticed that Action is the most popular game genre on Steam. Let's see how many action games are published each year.
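Assuming the table has one boolean column per genre plus a `year` column (both are assumptions, and the rows below are toy data), the genre totals and the yearly action counts can be sketched as:

```python
import pandas as pd

# Toy rows: one game per row, one boolean flag per genre.
games = pd.DataFrame({
    "year":   [2016, 2016, 2017, 2017, 2017],
    "Action": [True, False, True, True, False],
    "RPG":    [False, True, True, False, False],
})

# Summing boolean columns counts the True values, i.e. games per genre.
genre_totals = games[["Action", "RPG"]].sum().sort_values(ascending=False)
print(genre_totals)

# Number of action games released in each year.
action_per_year = games.groupby("year")["Action"].sum()
print(action_per_year)
```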

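It looks like action games follow the same trend as the total number of games published on Steam per year. So now we have our second feature:

$\color{#66ccff}{\text{Feature 2: The number of action games published each year follows the same trend as the total number of games.}}$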

Thirdly, let's look at the reviews on Steam.

Wow, more than 20000 games on Steam have fewer than 10 reviews. It looks like making money through Steam is not a good idea for individuals, unless they are sure that their game will become popular after publishing. And we have just found the third feature of this dataset.

$\color{#66ccff}{\text{Feature 3: More than half of the games on Steam have fewer than 10 reviews}}$
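A quick way to check a claim like this is to count the rows below the threshold and take their share. The sketch below uses toy numbers and a hypothetical `review_count` column.

```python
import pandas as pd

# Toy review counts, not taken from the dataset.
games = pd.DataFrame({"review_count": [0, 3, 9, 10, 250, 12000]})

# Boolean mask: True for games with fewer than 10 reviews.
few = games["review_count"] < 10

n_few = int(few.sum())   # how many games fall below the threshold
share_few = few.mean()   # fraction of all games below the threshold
print(n_few, share_few)
```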

The next step is to create a model. For this model, we will use the game genres, release year, and price to predict the review of the game.
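To show the idea behind K Nearest Neighbors, here is a small hand-rolled sketch in plain Python. It is not the model from this notebook: the helper `knn_predict`, the feature layout, and every number below are made up for illustration, and in practice a library implementation such as scikit-learn's `KNeighborsRegressor` would be the usual choice. The features are assumed to be scaled to a comparable range beforehand, since year and price would otherwise dominate the distance.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Predict a score as the mean target of the k training rows closest to `query`."""
    # Sort training rows by Euclidean distance to the query point.
    by_distance = sorted(
        zip(train_X, train_y),
        key=lambda row: math.dist(row[0], query),
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Made-up rows: [is_action, is_education, scaled_year, scaled_price].
train_X = [
    [1, 0, 0.9, 0.3],
    [1, 0, 0.8, 0.2],
    [1, 0, 0.7, 0.5],
    [0, 1, 0.9, 0.1],
    [0, 1, 0.6, 0.1],
    [0, 1, 0.8, 0.0],
]
# Made-up review scores for those rows.
train_y = [8, 7, 9, 3, 4, 2]

# Flipping the genre flags changes which neighbors are closest.
action_score = knn_predict(train_X, train_y, [1, 0, 0.85, 0.25], k=3)
education_score = knn_predict(train_X, train_y, [0, 1, 0.8, 0.1], k=3)
print(action_score, education_score)
```

The two queries mirror the idea of switching genre flags between True and False and comparing the predicted scores; the values printed here only reflect the toy data.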

Conclusion

The prediction model shows that action games can get high scores in combination with many other genres, but get much lower scores when combined with education games. Therefore, we can assume that the number of action games has increased so significantly because action is easily paired with other genres, which makes games more playable and tends to raise their scores. We believe the number of educational games is low because game makers consider them unpopular with the mainstream gaming community. The change in the number of educational games supports this view: after a brief rise, their number trended downward.

We offer several conjectures about the reasons for the significant increase in the number of games. First, the overall gaming environment changed dramatically under the influence of Covid. People had a lot of time at home during the epidemic, and with fewer recreational activities available they became more inclined toward online entertainment. As a result, more producers as well as players flocked to the video game industry during this period, increasing the overall number of games. Second, development tools have improved. Game engines keep advancing and the tools used by developers are becoming more and more convenient, so developers don't need to invest as much time in learning how to make games. For example, 3D engines can now roughly simulate real-world physics collisions, and character models can be purchased from a number of websites. A low-budget production can build its overall world and framework itself and take other pieces, such as the game kernel, character models, and sound effects, directly from existing online resources. Therefore, we think the number of games produced is connected to the convenience of development tools.