
Identifying the Style of Different Players
Automatically categorising players into play styles
When thinking about chess players, one of the first things that comes to mind is their play style. However, figuring out which style a player prefers usually requires going over a lot of games played by them. As the style only depends on the moves and games played, I figured that it should be possible to automatically put players in different categories.
I decided to categorise some players of the past, where I had a decent idea about their playing style. I only considered classical games from them, and I excluded any short draws or games where the players were either very young or past their prime.
How the classification works
My idea for the classification is to extract as much data as possible from the moves played and use this data to see which players are similar. The data I collected includes the average number of moves per game, the relative number of pawn and piece moves, moves in the centre, moves towards the opponent’s king, and much more.
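To give a rough idea of what this kind of data extraction can look like, here is a minimal sketch in Python. It is not the code used for this post: the function name `move_features`, the selection of features, and the decision to count castling as a piece move are all assumptions made for illustration. It takes the moves of one player in standard algebraic notation (SAN) and returns a few of the ratios mentioned above.

```python
from collections import Counter

# The four central squares; a wider definition of "centre" is equally plausible.
CENTRE = {"d4", "d5", "e4", "e5"}


def move_features(san_moves):
    """Compute a few simple ratios from one player's moves given in SAN.

    Hypothetical helper for illustration; the original analysis used
    43 variables whose exact definitions are not given in the post.
    """
    counts = Counter()
    for san in san_moves:
        counts["total"] += 1
        if "+" in san or "#" in san:
            counts["checks"] += 1
        # Drop check, mate and annotation suffixes to get at the move itself.
        core = san.rstrip("+#!?")
        if core.startswith("O-O"):
            # Assumption: castling is counted as a piece (king) move.
            counts["piece"] += 1
            continue
        if "=" in core:
            counts["promotions"] += 1
            core = core.split("=")[0]
        # In SAN, pawn moves start with a file letter, piece moves with K, Q, R, B or N.
        counts["pawn" if core[0] in "abcdefgh" else "piece"] += 1
        # The destination square is the last two characters of the move.
        if core[-2:] in CENTRE:
            counts["centre"] += 1
    total = counts["total"] or 1  # avoid division by zero for empty input
    return {
        "pawn_move_share": counts["pawn"] / total,
        "piece_move_share": counts["piece"] / total,
        "centre_move_share": counts["centre"] / total,
        "checks_per_move": counts["checks"] / total,
        "promotions": counts["promotions"],
    }
```

Averaging such values over all games of a player would give one row of numbers per player, which is the form of input that the next step needs.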
I didn’t use any information from engines, like the evaluation or the number of mistakes, as I wouldn’t have been able to categorise as many players if I had to analyse each move with an engine. I also figured that the moves themselves should contain all the information I need.
In total, I got 43 different variables for each player. To make this data useful for classifying different play styles, I used principal component analysis (PCA) to reduce the number of variables in the dataset.
PCA transforms the data into a different coordinate system, where each successive direction (component) captures as much of the remaining variance in the data as possible. So if many variables in the original data are highly correlated, they get boiled down to a single component that captures most of that information. My idea was that some of the data I captured should correspond to attacking play, while other variables indicate positional play.
In the end, I used the first two principal components, which capture most of the variance. Using two dimensions also has the advantage that the result can be easily visualised.
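For readers who want to try the reduction step themselves, the following is a minimal sketch of PCA with NumPy. It is an illustration rather than the code behind this post: the function name `pca_2d` is made up, and standardising every variable before the decomposition is an assumption (it is a common choice when variables have very different scales, such as move counts and ratios).

```python
import numpy as np


def pca_2d(X):
    """Project a (players x variables) matrix onto its first two principal components.

    Returns (scores, components, explained_ratio), where scores has one
    2D point per player. Hypothetical helper for illustration.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns carry no information; avoid dividing by zero
    Z = (X - mean) / std  # assumption: variables are standardised first

    # The rows of Vt are the principal directions, ordered by decreasing variance.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:2]
    scores = Z @ components.T

    # Share of the total variance captured by each of the two components.
    explained_ratio = (S ** 2 / np.sum(S ** 2))[:2]
    return scores, components, explained_ratio
```

Plotting `scores[:, 0]` against `scores[:, 1]` gives a two-dimensional picture with one point per player, and `explained_ratio` shows how much of the variance those two axes retain.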
Note that the labels for positional, attacking, or universal players are just based on my views of the players.
The positional players are in the bottom left corner, while the more attacking players are more to the top right. This means that when one looks at the data from another player, they can see where the player ends up on the plot and have a decent idea about their style.
Overall, I’m happy with how the image looks, but what do these components actually represent?
Interpreting the components
To interpret the components, I looked at the correlation between the components and the original variables.
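A simple way to do this is to compute the Pearson correlation between each player's value on a component and each original variable, then sort the variables by the absolute size of the correlation. The sketch below shows the idea; the function names and the variable names in the usage note are hypothetical, and it is not the exact code behind the results in this post.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant sequence has no defined correlation; treat it as 0
    return cov / (spread_x * spread_y)


def rank_variables(component_scores, variables):
    """Sort variables by how strongly they correlate with one component.

    component_scores: one value per player on the chosen component.
    variables: dict mapping a variable name to its per-player values.
    """
    correlations = {
        name: pearson(component_scores, values)
        for name, values in variables.items()
    }
    return sorted(correlations.items(), key=lambda item: abs(item[1]), reverse=True)
```

Calling `rank_variables(first_component, {"promotions": [...], "material_left": [...]})` returns the variables with the strongest positive or negative relationship first, which is a reasonable starting point for naming a component.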
The first component is highly correlated with variables like the number of promotions, the relative number of moves in the opponent’s half, and the number of checks and king moves by both sides. Furthermore, it’s correlated with material imbalance (which includes situations like bishop versus knight), and it’s negatively correlated with the amount of material left on the board.
Therefore, it seems that the first component indicates a tendency to play endgames, or at least games where many exchanges happen early on. Interestingly, attacking players like Polgar and Shirov also have a high value in the first component; my guess is that their attacks often lead to many exchanges.
The second component is correlated with metrics that are related to attacking play, like pawn moves to the sixth rank, moves by the e- and f-pawns, and moves towards the enemy king.
So it’s unsurprising that players like Kasparov, Shirov, and Tal end up towards the top of the graph. However, I’d have assumed that Polgar would also have a higher value in this component.
Limitations and possible improvements
Many values I looked at are influenced by openings and the choices of the opponents. Therefore, the classification may not only depend on the player's style, but also on the era in which they played.
Many variables also depend on the strength of the opposition, which makes sense as strong players usually exploit the mistakes of their opponents rather than sticking to their own style of play.
One thing I’d like to do in the future is to collect even more data from the games for a better classification. In particular, I think that including variables like threats or sacrifices can help, but correctly identifying when a move is a sacrifice is more difficult.
I also didn’t look closely at modern players, since I think that they are more difficult to categorise, as they are more universal. When I add more variables to the initial data collection, I’ll make sure to also look at modern players.
Let me know what you think about this categorisation and if there are players you’d be interested in.