# The Philosophy Of Recording Game Statistics

I started being interested in sports statistics at about the age of seven.  Or at least that is my earliest memory of being interested.  My first interest was cricket statistics.  I found those numbers to be endlessly fascinating and wanted to know how they were arrived at, so I taught myself many basic maths concepts working out why someone’s average was what it was.  Since that time, I have continued to be interested in the numbers of sport, an interest which has dovetailed nicely into my profession.

Specifically in volleyball, I have spent a lot of time thinking about the what, how and why of recording the statistics for volleyball.  I have come up with the following three philosophical principles for recording statistics**.  In my opinion, the statistics that we record and the values we give them, i.e. the raw data file, should:

1. be an accurate reflection of the game, both the raw data and the derived statistics.
2. contain a logical internal consistency.
3. follow common sense.

There is quite a lot of overlap in these areas, but we can still begin at the beginning.  Firstly, in terms of raw data, I would expect that a volleyball literate person, knowing the different symbols and codes, should be able to read the codes and create a reasonably accurate replay of the match in their head.  In derived statistics, that same volleyball literate person should be able to see a match report or box score and form a reasonably accurate overall impression of the match: who were the better players, what were the keys to victory or defeat, what was the quality of the match, etc.

Secondly, there should be a logical internal consistency within and between the skills.  A block point should be defined in essentially the same way as a spike or serve point.  For example, if a serve that is shanked by the receiver counts as an ace, then a spike that is shanked by the defender must also count as a spike point.  What counts as a spike or a block should be defined the same way each time.  In my own coding, I decide what is a spike and what is a free/down ball by how I would view the action if I were a spectator and the ball hit the floor.  If I would consider it an attack point, I have to code it as a spike.  If I would consider it a defensive error, I have to code it as a free ball.  Another example: an overpass that is killed with a two-handed ‘blocking’ action should be coded as a spike, because logically a block must be preceded by a spike.
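As a rough sketch of what this consistency could look like in practice, the same test can decide a point for both serve and spike, and the spectator test can decide between spike and free ball. The function names and string labels below are hypothetical, chosen only for illustration, and treating an untouched ball as a point is an assumption.

```python
# Hypothetical labels for what the opponent did with the ball.
# Assumption: a ball that lands untouched counts the same as a shank.
POINT_OUTCOMES = {"untouched", "shanked"}


def is_point(skill: str, opponent_contact: str) -> bool:
    """Decide whether a serve or spike is coded as a point.

    The skill argument is deliberately not consulted: a shank counts
    the same way whether it follows a serve or a spike.
    """
    return opponent_contact in POINT_OUTCOMES


def code_ball_over_net(spectator_verdict: str) -> str:
    """Spectator test: imagine the ball hit the floor and ask how it looked."""
    codes = {"attack_point": "spike", "defensive_error": "free_ball"}
    return codes[spectator_verdict]


# A shanked serve and a shanked spike are coded identically.
print(is_point("serve", "shanked"), is_point("spike", "shanked"))
print(code_ball_over_net("defensive_error"))
```

Because the skill never enters the decision, it is impossible to define an ace one way and a spike point another.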

The most important thing is how common sense applies to these first two points.  And of course common sense, as the saying goes, is anything but common.  For example, I have seen people scout free balls given over the net as attack attempts.  According to the strict letter of the volleyball rules, a ball directed over the net is an attack.  However, recording those actions as attack attempts neither increases our understanding of what actually happened nor gives us an accurate impression of the match.  In fact, I would argue it hinders both.

Another example is the situation in which a blocker touches the net.  Technically, the block fault ends the rally before the spike has reached its conclusion.  By this strict understanding of the rule, no spike on which the blocker touches the net should be recorded as an attack point; it should be an attack in play and a block error.  I have never seen anyone record a match in this manner, though, and it would not help our understanding of the game if they did.  What is common is that all block net touches are automatically recorded as attack points.  While this does have a logical consistency, it actually hinders our overall ability to understand the game.  Common sense has to lead us to the point where we ‘ignore’ net touches that don’t otherwise affect the outcome of the rally (i.e. the spike ended the rally anyway) and only record those that interrupted a rally that would have continued.  In that case the attack cannot be recorded as a point, but only as in play.
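The net-touch rule described above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not how any scouting software works: the `Coding` class, the function name and the string labels are all hypothetical, and it covers only the two cases discussed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Coding:
    """What the scout records for one spike where the blocker touched the net."""
    attack: str            # hypothetical labels: "point" or "in_play"
    block: Optional[str]   # "error", or None when the net touch is ignored


def code_blocker_net_touch(rally_would_have_continued: bool) -> Coding:
    """Apply the common-sense rule for a blocker's net touch.

    rally_would_have_continued is the scout's judgement of what would
    have happened had the blocker not touched the net.
    """
    if rally_would_have_continued:
        # The net touch interrupted a live rally: the attack is only
        # in play, and the point comes from the block error.
        return Coding(attack="in_play", block="error")
    # The spike ended the rally anyway, so the net touch is 'ignored'.
    return Coding(attack="point", block=None)


# A spike that hit the floor while the blocker brushed the net.
print(code_blocker_net_touch(rally_would_have_continued=False))
# A spike that was dug cleanly, but the blocker touched the net.
print(code_blocker_net_touch(rally_would_have_continued=True))
```

The only input is a judgement call by the scout, which is where common sense enters the coding.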

I often return to the forest and trees analogy.  What we want from statistics is a beautiful, representative rendering of the forest that we can enter to study individual trees, the grass in between them and the animals running around, if need be.  Without beginning with a working philosophy of what we want, we can spend too much time on the pointless details of the individual trees, while putting them in the wrong places and forgetting the grass and animals, and so lose the feeling of the whole forest.  If we spend too much time on the trees we aren’t very good coaches.

** The statistics we end up with, and the analysis we carry out on them, are another discussion altogether. One that I would like to think I will get around to someday.

1. Martin says:

I totally agree with this article, and in our team we are trying to work with similar definitions: the scout has to judge the quality of the attack, and then decide whether it was an attacking point or a defensive mistake.

But we have other problems too: for example, how to scout a ball that is perfectly in, but the referee calls it out. Or a touch that was not seen…

1. markleb says:

You can’t argue against the logic of the game. The referee’s decision is final.

2. The guy who taught me how to use Data Volley explained that if your own team spikes and the opponent touches the net, it’s a kill, but if you touch the net, it’s a blocking error. That was the inconsistency I found strange.

What do you think of AFL statistics? They never seem to explain why a team won, and the commentators don’t seem to know how to come up with any analysis or meaningful derivations beyond raw data.

1. markleb says:

There should always be a logical consistency. Scouting the two teams differently, I don’t really understand.
I don’t know much about AFL statistics. I think the commentators are not much good at explaining what is going on in general.

2. It sounds like the problem is in the software ‘logic’ that is used. This is completely normal. The important thing to remember is that the benefits of the software logic far far far outweigh the negatives.

1. markleb says:

It’s not a problem of the software logic. Data Volley allows you to define your own scales. The problem is ignoring points 1 and 2 and not having a point 3.

3. I have scouted for many years now and kept volleyball statistics. I was first taught that all net errors are scouted as an error for the blocker and a kill for whoever attacked (you can do both). But over time, as I felt this was not right for anyone, I changed to recording a kill for the attacker ONLY when the ball would have been a kill had there been no net violation. I believe that is what the writer is trying to describe: if the rally would have continued because the ball was dug, or the ball was hit clearly out, I do not award a kill. They get a – if the ball is hit out. I think this is the fairest way.

4. The biggest problem with the way volleyball data are generally collected (using software systems) is with your 3rd point – follow common sense. Sometimes it is hard to fit the outcomes of skill executions into a ‘common sense’ model.

The way we collect data, we make the assumption that a good outcome (ace) is because of a good execution (of a serve). That is, because I passed the ball badly, your serve must have been good.

It is not necessarily common sense to say that, because Player B did this, the quality of Player A’s action must have been this.

Having said this, there is nothing wrong with the way we collect the data. The only problem we have is in the interpretation. As long as we remember that we are collecting outcome data, not quality data, then we should be fine.
