Monday, 31 August 2015

Thanks, FiveThirtyEight! (and another look at methodology)

For writing an article showing just how a professionally done tennis Elo system could look, and for indirectly giving me some page views. I hope some of my new viewers stick around for my weekly updates.

For those who haven't seen it, this is the FiveThirtyEight article I am referring to.

The article itself focuses on the dominance of Serena Williams and how it compares to dominant players of the past. As I have only been updating my rankings since the start of this year, I'm not going to comment on that comparison. Instead, I want to take a look at their methodology and how it compares to my simplified version, which I detailed at the start of the year here.




Like my rankings, they have treated all Grand Slam and Tour level matches equally, ignoring ITF and ATP Challenger matches. The rest of their methodology is located in their footnotes, specifically footnote #3.

They came across a similar problem to mine when they looked at whether to take into account set scores or just match results. They noted that any improvement in rating accuracy gained from taking sets into account was marginal and would probably over-complicate the system, so, like me, they only consider whether the player won or lost.

The next consideration, how much to change a rating by after each match, is where the two rating systems differ. I am using a constant K-factor of 20 (which FiveThirtyEight uses for its NBA ratings), but they have created a formula that also solves the one major problem I have found with my system, i.e. how to deal with new players so they are not over-ranked. The formula is 250/((games+5)^0.4), where games is the number of matches a player has played, so the variability of a player's rating falls as they play more. They also noted that this formula substantially outperforms all the other alternatives they tried (whilst possibly avoiding the over-fitting problem).
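To make the difference concrete, here is a minimal Python sketch of a single rating update using the formula above. The expected-score function is the standard Elo one on a 400-point scale, which I am assuming rather than taking from their footnote, and the function names are my own.

```python
def k_factor(games: int) -> float:
    """K-factor from the formula quoted above: 250 / ((games + 5) ^ 0.4)."""
    return 250.0 / ((games + 5) ** 0.4)


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo win probability for player A (assumed 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(winner_rating: float, loser_rating: float,
           winner_games: int, loser_games: int) -> tuple[float, float]:
    """Return the new (winner, loser) ratings after one match.

    Each player moves by their own K-factor, so a newcomer's rating
    shifts much more than a veteran's for the same result.
    """
    surprise = 1.0 - expected_score(winner_rating, loser_rating)
    new_winner = winner_rating + k_factor(winner_games) * surprise
    new_loser = loser_rating - k_factor(loser_games) * surprise
    return new_winner, new_loser


# Example: a newcomer (0 matches) beats a veteran (600 matches), both rated 1500.
newcomer, veteran = update(1500.0, 1500.0, winner_games=0, loser_games=600)
print(round(newcomer, 1), round(veteran, 1))
```

Because the two K-factors differ, the points the winner gains no longer equal the points the loser drops, which is one way this setup departs from my constant-K version.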

This is a big difference, as it gives their ratings a lot more variability than mine for players who have played fewer than about 547 games on the tour (which is most players on the tour), since that is roughly where their K-factor falls to my constant 20. It also inflates the ratings to the 2500s, as is clearly seen in the article (my ratings have a soft cap around the 2000 level). Below is a chart I created to show how a player's base K-factor changes with each match played.
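As a quick numerical check of that threshold, the short Python sketch below solves 250/((games+5)^0.4) = 20 for games and prints the K-factor at a few match counts. The sample match counts are arbitrary choices of mine, purely for illustration.

```python
CONSTANT_K = 20  # the fixed K-factor used in my ratings


def k_factor(games: int) -> float:
    """K-factor from the formula 250 / ((games + 5) ^ 0.4)."""
    return 250.0 / ((games + 5) ** 0.4)


# Rearranging 250 / ((games + 5) ** 0.4) = K gives
# games = (250 / K) ** (1 / 0.4) - 5.
break_even = (250.0 / CONSTANT_K) ** (1 / 0.4) - 5
print(f"K-factor reaches {CONSTANT_K} after about {break_even:.1f} matches")

# Arbitrary sample points to show the shape of the curve.
for games in (0, 10, 50, 100, 250, 547, 1000):
    print(f"{games:>5} matches -> K = {k_factor(games):.1f}")
```

The curve drops steeply over a player's first few dozen matches and then flattens out, so most of the extra variability is concentrated early in a career.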


I would definitely like to post an updated version of my rankings using this formula, but I don't have a time machine to make the change retroactive, or advanced enough software to take advantage of the data set of matches that FiveThirtyEight linked to at the end of the footnote (which would end up duplicating their work anyway). In the short term, following the U.S. Open, I will only be showing ratings for players who have played a certain number of tournaments and matches. I will decide on the exact cut-off after the tournament.