With the Tour de France now disappearing in the rear-view mirror, I’ve been weighing up a post on the value of performance analysis as a predictive tool in sport, particularly given the criticism of our (Doc at the Veloclinic, Ammatti, Fred Grappe and Antoine Vayer) recent analysis of the Tour de France.
Throughout the process, I encouraged the use of insight and circumspection when looking at performance metrics, but however strongly the message was emphasized that performance does not constitute proof of doping, there is always a convenient and ‘lazy’ way to dismiss it as ‘pseudoscience’ (which is the new “never failed a test” defence, incidentally). That particular stick (pseudoscience) has been wielded in the context of Pistorius, Armstrong, hydration, barefoot running, running technique and fatigue, but not previously with the extremism seen during the Tour.
So here are some thoughts on the method, and an attempt to create some context around the value of trying to understand the world with imperfect methods (which we all admit they are).
Acceptable uncertainty – performance analysis is never exact
In my work with the SA Sevens Rugby team, we analyse performances. We analyse opposition patterns, we study their options and tendencies in various phases of the game. The purpose is to better understand, and thus predict, what they are likely to do. We can tell our players to expect Samoa to play a certain pattern, whereas Fiji will do the opposite. The players run onto the field knowing with a reasonable degree of certainty where a lineout throw will go, how the opponent will defend rucks, what they’ll attempt on kick-offs and how they are likely to run at us from broken play.
This is the same concept applied to sports the world over. In the NFL, it exists at perhaps its highest level, where expert analysts break down seemingly random patterns and discover ‘tells’ and methods to pre-empt opposition plays.
However, everyone involved recognizes that it is not formulaic. There is uncertainty, and this is accepted. Performance analysis, regardless of the sport, is always an exercise in probability because it happens in uncontrollable conditions, and so it adds value by adding insight rather than by conclusively and accurately predicting what will happen. I can’t give you the error bars on this kind of analysis, because sport is fluid and contextual, and so even the ‘safest’ bets are vulnerable to unique and specific situations. The smaller the data set, the greater the error, but it’s hard to assign a value to it.
The result is that a player cannot run onto the field with a textbook in their mind and then fail to use common sense, as well as all their other senses, to assess a given situation. Playing off memory, stats and data alone is a disaster when you have eyes, ears and insight. Just because the pre-match analysis said that the opponent would do X does not mean options Y and Z are off the table, and so a ‘smart’ player is needed to discern the actual event from the performance analysis predictions. On that note, we’ve had players who can’t seem to grasp this, and who take to the field with only one option in their minds. They prove to be inflexible and are probably better off with less information. For most, however, the guidance is beneficial, if interpreted sensibly.
If I tell Novak Djokovic that Andy Murray is likely to serve down the T on points where he is leading but out wide when behind on the scoreboard (for instance, this may happen on 75% of points, with an error), Djokovic would be foolish to leap wide during the ball-toss, but he’d also be foolish to discard the information because there’s “uncertainty”. Inch, don’t leap.
In cycling, there seems to have been far too much “leaping”, in the sense that people have either blindly embraced or blindly discarded the concept of performance analysis, whether the metric is time up a mountain, estimated power output, or the physiological implications of that performance, without recognizing the necessary nuance.
Keep the other senses
So performance analysis in rugby, football, tennis, American football and basketball may be quite different from performance analysis in cycling and running in many respects, but it is similar in one important aspect – it does not replace common sense or give permission to disengage every other sense in order to rigidly accept a black and white version of the world of sport, which everyone recognizes is clearly nuanced. For this reason, it never constitutes proof.
Why people would want to accept such extremism is beyond me. That inflexibility fosters blindness and prevents insight. In effect, people who dismiss performance metrics and their implications as “worthless” because we are estimating power output are analogous to a blind man, offered 40% vision, but who refuses it because he only wants 100% sight. It’s 20/20 or nothing, and I believe that’s a flawed and narrow understanding of the world. It would be the same as a head coach saying to his analysts “If you can’t guarantee with 100% certainty what my opposition are going to do, I don’t want to know it at all”.
Before I’m seen to be proclaiming that say, 40% is “good enough”, I will say once again that we all recognize that it’s not. We want 90%, 100%. That’s why the process began with a call for the data, biological and performance. That’s because the “blurred” image offered by 40% vision, similar to the “performance pixellation” I wrote about after many stages in the 2013 Tour de France, may well lead a person into many blind alleys and unseen obstacles.
However, what seems overlooked is that people still have other senses – they have smell, hearing and touch. And they should also have some common sense. So 40% vision added to other senses makes anyone better off than they were in total blindness. Unless, of course, said blind man decides that with his new-found 40%, he is going to ignore every other sense. This would be equally foolish in the opposite direction.
So if we can gain any insight at all, and combine it with our other senses, then just like our SA 7s rugby players, or the NFL footballers or a basketball player, we can run onto the field or court with more confidence, provided we retain the ability to interpret every situation as it develops for what it is.
As applied to the 2013 Tour de France
Therefore, when Chris Froome rides away from a field in the first week of the Tour at a power output that is higher than benchmarked, and produces a time that puts him in the company of known dopers, we should ask questions of that performance. But we cannot conclusively use it to prove that he is doping. That would be extremism, and it would be wrong. It’s for this reason that I have written, and still believe, that Vayer is too far to the extreme when he declares performances ‘mutant’. They are not – they’re still within the realms of physiological plausibility, though on the high side. That’s what Fred Grappe concluded when provided access to Froome’s data, and it is the conservative and correct approach.
Similarly, Rodriguez or Quintana should be regarded with some ‘wonder’ for getting progressively better, and eventually exceeding historical benchmarks during the race. Quintana, incidentally, produced the best performance of the entire Tour on its very final climb of Semnoz, benchmarked against historical norms using the pVAM method. (pVAM, while I’m discussing it, is just that – a benchmarking method that, for the first time, makes Semnoz comparable to Alpe d’Huez even though Semnoz had never been used in the Tour before. And that’s progress – for all the criticism of the pVAM method, it opens that possibility. Now it needs evolution.)
A process, and evolution with uncertainty
Without the analysis of pVAM and the estimates of power output, plotting those power outputs against duration, and historical reference points, these performances have no context. Therefore, performance analysis asks questions, it does not answer them. In time, those answers may emerge, and performance analysis can help us to evaluate those answers as realistic or false. But with an eyes-tightly-shut approach, we are guessing. For instance, is it normal to show higher power outputs in week 3? We don’t know. Had we gathered data for 20 years, we’d have a pretty good idea.
The point is, this is all a process. It’s never going to be perfect, but if we halt evolution because the method is imperfect then we’ll never move forward. If Henry Ford and others had decided not to proceed with mass-produced cars because they had in mind the perfect luxury vehicle, then we’d still be sitting on horse-drawn carriages. I think that Doc, Ammatti and even Vayer (despite some differences in the interpretation) are doing cycling a favour by calling for transparency and starting a discussion. I honestly believe that performance analysis is making progress in cycling as a result of their efforts. Again, it’s not perfect. There is uncertainty. But then, progress dies of boredom when there is certainty.
It doesn’t deserve outright dismissal, and it doesn’t warrant embracing as conclusive proof of anything (nor does it ever ask to be seen this way). So I’d thank all those for participating in the discussion. I hope it advances insight and enjoyment of the sport (it certainly does for me). I applaud people for wanting accuracy, I think that is always good. And if I have ever drawn a conclusion that goes beyond what the error of the estimates allows, I’d expect to be called out on it. Run me out of town if I say that a performance is proof of doping without recognizing its context or explaining that kind of extreme statement.
But equally, I’d hope that people read the articles (yeah, I know, they’re long) and then consider my interpretation of the numbers, and the explanation, the nuances, and then avoid the extreme reaction. We all have other senses, after all, so we can accept uncertainty and navigate with only partial vision, provided we engage those senses, most of all common sense.
Two final parting thoughts
That said, two final thoughts on the 2013 Tour, from a performance analysis perspective:
1. SRM compared to power estimates
These are the SRM data for five Tour performances, compared to our estimates using two methods. One is the method of Ferrari, using pVAM, the other is the CPL method, which pretty closely approximates Vayer’s method. Once again, they’re from Ammatti Pyoraily’s vast bank of performances:
Perfect? No. But not nearly as unusable as some have suggested. You can decide for yourself whether those estimates are worthless or not. I’d point out that the estimations are just as likely to underestimate power as to overestimate it (3 vs 2 for CPL), so sometimes estimation gives the benefit to the cyclist. Collectively, if we wish to avoid performance pixellation, the average error for these five performances is 0.6%.
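To make the difference between a collective average and a per-climb error concrete, here is a minimal sketch. The five error values are invented purely for illustration (three underestimates, two overestimates) and are not the figures from the SRM comparison; the function names are also my own.

```python
# Hypothetical per-climb errors in percent: (estimate - SRM) / SRM * 100.
# These values are made up for illustration; they are NOT the real comparison data.
errors_pct = [-2.5, 1.8, -1.2, 2.9, -0.5]


def mean_signed_error(errors):
    """Average error with signs kept, so under- and overestimates cancel."""
    return sum(errors) / len(errors)


def mean_absolute_error(errors):
    """Average size of the error on a single climb, ignoring direction."""
    return sum(abs(e) for e in errors) / len(errors)


print(f"mean signed error:   {mean_signed_error(errors_pct):.2f}%")
print(f"mean absolute error: {mean_absolute_error(errors_pct):.2f}%")
```

With numbers like these, the signed average sits close to zero while a typical single climb is still off by a percent or two, which is why pooling several climbs is safer than reading too much into one.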
The timing of climbs was criticized because it’s based on TV observations. That introduces an error of a few seconds, far smaller than any of the other errors, so it’s a bizarre criticism to have emerged during the Tour. If we were hand-timing Usain Bolt in a 10-sec sprint, it’d be different, but not over 30 minutes or more. Wind? Of course it matters. When you’re going in a straight line with a tailwind for 90% of the race route, as Boston Marathon runners experienced in 2011, wind blows the whole thing out of the water. I dare say it’s a lot less significant, based on SRM data, in the variable Tour geography.
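The timing point is simple arithmetic, sketched below. The 3-second and 0.2-second stopwatch errors are assumptions I have picked for illustration, not measured values, and the function name is hypothetical.

```python
def relative_timing_error(timing_error_s, duration_s):
    """Return the timing error as a fraction of the total duration."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return timing_error_s / duration_s


# Assumed: a 3 s error when timing a 30-minute climb off the TV feed.
climb = relative_timing_error(3.0, 30 * 60)

# Assumed: a 0.2 s hand-timing error on a 10-second sprint.
sprint = relative_timing_error(0.2, 10.0)

print(f"climb:  {climb:.2%}")
print(f"sprint: {sprint:.2%}")
```

Under those assumptions the climb is off by under a fifth of a percent while the sprint is off by about 2%, more than ten times as much in relative terms.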
Is it acceptable to be within 1-3% per climb? That depends what one wishes to conclude. If the pursuit is proof of doping, then of course it isn’t, but even exact SRM data wouldn’t “prove” anything anyway. We can always do better – you’ll notice that Ferrari’s method tends to overestimate power, and that’s because for lighter riders and faster speeds, the formula he developed doesn’t account fully for drag. That’s why Doc at the Veloclinic has been trying to develop a better method for performance prediction, taking into account the components – all part of the process of getting better, and that’s the evolution that will make 2014 better than 2013.
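To show why rider mass and speed change how much drag matters, here is a generic steady-state sketch of climbing power. It is not Ferrari’s formula, the CPL method or the Veloclinic model; it assumes no wind and no drivetrain losses, and the drag area, rolling resistance, air density, masses and speeds are all illustrative assumptions.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def climbing_power(total_mass_kg, speed_ms, gradient,
                   cda=0.35, crr=0.004, air_density=1.1):
    """Return (gravity, rolling, drag) power in watts on a steady climb.

    gradient is rise over run (0.08 means 8%). The cda, crr and
    air_density defaults are assumed values, not measurements.
    """
    angle = math.atan(gradient)
    gravity_w = total_mass_kg * G * math.sin(angle) * speed_ms
    rolling_w = crr * total_mass_kg * G * math.cos(angle) * speed_ms
    drag_w = 0.5 * air_density * cda * speed_ms ** 3
    return gravity_w, rolling_w, drag_w


def drag_share(total_mass_kg, speed_ms, gradient):
    """Fraction of total power that goes into overcoming air resistance."""
    gravity_w, rolling_w, drag_w = climbing_power(total_mass_kg, speed_ms, gradient)
    return drag_w / (gravity_w + rolling_w + drag_w)


# Hypothetical riders (mass includes the bike) on the same 8% climb.
light_fast = drag_share(66.0, 6.0, 0.08)
heavy_slow = drag_share(78.0, 5.0, 0.08)
print(f"light and fast: {light_fast:.1%} of power goes to drag")
print(f"heavy and slow: {heavy_slow:.1%} of power goes to drag")
```

In this sketch, drag power grows with the cube of speed while the gravity term grows with mass times speed, so the lighter, faster rider spends a noticeably larger share of power on air resistance. Any formula that handles drag the same way for every rider will therefore drift for exactly those riders.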
But I think on the whole (and again, the above table shows only five 2013 comparisons, but we’ve pooled this with other data from Sorensen and the 2010 Tour de France where Horner provided his SRM data) the error is consistently in that 1-3% range, and I’d argue that the method is not as poor as some have suggested.
More to the point, the exact data wouldn’t allow conclusive proof of doping or non-doping anyway, so given that the interpretation is subject to nuance and context, the error in the estimate can be accepted at this level. It’s an acceptable level of uncertainty. That’s the performance uncertainty aspect I wrote of earlier – we may not have 100% vision, but we have senses. We just need to use them fully as we add to them, uncertainty and all.
2. The patriotic blindfold
This whole performance analysis concept became heated and controversial because it seems that many in the English-speaking world had a vested interest in the success of specific riders. Much like the most aggressive defence during the Lance Armstrong era was coming from the USA, some of the most vocal defence of Froome was coming from the UK, and in particular from the British media. People seemed incapable of even hearing a discussion when it concerned one of their own, and arguments were dismissed out of hand. The same numbers from an American, or a Spaniard, or an Italian, would have initiated a discussion. A prominent practitioner of this hypocrisy is David Walsh, who actually was the first person to tell me to analyse the speeds to confirm doping, as applied to Armstrong. As applied to Froome, it’s no longer a valid method.
Back to the race, by the end of the Tour, Quintana and Rodriguez were at the same level as Froome on Ax-3-Domaines, and Quintana even surpassed it on Semnoz to produce the best climb of the race, statistically speaking. Even accepting the constraints of “performance pixellation”, the trend was quite clear.
And in some parallel universe, hypothetically speaking, I wonder how the numbers and estimates would have been perceived if Quintana and Rodriguez had started the Tour in that condition? What if Contador had joined them at speeds as fast as Armstrong and Ullrich a decade before? I dare say that had it been Quintana and Rodriguez accelerating clear on Ventoux and Ax-3-Domaines, the whole process of our performance analysis would have found a different story from the very first mountain. A new script, certainly one in which we’d be doing less defensive explanation than we’ve done. Would those performances have attracted a similar skepticism from the UK press as Froome did from the French media (by all accounts)? Certainly, and that’s the key – the performance warrants questions, not accusation, and measuring it, imperfection and all, can only be helpful.
Ultimately, though, the final word on the 2013 Tour is that we shouldn’t be accusing, just wondering. And given cycling’s history, and the fact that those entrusted with running the sport have shown themselves to be unable to clean it up, we cannot simply believe (blindly) in miracles this time around. So we wonder, reasonably, and use some kind of performance metric to gain some insight. Proof, no. But equally, not worthless.
On that note, crank punk nails the problem facing cycling’s appeals for trust. There’ve been scandals in the past, and as sure as anything, there will be doping scandals in the future. Paul Kimmage also recently gave this interview – he’s over on the cynical extreme, but his thoughts are eloquent and with some caution, well worth listening to. Jump to the 10:00 mark.
On that note, thanks for reading the Tour coverage, let’s do it again in a year. Next up, the IAAF World Championships in Moscow.