Mix and match until you find similarities: Oscar Pistorius research evaluated
It's taken me a couple of days longer than I would have thought to get around to this post, analysing the recently published research that was responsible for the CAS' decision to clear Oscar Pistorius to compete against able-bodied athletes.
There are a couple of reasons for this - one is the ubiquitous work excuse. But it's also proven very difficult to sift through the paper and find anything to say that hasn't already been said dozens of times before. I almost decided to simply post up links to all the articles I've written on the subject in the last 18 months, because this latest "revelatory" paper does little to dispel any of those arguments, and does not, in my opinion, introduce many new points to the debate. What it does do is so fraught with method questions that I am not sure what I believe, and the difficult part was sifting through the paper to understand how comparisons between Pistorius and the able-bodied runners had been made.
I also struggled with whether to do this as one post or to break it up into a few. Eventually, I decided on one, mostly because later this week, I have another post planned and didn't want to interrupt this one. The result, unfortunately, is a long post (sorry). But if it helps, it's divided into three sections, so you can choose to read it in parts if you wish:
1. Broad thoughts on the methods - key implications, problems and questions
2. The results - what was found and what it means
3. A wrap-up - the "collective" evidence
1. The methods and comparison - who should he be compared to?
Upfront, one might as well explain how it worked - the research approach was to measure Pistorius and then compare him to able-bodied runners. If they could find that he was "similar" to able-bodied controls, then they would report functional similarity and they'd have grounds to clear him. The keys then, are the definition of "similar" and the comparison group.
What is similar?
For the purposes of the paper, it's a difference of less than two standard deviations between OP and their able-bodied runners. This is a statistical method used fairly often, though 2 SD is a pretty conservative boundary condition.
The bigger issue is who he is compared to - if their control-group data are not robust, then any comparison is going to be erroneous. If the control is either too small or not well-matched, then you are comparing apples to pears and the criteria for "similarity" are flawed. In this study, then, one would need large groups of 400m sprinters who run between 46 and 48 seconds for the 400m event. This group did not exist.
These able-bodied "control" runners comprised all of FOUR 400m sprinters. The rest were sub-elite distance runners and elite distance runners (we don't know what distance - it is not reported. We only know that one of them is Zersenay Tadese - if you're wondering what he's doing in a comparison with a 400m sprinter, join the club...). We also don't know how good a "sub-elite" runner is.
These things are absolutely crucial, because the comparisons made throughout the research paper are reliant on a valid group to which OP can be compared. That group does not exist. You cannot take only 4 sprinters and generate any meaningful data to which you can compare Pistorius. I suspect (based on some info I was once given) that the initial intention was to use only these sprinters as a comparison. But, as we shall see, OP was very different to them, so the reinforcements in the form of distance runners were brought in. OP was not "similar" to the sprinters, so simply add more subjects until that 2 x SD condition is met... The result is that throughout the paper, it's not always clear who is being compared to whom.
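To see why adding a different kind of runner to the control group makes the 2 SD test easier to pass, here is a minimal sketch in Python. Every number in it is HYPOTHETICAL and invented purely for illustration (none comes from the paper), and the function name `within_two_sd` is my own. It only shows the statistical mechanism: pooling two dissimilar groups widens the standard deviation, so a value that sits far outside one group can land comfortably inside the pooled one.

```python
from statistics import mean, stdev


def within_two_sd(value, controls):
    """Apply a 'less than 2 SD from the control mean' similarity rule.

    Returns (is_similar, z), where z is the distance from the control
    mean expressed in sample standard deviations.
    """
    m = mean(controls)
    s = stdev(controls)  # sample standard deviation (n - 1 denominator)
    z = (value - m) / s
    return abs(z) < 2.0, z


# HYPOTHETICAL values in arbitrary units, NOT data from the paper.
athlete = 5.0
sprinters = [4.0, 4.1, 3.9, 4.0]          # tight, homogeneous group
distance_runners = [5.5, 5.8, 6.0, 5.6]   # a very different population

similar_sprint, z_sprint = within_two_sd(athlete, sprinters)
similar_pooled, z_pooled = within_two_sd(athlete, sprinters + distance_runners)

print(f"vs sprinters only: similar={similar_sprint}, z={z_sprint:.2f}")
print(f"vs pooled group:   similar={similar_pooled}, z={z_pooled:.2f}")
```

With these made-up numbers the athlete is more than ten standard deviations from the sprinters alone, yet well inside 2 SD of the pooled group. That does not show this is what happened in the study, only that the choice of control group can decide the outcome of a test like this.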
Another concern is timing. It seems (and again, I'm not 100% sure what was done, it's explained very "broadly") that the able-bodied controls were not all actually tested with Pistorius. Rather, their data were "historical", in that they existed long before the Pistorius testing sessions took place, and the researchers simply drew from the archives to find them. This is not necessarily a problem, it happens fairly often, provided the methods used are identical and the equipment is calibrated properly. The same thing actually came up in the debate around Ed Coyle's long term research on Lance Armstrong. The problem for this paper is that nowhere is it reported what the time-frames are, how the equipment was used and which athletes are being compared to OP at any given point. That may sound like nit-picking, but when you see how OP was declared "similar", then it has massive implications.
State of training - a crucial factor
And then finally (and I believe this is a crucial factor), the state of training on the subjects is never reported. This has enormous implications for the comparisons, because Pistorius was, according to media reports and his own words, "very unfit and untrained" throughout the period when this testing was done. The stress of the case and all the travel had detracted from his build-up, and this was eventually the reason given for failing to make the Beijing Olympic qualifying mark.
So, we have a comparison between an apparently unfit amputee and (possibly) fit 400m runners, distance runners and sub-elite distance runners. When we look at the results, the measures included VO2 peak, oxygen cost, fatigue tests and top sprinting speeds - all factors that would be massively influenced by training. If it is true (and it is likely) that Pistorius was untrained, then he is being compared to trained subjects, and every measurement comparison is invalid. Scientifically, you cannot perform a comparative study while failing to control this aspect.
And, what is more, if he is found to be physiologically similar (as the paper will conclude) then one has to ask what effect training would have - any similarities in variables so strongly influenced by training will become superiority once training effects are factored in. The comparison simply does not work. It must be said - those controls might have been equally untrained, or perhaps Pistorius was trained (I doubt it, based on what Pistorius and our newspapers here were saying), but it was never reported and remains a question mark in the paper.
2. The findings
A 17% difference in efficiency
Looking at the results, the graph below shows the measured oxygen cost of running.
Very importantly, it is measured using what is called a discontinuous protocol that consisted of running for 5 to 7 minutes at each of a range of speeds (these speeds are not reported, amazingly), with a 3 to 5 minute rest period in between (the exact duration is not reported). A word or two on the methods:
- This is not a test that a sprinter should be using - when does a sprinter ever run a sustained bout of 7 minutes, repeated over and over? This is a test suited to an endurance athlete, and as such, will under-estimate values for a sprinter. I recall testing an elite squash player, and we could produce completely different results if we used this kind of test compared to an all-out, shorter duration max test that lasted 10 minutes. Fitness is obviously key to this as well - an unfit athlete (read Oscar Pistorius) would struggle in this test...I do however appreciate that there are issues around aerobic vs anaerobic metabolism.
- Second, the speeds were not reported, and nor were the rest periods - was a 3 minute rest standard, or could the athlete take as long as they felt like to recover? Perhaps the recovery time was "selective" to allow OP to continue to higher levels? If it's not reported, it's possible.
Others are the passive energy return from the carbon-fibre blades, the improved storage and release of energy by carbon fibre, and reduced work of having to accelerate lighter limbs. These are all reasons for an advantage, and have been discussed many times before, but are not raised in the paper. What this result does confirm is that the theoretical arguments made are at least valid. It does not necessarily translate into a performance advantage, because of the range of reasons that might produce this difference. Actually sifting through these reasons requires data that was either not obtained or not presented.
VO2 peak and peak aerobic running speed
The next measurement of interest is the highest VO2 recorded during the trial. The authors refer to it as a VO2 max, which is incorrect - because of the protocol used, it's not a true maximum. Instead, it represents only a VO2 peak, and Pistorius is measured at 52.7 ml/kg/min while the controls are at 57.0 ml/kg/min, a difference of 8%. This suggests the previously measured 17% lower oxygen cost of running for OP is not simply due to lower muscle mass, and therefore should be taken a little more seriously than just dismissing it as such.
What is very interesting, and the authors hang a lot of their conclusion on this, is that the running speeds at which this peak VO2 was achieved are "essentially the same" (authors' words). That is, OP hits his VO2 peak at a speed of 5.0 m/s, whereas the control sprinters hit it at 4.9 m/s (with a SD of 0.02 m/s, which will become very important in a moment).
Now, a couple of issues here. Firstly, the difference of 0.1 m/s translates into roughly 1.6 seconds over 400m at the speeds reported. That is significant from a performance point of view, if not stats. But even more vitally, in the paper, the Standard Deviation for the control runners' speeds is 0.02 m/s. Therefore, the difference between Pistorius and the control athletes is equal to FIVE Standard Deviations. This is very, very different, and not "essentially the same". Using the paper's own methods, you'll recall that a difference of 2 Standard Deviations or more was classified as different - here we have a difference of 5 SD. Unless we are going to accept that stats should be used selectively to prove a point, the argument should end right here. Oscar Pistorius runs faster than the controls using aerobic metabolism - that represents a physiological and performance advantage.
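For anyone who wants to check the arithmetic, here is a short Python sketch using only the three figures quoted above (5.0 m/s, 4.9 m/s and an SD of 0.02 m/s). The variable names are mine, and the 400m time gap assumes each speed is simply held constant over the distance, which is a simplification for illustration rather than a claim about how a race is actually run.

```python
# Figures quoted from the paper in the text above.
op_speed = 5.0        # m/s, Pistorius' speed at VO2 peak
control_mean = 4.9    # m/s, control group's mean speed at VO2 peak
control_sd = 0.02     # m/s, control group's standard deviation

# How many control standard deviations separate the two speeds?
sd_units = (op_speed - control_mean) / control_sd

# Simplifying assumption: each speed is held constant for 400 m.
distance_m = 400.0
time_gap_s = distance_m / control_mean - distance_m / op_speed

print(f"difference in SD units: {sd_units:.1f}")
print(f"time gap over 400 m:    {time_gap_s:.2f} s")
```

This gives a difference of 5 SD and a gap of about 1.6 seconds over 400m, against a similarity threshold of 2 SD.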
State of training - why it's vital
Finally, I refer back to the issue of state of training and equal comparisons. The measurements of VO2 peak and running speed are highly influenced by training status/fitness. Are the able-bodied runners and Pistorius equally trained? If not, then an untrained Pistorius is producing physiological results that are comparable to trained athletes. A trained, competitive athlete will achieve higher speeds, a greater VO2 max, but not necessarily improved efficiency. The implication is that Pistorius' VO2 peak will rise, the running speed at which he hits VO2 peak will increase, and he'll look EVEN MORE DIFFERENT to them if the comparison were appropriate. The paper presents a comparison which cannot be trusted - perhaps it's valid, but the critical information is never reported (who are these subjects and how trained are they?). This should have been picked up on for the CAS hearing (and would have been, had this paper been subject to normal scientific process).
Fatigue test results
The next set of tests done was on fatigue, and these are interesting and do actually add to the debate. Here, it seems that OP did a series of all-out sprints to fatigue at a range of different speeds. The methods are actually very poorly explained, and it says only that the range of speeds was from 6.6 m/s (which he held for 89.5 seconds) to 10.8 m/s (less than 2 seconds). There is no explanation of what speeds were completed, how long the rest periods were, or how many intervals were run by each subject (including Pistorius) - "between 6 and 15 tests" is the only explanation given in the paper. The authors do refer to two other published studies in which the fatigue tests are explained. That is obviously good, but the problem remains that Pistorius is about to be compared with a very specific intention of finding either difference or similarity, and so the method used for him becomes absolutely crucial. Even with the method published elsewhere, a comparison of one athlete to that 'database' requires identical methods and process to be followed, and, at the very least, explained in detail.
The implications of this are important. "Between 6 and 15" is an enormous difference. It does not take a great level of insight to appreciate that if you are trying to assess fatigue tolerance using all-out runs to exhaustion, the athlete who has done 6 fatiguing tests will produce a different result to the athlete doing 15 fatiguing tests. So why is the range so large? This is not accounted for in the paper, and the reason for this, I suspect, is that not all subjects have done the same protocol. In fact, the results might well have been collected over a period of years as a 'database' of sorts was formed, with each athlete doing a slightly different protocol.
Point is, if the methods followed are not identical, then one must be careful about making direct comparisons, especially in a fatigue-trial, and especially when subject numbers are low (as they are here). And, any comparisons must be explained in the context of which methods were used for which subjects. In this paper, neither happens - some sprinters were tested, Oscar Pistorius was tested. When and how? These are details the authors seem to have decided are not worth reporting, perhaps because they don't lend themselves to the desired finding.
Yet they're being compared, with only trust to back it up. Given the controversial nature of the subject, and the financial incentives behind Pistorius (Nike, Ossur and co.), independent verification of what was done should have been a pre-requisite for this research to ever be accepted by the CAS. At the very least, the IAAF should have been allowed representation.
The comparison - OP compared to...distance runners
Returning to the fatigue tests, for "similarity", Pistorius is compared to one sprinter and two distance runners - you may decide for yourself if that comparison is valid...
The question I have to ask is why not just give us the comparison with four sprinters? If the data exist, then show them. Unless they do not support the desired conclusion, which, as long as data are "hidden", has to be a possibility in a matter as sensitive and controversial as this. This lack of transparency is a major problem. Ordinarily, science is based on some "trust" that researchers will do what is deemed appropriate. However, the circumstances of this case change the stakes a little.
Pistorius was found to fatigue similarly to these control subjects. That is, he holds the given speed for similar durations. Again, the state of training is a vital aspect here - would a trained Pistorius still fatigue similarly? How comparable are the controls? With training, would Pistorius be better able to maintain speeds, leading to a conclusion that he does not fatigue similarly?
This is another reason why the state of training is so vital, and all reports suggest that Pistorius was untrained at the time of the testing - either that, or he was lying in the media last year when he said the case had kept him from training. Based on his performances in trying to qualify for Beijing, I believe the former - he was untrained, yet still comparable to able-bodied DISTANCE runners for running times.
Another interesting point is that these kinds of constant speed to fatigue tests are very dubious as markers of performance or fatigue. There was a big debate in sports science a few years ago, and the general consensus among performance physiologists (who look at pacing strategy as their main interest) is that you can't infer fatigue or performance from a trial to volitional exhaustion, because they're not repeatable enough and allow too many other factors to influence the result (training is just one of them). You cannot therefore evaluate pacing strategy or fatigue using trials at a fixed speed - they are useful for investigating changes in physiology, but to infer performance is incorrect.
Of course, one obvious limitation is that Pistorius could control the result of this testing by stopping early, given that he knew the theory is that he fatigues less quickly than able-bodied runners. This is why the IAAF should have had representation at the testing - they did not...
Why the selective display of results?
But more to the point, why compare Pistorius to distance runners? Is it valid to ask whether a sprinter fatigues similarly to elite distance runners? And where are the other data?
You have potentially four sprinters to compare him to! Perhaps they didn't, and only one sprint control existed - this should be reported. Yet they choose to use two distance athletes, and their finding is that he is "similar". In other words, the 400m sprinter shows similar fatigue characteristics to elite distance athletes...extraordinary. You'll be aware of course, that distance runners SHOULD show better fatigue resistance, because that's what their events rely on. We know that optimal distance races are evenly-paced, whereas sprinters slow down in the second half. Therefore, to compare a distance runner to a sprinter, and show similar fatigue patterns, especially when the sprinter is supposedly untrained (again, this is not reported, so it is speculation), well, that's an incredible finding...
Pistorius' pacing strategy
The authors explain that their finding explains why Pistorius has such an incredibly fast finish in his 400m races. You'll recall that he is the only 400m runner in history who finishes with a faster second 200m than the first. Part of this is without doubt down to his slower start, which has been widely acknowledged. However, what the paper puts forward is that it accounts for all of his unique pacing, which is impossible. Remember, Pistorius has run 10.91 seconds for 100m. That means that he cannot be losing more than about 0.8 seconds at the start (unless you'd like to believe he is a 10-second 100m runner).
If he loses 0.8 seconds at the start of the 400m race, it accounts for only part of the time he 'makes up' in the second half. Pistorius runs the second 200m of his races almost 2 seconds faster than the first 200m - only 0.8 s (at most) can be explained by a slower start. Besides, I already corrected for that slower start by relating everything to the 100m time, and it shows the same thing...his fatigue profile DURING COMPETITION is different from other elite athletes.
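The reasoning here is simple subtraction, and a few lines of Python lay it out. The figures are the ones quoted above; the variable names are mine, and "almost 2 seconds" is rounded to 2.0 for the sake of the sketch, so the result is an approximate upper-end estimate rather than a measurement.

```python
best_100m_s = 10.91          # Pistorius' 100m time quoted above
max_start_loss_s = 0.8       # upper bound on time lost at the start
second_half_gain_s = 2.0     # "almost 2 seconds", rounded for illustration

# If the full 0.8 s were lost at the start, this is the 100m time he
# would have to be capable of with an able-bodied start.
implied_100m_s = best_100m_s - max_start_loss_s

# Portion of the faster second 200m that the slow start cannot explain.
unexplained_s = second_half_gain_s - max_start_loss_s

print(f"implied 100m without the start penalty: {implied_100m_s:.2f} s")
print(f"second-half gain left unexplained:      {unexplained_s:.1f} s")
```

Even granting the largest plausible start penalty, more than a second of the difference between his two halves is left over, and it is that remainder which needs explaining.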
The Weyand-Herr study puts the rest down to a deliberate pacing strategy, in which case his coach should really be fired, because if that's what he does deliberately, then he deserves the sack, so inefficient is the idea that you should speed up at the end of a 400m race. And in case anyone is thinking that my argument is based on one race - it isn't. I've watched Pistorius many times here in SA, and every race is the same, it's what he does (with the exception of the Beijing Paralympics, but then he said he was unfit and had stomach problems).
On the fatigue front, the jury is out - the comparison with distance runners is flawed, the test is flawed (the manipulation of this particular fatigue test is very, very easy) and the proof of a fatigue advantage will always come from performance, and that speaks very loudly at this stage.
The final section of the paper looked at Pistorius' sprinting mechanics compared to able-bodied controls. To summarize this section, the graphs below present the key information measured at two running speeds - 10 m/s and top speed:
To sum up, Pistorius has longer contact times (14%), shorter swing times (21%), shorter aerial times (34%) and a lower peak vertical force (14%) than able-bodied athletes. So what does this all mean?
Well, there are certain similarities with what Bruggemann found back in October 2007. His results, which included energy measurements on the blades, led him to conclude the following:
"Sprinting with artificial limbs is significantly different to able-bodied sprinting on a hard surface. It is a different kind of locomotion at a lower metabolic cost"
In the current paper, the authors conclude that "running on modern, lower-limb sprinting prostheses appears to be ... mechanically different than running with intact limbs".
The one contentious point is the vertical and horizontal forces experienced by Pistorius during running. The graph below is taken from the paper, showing the vertical and horizontal forces:
Compare that to this graph, which was produced by Bruggemann in his 2007 research on Pistorius.
They show basically the same thing - Pistorius experiences lower vertical and horizontal forces. Where it becomes debated is the impact that would have on performance. The latest study suggests that the lower vertical forces might present a limitation to speed, based on previous research looking at top speed as a function of vertical force generated.
Bruggemann, on the other hand, looks at Pistorius' reduced horizontal force as a distinct advantage, because it means less braking force has to be overcome. Bruggemann's view on the vertical? Well, less work is done on the centre of mass, and his viewpoint, one which I agree with, is that vertical force generation is particularly important during acceleration, but once top speed is reached, it is actually better to have a lower vertical force - the disadvantage disappears. So either way, Pistorius enjoys a mechanical advantage.
What is perhaps most intriguing is that a longer contact time, a shorter aerial time and a shorter swing phase are indicative of someone who is almost "rolling" along the ground. In his 2007 study, Bruggemann found that Pistorius had a lower vertical oscillation (or up-and-down movement) than able-bodied runners.
The most inefficient part of running is the bit in the air - that's where gravity exerts a negative force on the athlete - followed by the landing, when energy is lost and braking forces have to be overcome. Pistorius spends almost no time experiencing this force, and mechanically, he is moving ever closer to taking part on wheels. Admittedly, that's an extreme analogy, but it's done to highlight just how different Pistorius is. What he does is NOT running. It's never been seen before, but it's not running. So when you next watch him race against able-bodied athletes, you'll be watching seven men running against someone who is not...
Final measure still unaccounted for - energy return
One final aspect that was never covered in the latest research is the aspect of energy return. This was done by Bruggemann in 2007, and you may recall that he found that the energy lost from the ankle joint of a human limb was 41.4%, compared to only 9.3% from the carbon-fibre prosthetic limb.
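To put those two percentages side by side, here is a small Python sketch. It uses only the two loss figures quoted above and makes one ASSUMPTION of my own: that whatever energy is not lost is returned, so that return = 100% minus loss. The variable names are mine as well.

```python
# Energy loss figures from Bruggemann's 2007 study, as quoted above.
ankle_loss_pct = 41.4   # human ankle joint
blade_loss_pct = 9.3    # carbon-fibre prosthetic limb

# ASSUMPTION for illustration: energy not lost is energy returned.
ankle_return_pct = 100.0 - ankle_loss_pct
blade_return_pct = 100.0 - blade_loss_pct

# How much more of the stored energy the blade gives back.
return_ratio = blade_return_pct / ankle_return_pct

print(f"ankle returns about {ankle_return_pct:.1f}%")
print(f"blade returns about {blade_return_pct:.1f}%")
print(f"ratio blade/ankle:  {return_ratio:.2f}")
```

Under that assumption the blade returns around 90.7% of the energy against 58.6% for the ankle, or roughly one and a half times as much.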
3. The Wrap-up
What the CAS should have known
The above is a "heavy-duty" discussion of the science presented in the Pistorius paper, and it represents a departure from the 'conversational' nature of this debate up to now. That debate and all the theory behind it is as true today as it was two years ago, and I would say it still holds the theoretical reasons for the Pistorius advantage. This post does not discuss that advantage as much as it dissects research method and study design, which is an essential part of research. However, it was never the purpose of this site to pick apart scientific methods and discussions around Standard Deviations, and so I won't go down this path here again.
Looking back, however, this is the process that should have been followed by the CAS. There is no way that this research should have been allowed to roll into Lausanne in May last year, having never been seen by the IAAF or any other scientist before being presented to the CAS. What is written above is a typical evaluation of scientific method and design, but Oscar Pistorius and his clan managed to bypass it - they ambushed the CAS with the science, and had a group of lawyers deliver a result without the stringent, essential scientific debate that science calls for.
Perhaps the issues raised above are easily addressed - I'm sure some will have "answers" to these questions, or perhaps even more questions. But they are serious questions, some that cannot, I believe, be addressed satisfactorily. For the CAS, however, they were hijacked, and I cannot believe the IAAF would stand by idly and allow that - so this is a request to re-open the debate, and present the same arguments above, plus other, probably better ones, in the interests of getting a fair hearing for BOTH sides, not the hijacked hearing that it was...
All in all, the research that saw the CAS clear Pistorius is full of questions, not answers, and the CAS should have waited for this kind of opinion and discussion before throwing a verdict out. The study has too many flaws to ignore, and had any length of time been taken to actually evaluate it, instead of allowing a single one-day hearing, this might have been discovered.
Given that Pistorius is not actually even running as we know it, I'm not sure what debate still exists. However, to carry on that debate, the latest research published just last week is fraught with what I believe to be significant problems. Pistorius started out as a 400m runner who should have been compared to other 400m sprinters.
In the end, he was declared physiologically similar to elite and sub-elite distance runners, despite having a 17% efficiency advantage. Where he was similar, it is reported, is that his speed at VO2 peak is "essentially the same", even though he lies 5 SD outside the able-bodied average. And all this when Pistorius was, in his own words, untrained as a result of the stress of the travel and trying to prove his innocence. That is no comparison or grounds to declare similarity.
When data are selectively presented without explanation why (where are the other three sprinters' results in the fatigue tests, for example?), when timing is not accounted for, when methods with crucial implications are glossed over, and when comparisons are made between one sprint athlete and all of four sprinters and a host of distance runners, then the theoretical debate goes nowhere.
Where the IAAF research and this latest research DO agree is that Pistorius differs from able-bodied athletes mechanically. It's not running, but a never seen before form of locomotion that is heading towards rolling on wheels. That alone might have been enough to make the right decision. It wasn't, and so I continue to hope that within the next thirty years, another athlete comes along, who, with greater ability, work ethic, and talent, runs 400m in 41 seconds.
Thanks for reading the lengthy post - more general opinion is to follow, and then I hope to leave the issue behind, and prepare for the Tour de France!
P.S. If you'd like a copy of the paper, just let me know...