This is a guest post, sort of. Well, it’s a post co-authored by me and two other academics – Roger Pielke Jr from the USA and Erik Boye of Norway. You may recall that last year, about this time actually, the three of us tried to look at the data being used by the IAAF as a key element in selecting the events covered by its DSD Regulations (the now accepted-by-CAS DSD Regulations).
We got hold of some of the performance data from the Bermon & Garnier paper, and when we analysed it, we found all kinds of data errors, including duplications and the presence of what we called ‘phantom times’: performances that had been analysed even though they did not actually exist in the competitions examined.
That led us first to call for the retraction of the paper (based on the journal’s policy that flawed or erroneous data are grounds for retraction), and then eventually to publish, in a separate journal, a paper describing the problems with the IAAF’s data.
The CAS decision is now in the books, and the DSD Regulations will stand despite the evidence (my assessment) rather than because of it. As I wrote here, and said a few times last week, my read is that the decision was the result of CAS weighting the concepts and theory more heavily than the evidence that was presented.
The very best case you can make for the IAAF’s evidence, including the part of it that they have not published for medical confidentiality reasons, is that it establishes a hypothesis through observation of a convenience sample, one in which the collection methods contradict the IAAF’s own DSD Regulations and in which no potential confounding factors are controlled for. Their evidence might be considered a passable first step, but not the last necessary one, in creating any kind of regulation.
Because the quality of evidence still matters, and even more so the scientific process that informs governance (even in sport!), organizations have to be pulled up on misleading statements. When a policy has the ramifications of this one, both socially and for the health of those it may affect, it is all the more important that the very best be expected of the science that underpins it.
And so this guest post, written by the three of us, aims to respond to one particular point raised by the IAAF in the aftermath of the CAS decision. That point, shown below, comes from the IAAF’s Briefing Notes on its Female Eligibility Regulations:
We believe that scientific process matters. Given that some of the statements made above are misleading, some are evasive, and some are outright false, Roger, Erik and I have the following response:
Response to claims of robust, peer-reviewed research and “misplaced criticisms”
We would like to respond to several incorrect and misleading claims made by the IAAF in its Briefing Notes released on 7 May 2019. These claims relate primarily to a 2017 research paper that provided the evidence used for the selection of specific track and field events covered by the IAAF’s DSD Regulations (Bermon and Garnier BJSM, hereafter BG17).
Last year, at our request, the IAAF shared with us 25% of the data used in that study. As far as we know, we are the only researchers in the world outside of the IAAF who have had a chance to reproduce a segment of their dataset and replicate parts of their analysis. Further, the IAAF’s analysis of this data remains the only performance evidence that the IAAF cites as the basis for the selection of its restricted events under its DSD Regulations. The IAAF continues to assert the validity of this data even though we have shown conclusively that it suffered from systematic errors rendering any analysis and conclusions unreliable.
We are concerned because the representations of BG17 made in the IAAF’s recent statement are factually inaccurate and misleading. This matters not only because the flawed IAAF data, and the research using that data, are a key element upholding the new IAAF DSD Regulations, but more broadly because this is a matter of fundamental scientific integrity.
In its new Briefing Note, the IAAF writes:
[T]he 2017 Bermon & Garnier BJSM paper was criticised for its statistical approach. A new set of statistics were provided on a modified database (taking into account some of the criticisms raised).
This is incomplete at best and highly misleading at worst. It is true that some scholars criticised the statistical approach of the IAAF (see Sönksen et al; Menier; Franklin et al). However, the concerns about BG17 go well beyond those statistical concerns. They also include methodological considerations that no reanalysis can overcome and, perhaps most concerning, the possible persistence of erroneous data in the database. Our reanalysis of BG17, published in the ISLJ and based on data provided by the IAAF, found that in the four restricted events covered by the DSD Regulations, between 17% and 33% of the data were erroneous, including duplicate data and ‘phantom’ data that did not exist in the competitions analysed. These findings were highly concerning and led us to call for the publisher to retract the paper in accordance with the journal’s own retraction policy.
The presence of erroneous data is not even in question – the IAAF subsequently acknowledged that more than 20% of the data in BG17 was flawed and had been dropped from its subsequent analysis. This is what the IAAF means above by the phrase “modified database.” It is noteworthy that this modified database has never been reviewed or evaluated in the way that we were able to review a portion of the original database. We thus have no assurance as to whether the same number or types of errors are present in this modified database.
Further, when the IAAF states that “other criticisms of this paper are misplaced”, it sidesteps the most damaging criticism of all: the one documented in our ISLJ paper, which represents the only available external and independent analysis of its data. That paper has been peer reviewed, is available to anyone, and is certainly known to the IAAF. If the IAAF scientists disagree with our findings, why do they not enter into a scientific discussion of the issue? It is scientifically dishonest to act as if our reanalysis, criticisms and concerns about the research do not exist.
The IAAF further states:
All published papers have been peer-reviewed.
This too is untrue. In response to numerous criticisms of its original study, including the highlighting of significant data errors, the IAAF re-evaluated the data and submitted a paper as a “Discussion” (in effect, a short letter), published in the BJSM in 2018, seeking to re-do the flawed BG17 study. That paper explicitly notes that the IAAF Discussion was not sent out for peer review but was instead reviewed solely by the BJSM editor. Internal editorial review is not what anyone in the scientific community would characterise as “peer-reviewed.”
In our critique of the IAAF’s original study (BG17), we also looked at the analysis in the follow-up Discussion in BJSM (which we call BHKE18). We found that analysis also to be unreliable. Here is what we concluded in our paper:
Clearly and unambiguously, the results reported in BG17 change quantitatively in BHKE18 upon removal of 220 data points and introduction of new methods. The results of BG17 are clearly unreliable, and those of BHKE18 are of unknown validity. Further, without access to the medical data and all linked performances used in BG17, it is impossible to know how or why certain athletes/results were removed and others not. What is unequivocal is that BG17 used unreliable data, and thus, its results are also unreliable. Different data and methods were used in BHKE18, leading to significantly different results, based on the almost certain use of flawed data, leading consequently to unreliable results. The bottom line is that the use of flawed data makes it impossible to know what, if any, relationship exists between the variables of BG17 and BHKE18 or to verify the reported results.
The fact that the IAAF itself performed the research and analysed the data on which it has based these controversial regulations is non-transparent and problematic. The IAAF has failed to respond to our criticism, to explain what data errors have been detected and possibly corrected, or to release its data for independent verification. These failures highlight the deep conflict of interest that the IAAF researchers have in this case. Here is what we stated in our paper:
The IAAF set itself up for problems by conducting research on performance effects associated with testosterone using in-house researchers. This creates at a minimum a perception of a conflict of interest that could have been mitigated to some degree by allowing independent researchers access to data and evidence, in order to replicate findings. In this case, such access was not allowed, except for the small amount of data shared with us, which was subsequently found to contain numerous errors. The unwillingness of the IAAF to correct or acknowledge errors highlights its conflict of interest.
An alternative to the approach to science and evidence employed by the IAAF would have been to provide research funding to an independent body which could request proposals from researchers unaffiliated with the IAAF to address the scientific questions at issue. We would not find it appropriate for cigarette companies to provide the scientific basis for the regulation of smoking or oil companies to provide the scientific basis for regulation of fossil fuels. Sport regulation should be held to the same high standards that we expect of researchers in other settings where science informs regulation and policy.
We believe that anyone comparing the statements made by the IAAF with our analysis can only conclude that the IAAF is failing to uphold the basic standards of scientific integrity that should be expected in such an important matter, one that affects global sport and individuals’ lives. We should all expect better.
Prof Roger Pielke Jr
Prof Erik Boye