An Apology for the NEJM Editorial

by @raimondiand

There is an ongoing debate about the latest NEJM editorial [1]. Following its publication, a hashtag spread across Twitter (and beyond): #IAmAResearchParasite. The protest against Dan Longo and Jeffrey Drazen targets the idea that “researchers that share data are parasites”, an idea that, according to many, summarised the focus and vision of the editorial.

So, to help Roy understand what this is all about, I will examine two editorials: the ICMJE one and the now infamous NEJM one. In what follows I want to show that the protest against NEJM is not warranted and that, while the hashtag shows that researchers and the academic world are starting to share a common sensibility toward data sharing, the #IAmAResearchParasite protest misses the point: creating better organisational models to implement data sharing in the academic publishing domain. I will start with the ICMJE editorial and then turn to the #IAmAResearchParasite concerns in the NEJM editorial.

The ICMJE Editorial

At the beginning of the year, the International Committee of Medical Journal Editors (ICMJE) published an editorial on the benefits of data sharing [2]. This is something you would love to have in your morning reading folder if you were engaged with open access. The editorial, a sort of manifesto for sharing clinical trial data, focuses on benefits and on a call for participation. For instance, it makes clear statements about requirements. First:

As a condition of consideration for publication of a clinical trial report in our member journals, the ICMJE proposes to require authors to share with others the deidentified individual-patient data (IPD) underlying the results presented in the article (including tables, figures, and appendices or supplementary material) no later than 6 months after publication. The data underlying the results are defined as the IPD required to reproduce the article’s findings, including necessary metadata.

And Second:

The ICMJE also proposes to require that authors include a plan for data sharing as a component of clinical trial registration. This plan must include where the researchers will house the data and, if not in a public repository, the mechanism by which they will provide others access to the data.

The editorial also highlights several benefits:

Sharing data will increase confidence and trust in the conclusions drawn from clinical trials. It will enable the independent confirmation of results, an essential tenet of the scientific process. It will foster the development and testing of new hypotheses. Done well, sharing clinical trial data should also make progress more efficient by making the most of what may be learned from each trial and by avoiding unwarranted repetition. It will help to fulfill our moral obligation to study participants, and we believe it will benefit patients, investigators, sponsors, and society.

That’s great. But the ICMJE, correctly as far as I’m concerned, also points out that:

those who generate and then share clinical trial data sets deserve substantial credit for their efforts. Those using data collected by others should seek collaboration with those who collected the data. However, because collaboration will not always be possible, practical, or desired, an alternative means of providing appropriate credit needs to be developed and recognised in the academic community. We welcome ideas about how to provide such credit.

To echo this invitation, numerous journals published the same editorial:

Annals of Internal Medicine, British Medical Journal, Canadian Medical Association Journal, Chinese Medical Journal, Deutsches Arzteblatt (German Medical Journal), Ethiopian Journal of Health Sciences, JAMA (Journal of the American Medical Association), Nederlands Tijdschrift voor Geneeskunde (The Dutch Medical Journal), New England Journal of Medicine, New Zealand Medical Journal, PLOS Medicine, Revista Medica de Chile, The Lancet, and Ugeskrift for Laeger (Danish Medical Journal).

To summarise, the basic idea is to make data sharing a mandatory condition for publication, and to ask researchers to prepare and present a data sharing plan as part of clinical trial registration. This is because data sharing prevents unwarranted repetition, helps confirm hypotheses and, optimistically, helps develop new ideas or research trends. Given the value of data sharing, the ICMJE seems to assume it is worth pursuing whether or not it is achieved through some form of collaboration. What they are asking for, then, is a proposal on how to effectively give researchers credit for their sharing activities. Let’s turn to NEJM now.

The NEJM Editorial

To begin with, let me sample some of the criticisms put forward by researchers and activists. A quick look at #IAmAResearchParasite will help us understand what they claim is wrong with the editorial:

  1. The idea that researchers that re-use and share data are parasites.
  2. The idea that data scientists are parasites.
  3. The idea that the authors are against openness in science.

Let’s start with (3). This accusation is simply unwarranted. A careful reading of the editorial shows that this is not the case. This is the introductory part:

The aerial view of the concept of data sharing is beautiful. What could be better than having high-quality information carefully reexamined for the possibility that new nuggets of useful data are lying there, previously unseen? The potential for leveraging existing results for even more benefit pays appropriate increased tribute to the patients who put themselves at risk to generate the data. The moral imperative to honor their collective sacrifice is the trump card that takes this trick.

And this is the conclusion:

How would data sharing work best? We think it should happen symbiotically, not parasitically. Start with a novel idea, one that is not an obvious extension of the reported work. Second, identify potential collaborators whose collected data may be useful in assessing the hypothesis and propose a collaboration. Third, work together to test the new hypothesis. Fourth, report the new findings with relevant coauthorship to acknowledge both the group that proposed the new idea and the investigative group that accrued the data that allowed it to be tested. What is learned may be beautiful even when seen from close up.

As far as I’m concerned, here is what is going on. Longo and Drazen are complying with the ICMJE proposal, though not explicitly. They are clear about the value of data sharing, in line with the ICMJE proposal; moreover, they are also answering the ICMJE invitation on how to give academics credit for data sharing. Their focus is, indeed, how data sharing would work best [3]. Prima facie, (3) is unwarranted.

Let’s be clear about what this editorial represents. It is a short presentation of an editorial product that, while complying with ICMJE requirements for open data, also proposes a model of implementation. Their claim is that, to get the most out of sharing, we need detailed guidance, or methods, to implement the principle of openness in research activity. Vol. 374 is such a product.

We may want to expand on the NEJM proposal. Let us move, then, to accusation (1), the idea that researchers that re-use and share data are parasites. As far as I can see, this is uncharitable. Here is the most debated part, the second concern the authors present as a challenge to data sharing practices:

A second concern held by some is that a new class of research person will emerge — people who had nothing to do with the design and execution of the study but use another group’s data for their own ends, possibly stealing from the research productivity planned by the data gatherers, or even use the data to try to disprove what the original investigators had posited. There is concern among some front-line researchers that the system will be taken over by what some researchers have characterized as “research parasites.”

This issue of the Journal offers a product of data sharing that is exactly the opposite. The new investigators arrived on the scene with their own ideas and worked symbiotically, rather than parasitically, with the investigators holding the data, moving the field forward in a way that neither group could have done on its own.

Accusation (1) is uncharitable. The authors are just reporting an attitude that characterises the academic environment, which is useful information if you need to implement policies effectively. From my previous work in open data management, I can say that understanding the environment in which new processes have to be implemented is a mandatory requirement [4]. I can tell of many open data projects that failed because they did not take this element into account. Accusing the authors of a claim that they merely report is the first step toward that mistake, the same mistake we make when we confuse whistleblowers with the problems they point to. I think this is a case in which we should look at the moon, not at the finger.

To tackle accusation (2), we will dig into the editorial one last time. The accusation spread from a professional category whose work involves datasets for different purposes (such as learning, analysis and the like). Calling their activity parasitism looks like an insult to me, as I have professionally been one of them. But is that what the editorial says? I believe not.

However, many of us who have actually conducted clinical research, managed clinical studies and data collection and analysis, and curated data sets have concerns about the details. The first concern is that someone not involved in the generation and collection of the data may not understand the choices made in defining the parameters. Special problems arise if data are to be combined from independent studies and considered comparable. How heterogeneous were the study populations? Were the eligibility criteria the same? Can it be assumed that the differences in study populations, data collection and analysis, and treatments, both protocol-specified and unspecified, can be ignored?

Longo and Drazen’s first concern seems to be about the implementation details of the ICMJE proposal. In my experience, guidelines and handbooks are declarations of duties, moral ones in this particular case. But we need more than this to make progress and fulfil such duties: we need better models and methods. That is why we should care about the details. The worry seems to be that, without the necessary sharing of knowledge among professionals, data sharing might not generate real value. “Holding the data”, according to Longo and Drazen, also means having specific knowledge of how they were obtained. Hence the quick picture they sketch:

Start with a novel idea, one that is not an obvious extension of the reported work. Second, identify potential collaborators whose collected data may be useful in assessing the hypothesis and propose a collaboration. Third, work together to test the new hypothesis. Fourth, report the new findings with relevant coauthorship to acknowledge both the group that proposed the new idea and the investigative group that accrued the data that allowed it to be tested.

The key point of the NEJM proposal, then, is that one value of sharing data might lie in finding new ways to organise research activities so as to promote the expertise of each researcher in the team and give them credit, as the ICMJE was advocating. If this is right, (2) is unwarranted: data scientists can count as potential collaborators too. And the more general (1), “researchers that share data are parasites”, fails as well.

A quick note on two other possible issues regarding the editorial:

A) It might be objected that it is not clear what they mean by “novel idea”. However, this again would be uncharitable, since the ICMJE also uses the same expression and points to the same goal.

B) It might be said that “data should be open!!”. Yes, this is something we all share if we are activists for open access to data. However, as I have already mentioned, this is a separate issue, which deserves its own discussion.


Overall, my understanding is that the general accusations are unwarranted. There is no conflict between the ICMJE editorial and the NEJM editorial. Let’s call the latter a minimum viable product, a beta test for a possible implementation of the ICMJE directives. The NEJM proposal should be seen as a challenge, and a difficult one, since it forces organisations and teams to open up their knowledge, share goals, and cooperate in their activities. Activists and researchers should be happy about this implementation, since it values data sharing in terms of both open access to knowledge and growth in professional expertise. But they should also be aware that this is by no means easy.

Personally, I share the same concerns about details, because I am tired of seeing promising projects sink for lack of strategic guidance. At OSD we published something along these lines, called Masterplan. Have a look, because if you want to comply with the ICMJE requirements, a data sharing plan will be required of you. You have one year, or so:

This requirement will go into effect for clinical trials that begin to enroll participants beginning 1 year after the ICMJE adopts its data-sharing requirements.

(note: the ICMJE plans to adopt data-sharing requirements after considering feedback received to the proposals made here)


[1] NEJM, January 21, 2016, Vol. 374, No. 3 (editorial by Dan Longo and Jeffrey Drazen).

[2] Sharing Clinical Trial Data: A Proposal From the International Committee of Medical Journal Editors, January 2016.

[3] Drazen’s follow-up on the debate apparently makes the commitment to the ICMJE explicit. From his follow-up editorial:

We want to clarify, given recent concern about our policy, that the Journal is committed to data sharing in the setting of clinical trials. As stated in the Institute of Medicine report from the committee on which I served and the recent editorial by the International Committee of Medical Journal Editors (ICMJE), we believe there is a moral obligation to the people who volunteer to participate in these trials to ensure that their data are widely and responsibly used. Journal policy will therefore follow that outlined in the ICMJE editorial and the IOM report.

[4] A rough sketch of experiences relevant to the discussion: I spent almost three years modelling open data projects in Italy and worked with government agencies to start up open data projects. My last professional contribution for my former employer was the G8 Open Data Charter. Now I run my own project: OSD.