Dear Scientists: This Is Why People Hate You


Illustration by Titima Ongkantong, courtesy Shutterstock

If you don’t hang out on genomics social media forums, you’ve probably missed the swirling debate about an editorial published recently in the venerable New England Journal of Medicine. In it, the journal’s lead editors discuss data sharing practices and (here’s where it gets good) refer to scientists who make new discoveries from publicly shared data as “research parasites.”

I’m delighted to report that a number of scientists, many of them from the genomics community, jumped all over this nonsense. Jonathan Eisen, a microbial specialist who always has a way with words, called the editorial “simply deranged.” There’s a great account of the incident here. (The uproar was so intense that the editors backpedaled quickly.)

I cut my teeth observing the genomics community, where data sharing was not only accepted practice but was codified into the principles established by the scientists themselves. They reasoned that they’d be producing so much data, they would need as many eyes and brains on it as possible. Today, important new discoveries are routinely made by scientists who round up existing sources of data, mix them into new combinations, and look at them with a fresh perspective. We call it progress. It’s already making a difference in medicine.

But genomics is just one tiny sliver of biology, and biology is a small piece of the scientific pie. In the world beyond genomics, data hoarding is often standard practice. Like the NEJM editors, many scientists believe that other researchers cannot fully understand their data, and therefore any analysis they conduct on it might be flawed. They also believe, but are less willing to say publicly, that having a set of data that no one else has access to gives them a competitive advantage in a world of limited grant funding and tenured positions. The latter argument is institutional and hard for any individual scientist to change; the former can be a sign of shoddy science.

Remember those pesky lab notebooks you had to keep in high school science class? The reason they exist is so that scientists can track, with painstaking detail, every step they take to generate a set of results. When scientists publish that work in academic journals, they are supposed to include enough of that information to allow someone else to replicate the experiment and see if the results come out the same. The very nature of the scientific process relies on researchers’ ability to make sure other people understand exactly how they produced the information in the first place. If that’s not happening, then a fundamental safety net of science has been removed, and nobody’s noticed.

More broadly, this idea that scientists are afraid other people will come along and make new discoveries from their data is the best possible argument for sharing data as broadly and as often as possible. And those people who hoard data with the intrinsic goal of preventing others from figuring out something that they haven’t gotten around to yet? Simply put, there should be no place for them in science.

  • Will Greene

    Incentives probably matter here. In an ideal world, everyone in the scientific community would be both transparent with their data and intrinsically motivated to do great research, but let’s be honest: competition is not only a reality of the human condition, but also (in the right contexts) an inspiration to action. So let’s fight against data hoarding, but also use proper citations, awards, professorships, etc. to reward those who produce data that others remix for the benefit of society. Creating such systems of credit might help move the open data movement forward.