Democratizing Discovery

Michael Nielsen on the Open Science Revolution

In Squaring Off, Zócalo invites authors into the public square to answer five questions about the essence of their books. For this round, we pose questions to physicist Michael Nielsen, author of Reinventing Discovery: The New Era of Networked Science.

Science has traditionally been a field of competition and secrecy, where research is guarded from rivals at all costs. But the Internet is revolutionizing the process by which discoveries are made. Nielsen explores this transformation and makes the case for why it will benefit all of us.

1) You argue that the recent rise in online collaborations and freely available data and research will redefine how scientific discoveries are made and who makes them. But doesn’t proper interpretation of this data require learned techniques and years of experience?

You’re right that interpreting the data often requires a lot of experience.

Sometimes, though, the types of experience that are useful can come from unexpected or even completely unanticipated directions. For that reason it can be very valuable to make scientific data openly available to everyone, not just the people who did the original experiment.

Consider the data for the human genome, the genetic data that describe our basic makeup. The genome data were released in 2003, but even today we understand them only very incompletely. In particular, we’re only slowly coming to understand the links between genetic variation and differences among individuals, such as our different susceptibilities to conditions like diabetes and heart failure.

We’re still in the early days of understanding the links between the genome and how human beings function. Understanding those links has been helped by the fact that the human genetic data are online, where anyone can download them. When you read in the news that so-and-so gene is connected to such-and-such disease, chances are that the people making the discovery have no connection to the original human genome project. They’ve simply downloaded the genetic data, and then used their own skills and insight to deepen our understanding of how human beings work.

Unfortunately, outside human genetics most scientific data sets are still kept secret. Sometimes there are good reasons for that, like protecting patient privacy, or for other ethical or legal reasons. But sometimes there isn’t any good reason to keep the data secret. When that’s the case it limits the ability of other scientists and of lay people to derive valuable insights from the original data set. And that’s slowing down science.

2) The current state of scientific publishing may not promote open science, but it does ensure that what is being published is legitimate and valid. With data shared through open science, how can scientists (and the public) be sure that this information is well-researched, supportable, and provable?

In many cases, releasing open data and open code will make scientific results more reliable, because it will make it much easier for other scientists to verify and build upon (or refute) earlier claims of scientific discovery.

An example of the way open science can help increase reliability occurred late last year, when a team at NASA announced that they’d found a bacterium that incorporated arsenic in its DNA backbone. If true, this would have been an amazing finding, revolutionizing our thinking about life. They made the announcement in a paper in Science, one of the most prestigious peer-reviewed journals, and it received publicity all over the world.

However, the results didn’t seem right to some scientists. Some of them, most notably Rosie Redfield of the University of British Columbia, have dissected the results in the open on their blogs. And because of this work, there’s now a lot of skepticism about the NASA claims.

With that said, you’re right that this needs to be thought through carefully. Raw data and code that haven’t been battle-tested will need to be marked as such, not taken as gospel!

In this we can learn from the open source programming community. Open source programmers work in the open, so they’re sometimes sharing code that isn’t yet ready for wide release. But they manage the release process carefully, making sure that the wide release contains only code that has been tested and reviewed by many, not code which is in its early stages. It’s a useful model for open science.

3) What about intellectual property? If scientists are sharing their discoveries in the open, how can they get the credit (and sometimes financial stake) they deserve? What’s to keep pharmaceutical and technology-related companies from taking open science information discovered by others and making lots of money off it?

This is a fundamental question about how we organize our society, and it’s been an issue in science for hundreds of years.

To illustrate what I mean, it’s worth thinking back to earlier instances where basic scientific discoveries were turned into commercial products or even entire industries.

In the 1920s, people like Werner Heisenberg and Erwin Schroedinger made big breakthroughs in the discovery of quantum mechanics. This probably seemed like pretty esoteric work at the time, but it paved the way for practical inventions such as semiconductors (and thus the whole computer industry) and the laser.

Heisenberg, Schroedinger, and their successors published their discoveries in scientific journals. They didn’t receive any money for publishing that work, nor compensation from later computer companies who built on that work. Yet I think we’d all agree that it worked out pretty well for everyone: they got acclaim and comfortable university salaries, and the ability to work on whatever they pleased. And, ultimately, their discoveries have benefited everyone in society.

Historically, there’s always been a split between basic research and applied research. Most of the first type is funded by the world’s governments (i.e., taxpayers!), which today spend roughly $100 billion on basic research carried out in places like universities. That’s led to discoveries like quantum mechanics, the human genome, and many others. I think this publicly funded basic science should be open science.

Later, that basic research acts as the seed from which new industries grow. Those industries rely on applied research, typically carried out in secret by private companies, to flesh out real products. And I have no problem with that being done in secret.

Where this gets complicated is that there’s more and more push to blur this line, and to get universities directly involved in applied research of a commercial nature.

4) Given the fact that most basic science is funded by taxpayers, do you think the shift toward open science can only be realized through political advocacy and debate?

A good example of how government can help is the human genome data. A big part of the reason the genome data are open to all today is that in the mid-1990s several major grant agencies (including the publicly funded U.S. National Institutes of Health) began requiring their researchers to share human genome data. If you were a scientist who wanted to get grant money to work on the human genome, you had to agree to share your data in close to real time, and to put it into the public domain.

This kind of openness policy can be very powerful. Similar approaches have been used to open up several other kinds of genetic data, some kinds of astronomical data, and others.

The practical impact of these open data policies is that they’ve led to thousands of scientific discoveries that would otherwise never have been made.

On the other hand, this kind of openness policy still only applies to a tiny fraction of scientific data. And such policies are still nearly unheard of for other types of scientific knowledge, such as code. And so there’s a lot of scope for government and grant agencies to help open up data, open up code, and open up other types of scientific knowledge.

5) Will open science dissolve the boundaries between scientific disciplines and encourage collaboration?

It won’t dissolve the boundaries entirely, but it will certainly change them and make them more porous!

It’s interesting to think about why the boundaries are there in the first place.

They seem kind of arbitrary, but in part they’re there because of very real and deep structures in human knowledge of the world. It’s natural when you have amazingly good theories like quantum mechanics and the theory of evolution that entire disciplines should grow up around them, exploring their consequences and applications.

For instance, physics is an agglomeration of work around a few good theories (quantum mechanics, relativity, electromagnetism, and more). Similarly, much of biology is an agglomeration around theories like evolution and so on.

The disciplinary boundaries aren’t completely arbitrary: they’re a consequence of where humans have (so far) been most successful in understanding how the world works. And we can’t just chuck that understanding out.

But part of the reason for the boundaries is also just plain academic politics.

Think about a crossover field like, say, my former field of quantum computing, which combines physics and computer science. University physics departments will say, “computer science should hire people in this area,” and computer science will say, “physics should hire people in this area.” They’re both trying to get some benefit but don’t want to pay the cost.

So in that sense the boundaries are pretty arbitrary. And I think open science really will help break down those boundaries. For instance, I’m a physicist by training, but things like blogs, Twitter, and Google Plus have helped me follow far more of what scientists in other fields are doing. I know many of my scientist friends have had similar experiences. I think it’ll be great for science!

