Micropublication in Chemistry

Time for a quick calculation: divide the number of experiments you've run during your career by the number of results you've published. An experiment can be as small as a melting point determination, an attempt to separate enantiomers on a chiral column, a C-13 NMR of a compound reported in the literature, a yield for a standard synthesis, or an optical rotation determination.

Now, what's your answer?

It should come as no surprise if your answer is substantially higher than one. My own estimate is at least 30:1. Of course, a significant percentage of those experiments involved a mistake on my part. Still, it's quite plausible that for every property I've reported in the literature, there were ten others of equal quality that went unpublished.

Think about that for a second - for every piece of information you see published about a compound in the peer-reviewed literature there could be five, ten, or more of equal quality that go unpublished. What a waste - for taxpayers and philanthropic organizations, and for chemistry.

Why are we in this situation? How useful is a system that encourages the hiding of 80% or more of its content pool? But more to the point - what's the alternative?

Micropublication

In the comments section of the post titled The Least Publishable Unit and Open Notebook Science, Jean-Claude Bradley introduced the term "micropublication" into the discussion.

The earliest reference I could find to the term comes by way of Michael Nielsen, who in 2007 defined micropublication as:

Allowing immediate publication in small incremental steps, both of conventional text, and in more diverse media formats (e.g. commentary, code, data, simulations, explanations, suggestions, criticism and correction). All are to be treated as first class fully citable publications, creating an incentive for people to contribute far more rapidly and in a wider range of ways than is presently the case.

Nielsen went on to further describe the role micropublication could play in science practiced with a more open model.

The Incredible Shrinking Least Publishable Unit

Micropublication can't work with today's system of scientific publication. The reason is simple: the Least Publishable Unit is far too large. On top of that, the process of publication requires far too much work for a scientist to bother with tiny results. Finally, there's little or no incentive to publish really small results - if you can't put evidence of your work in a resume, grant proposal, or promotion request, there's really no point.

The current system of publication is centered around papers. Its Least Publishable Unit is by definition the paper. Micropublication is centered around individual pieces of data. Its Least Publishable Unit is a single piece of data. Or possibly a comment about a single piece of data.

The ironic thing about all of this is that primary research papers in chemistry are for the most part 'read' not for their insightful prose but for their data: citations, tables, and experimental sections.

Using papers as the vehicle for transmitting scientific data may work in a paper-centered world, but it breaks down in ugly ways in a Web-centric world. We're already starting to see signs that the unfolding economic situation will accelerate the squeeze being put on traditional scientific publishers.

A Workable Micropublishing System for Chemistry

Would a micropublication system work in chemistry? Maybe. What can be said is that micropublication could only work if certain conditions were met. At a minimum, the system would need to:

  • Create real, tangible incentives to publish single pieces of data on individual compounds.
  • Make it easy to add new data, peer-review data, and locate data.
  • Make it possible to individually cite each piece of data and credit its author.
  • Reward those who improve the system and punish those who abuse it.
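As a rough illustration of the third requirement, here is a minimal Python sketch of what a single citable result might look like. Everything in it is an assumption made for illustration: the MicroPublication class, its field names, the "mp-" identifier scheme, and the sample record (including its author, date, and value) are hypothetical and do not describe any existing system. It says nothing about incentives or peer review, which are the harder problems.

```python
from dataclasses import dataclass
from datetime import date
import hashlib


@dataclass(frozen=True)
class MicroPublication:
    """One measurement on one compound (hypothetical schema)."""
    compound: str    # any agreed-upon compound identifier
    prop: str        # the property measured, e.g. "melting point"
    value: float
    units: str
    author: str
    published: date

    def identifier(self) -> str:
        """Derive a short, repeatable ID from the record's contents."""
        payload = "|".join([
            self.compound, self.prop, repr(self.value),
            self.units, self.author, self.published.isoformat(),
        ])
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        return "mp-" + digest[:12]

    def citation(self) -> str:
        """Format a citation that credits the author of this one result."""
        return (f"{self.author} ({self.published.year}). "
                f"{self.prop} of {self.compound}: {self.value} {self.units}. "
                f"Micropublication {self.identifier()}.")


# Illustrative record only; author, date, and value are placeholders.
record = MicroPublication(
    compound="benzoic acid", prop="melting point", value=122.0,
    units="degC", author="A. Chemist", published=date(2009, 1, 15),
)
print(record.citation())
```

Deriving the identifier from the record's contents means the same result always gets the same ID, which is one simple way to make an individual data point citable without a central registry. A real system might well prefer registered identifiers instead.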

Micropublication may not work in chemistry. On the other hand, how would anybody know without a serious attempt?

The Least Publishable Unit and Open Notebook Science

C&EN recently highlighted Jean-Claude Bradley's work with Open Notebook Science. Perhaps the most interesting part of the piece was how Bradley's group tackled the concern most scientists have about being "scooped":

Most researchers who express reservations about open science are worried about protecting intellectual property and avoiding being "scooped" on a project, he [Bradley] says. Because of these complications, Bradley expressly chose an area where he thought he could make a contribution while still making his data publicly available at every step. "If someone uses my solubility data, that's something I can get recognition for without competing in the same way that many other projects would require me to," he says.

This is a very interesting idea. It says that there's one category of results that you want to safeguard until you're ready to publish, and another category that you might be willing to share - perhaps instantly - because it's more valuable when aggregated and doesn't affect the main flow of your research.

This second category of results might be a solubility measurement, as is the current focus of the Open Notebook Science Solubility Challenge. But this is just one example. Here are some others:

  • Melting Points
  • Boiling Points
  • Optical Rotations
  • Conditions for Resolving Enantiomers on a Chiral Column
  • Raw NMR Spectra
  • Raw IR Spectra
  • Refractive Indices

None of these results by itself meets the standard for a Least Publishable Unit (sometimes also called the "Minimum Publishable Unit") in most journals. This explains, in part, why this kind of data can be so difficult to find and aggregate. It also explains, in part, why X-ray crystal data, which by themselves do meet Least Publishable Unit status, are so plentiful and well-aggregated.

But imagine a situation in which every piece of data a chemist might collect about a compound was a Least Publishable Unit. A Journal of Really Small Results, if you will.

This may seem impossible because most of us are used to thinking of only one way to communicate scientifically - through a journal that produces a print version and is run by a big publisher. However, the Web makes this kind of thing not only possible, but in all likelihood inevitable.

A traditional journal format for such a system would almost certainly not work. But imagine a system that made it easy to both submit and find small, well-defined pieces of data for particular compounds. A system complete with peer-review, a recognition system, and a standard way to refer to each result.

If this sounds familiar it's because the Web makes it very easy to assemble, query, and maintain large quantities of small pieces of information.
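To make "assemble, query, and maintain" a little more concrete, here is a minimal Python sketch using the standard library's sqlite3 module with an in-memory database. The results table, the submit and find helpers, and the sample rows are all hypothetical, and the values are placeholders rather than real measurements. Peer review and recognition are deliberately left out.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE results (
        id       INTEGER PRIMARY KEY,
        compound TEXT NOT NULL,
        property TEXT NOT NULL,
        value    REAL NOT NULL,
        units    TEXT NOT NULL,
        author   TEXT NOT NULL
    )
""")


def submit(compound, prop, value, units, author):
    """Store one small result and return its row ID for later reference."""
    cur = conn.execute(
        "INSERT INTO results (compound, property, value, units, author) "
        "VALUES (?, ?, ?, ?, ?)",
        (compound, prop, value, units, author),
    )
    conn.commit()
    return cur.lastrowid


def find(compound, prop):
    """Return every stored result for one property of one compound."""
    return conn.execute(
        "SELECT id, value, units, author FROM results "
        "WHERE compound = ? AND property = ? ORDER BY id",
        (compound, prop),
    ).fetchall()


# Placeholder submissions from two hypothetical authors.
submit("benzoic acid", "melting point", 122.0, "degC", "A. Chemist")
submit("benzoic acid", "melting point", 121.5, "degC", "B. Chemist")
submit("benzoic acid", "refractive index", 1.5, "", "A. Chemist")

matches = find("benzoic acid", "melting point")
for row_id, value, units, author in matches:
    print(row_id, value, units, author)
```

Independent measurements of the same property sit side by side here, each still credited to its own author, which is the kind of aggregation that is hard to get from papers.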

The principle is simple: minimize the Least Publishable Unit, and you'll get a lot more data being published. All that's missing are the systems to make it happen.