The Least Publishable Unit and Open Notebook Science
C&EN recently highlighted Jean-Claude Bradley's work with Open Notebook Science. Perhaps the most interesting part of the piece was how Bradley's group tackled the concern most scientists have about being "scooped":
Most researchers who express reservations about open science are worried about protecting intellectual property and avoiding being "scooped" on a project, he [Bradley] says. Because of these complications, Bradley expressly chose an area where he thought he could make a contribution while still making his data publicly available at every step. "If someone uses my solubility data, that's something I can get recognition for without competing in the same way that many other projects would require me to," he says.
This is a very interesting idea. It says that there's a category of results that you want to safeguard until you're ready to publish, and then there's another category: the kind of result you might be willing to share, perhaps instantly, because it's more valuable when aggregated and doesn't affect the main flow of your research.
This second category of results might be a solubility measurement, as is the current focus of the Open Notebook Science Solubility Challenge. But this is just one example. Here are some others:
- Melting Points
- Boiling Points
- Optical Rotations
- Conditions for Resolving Enantiomers on a Chiral Column
- Raw NMR Spectra
- Raw IR Spectra
- Refractive Indices
None of these results by itself meets the standard for a Least Publishable Unit (sometimes also called the "Minimum Publishable Unit") in most journals. This explains, in part, why this kind of data can be so difficult to find and aggregate. It also explains, in part, why X-ray crystal data, which by themselves do meet Least Publishable Unit status, are so plentiful and well-aggregated.
But imagine a situation in which every piece of data a chemist might collect about a compound was a Least Publishable Unit. A Journal of Really Small Results, if you will.
This may seem impossible because most of us are used to thinking of one way to communicate scientifically - through a journal that produces a print version and is run by a big publisher. However, the Web makes this kind of thing not only possible, but in all likelihood inevitable.
A traditional journal format for such a system would almost certainly not work. But imagine a system that made it easy to both submit and find small, well-defined pieces of data for particular compounds: a system complete with peer review, a recognition system, and a standard way to refer to each result.
If this sounds familiar, it's because the Web makes it very easy to assemble, query, and maintain large quantities of small pieces of information.
The principle is simple: minimize the Least Publishable Unit, and you'll get a lot more data being published. All that's missing are the systems to make it happen.
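To make the idea a little more concrete, here is a minimal sketch of what one "really small result" might look like as a record, with a standard way to refer to it and a way to find it again. Everything here is an assumption made up for illustration: the class names, the fields, the content-hash identifier, and the placeholder values. It is not a description of any existing system.

```python
from dataclasses import dataclass, asdict
import hashlib
import json


@dataclass(frozen=True)
class MicroResult:
    """One small, well-defined measurement (all fields are hypothetical)."""
    compound: str        # however the compound is identified
    property_name: str   # e.g. "melting point"
    value: float
    units: str
    contributor: str     # who should get the recognition
    notebook_url: str    # where the supporting record lives

    def identifier(self) -> str:
        """Derive a short, stable ID from the record's own content."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


class ResultIndex:
    """In-memory stand-in for a service that accepts and serves results."""

    def __init__(self) -> None:
        self._by_id: dict[str, MicroResult] = {}

    def submit(self, result: MicroResult) -> str:
        """Store a result and return the ID others can use to refer to it."""
        result_id = result.identifier()
        self._by_id[result_id] = result
        return result_id

    def find(self, compound: str, property_name: str) -> list[MicroResult]:
        """Return every stored result for one compound and property."""
        return [
            r for r in self._by_id.values()
            if r.compound == compound and r.property_name == property_name
        ]


# Placeholder data only; these are not real measurements.
index = ResultIndex()
example = MicroResult(
    compound="compound-A",
    property_name="melting point",
    value=1.0,
    units="degC",
    contributor="example-contributor",
    notebook_url="https://example.org/notebook/page-1",
)
example_id = index.submit(example)
```

Deriving the identifier from the content means the same result always gets the same reference, which is one possible answer to the "standard way to refer to each result" requirement. A real system would also need review and attribution steps that this sketch leaves out.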


MolBank is doing something like that... what is your opinion on that 'journal'?
@Egon, Molbank is interesting for a couple of reasons. One is that they've worked on reducing the Minimum Publishable Unit (MPU) while keeping much of the same form and procedures of a traditional journal. Another is that they were one of the first (if not the first) chemistry journals to adopt a liberal open policy on reproduction of content. They were also the first, to my knowledge, to include machine-readable chemical structures as part of their content.
But I'm not sure how well the traditional journal model would hold up if the MPU were to shrink further still.
Rich - thanks for bringing attention to the fact that there is a role for micropublication (I would even say nanopublication :) in chemistry. What is interesting in your list is that most of those properties are commonly available - if not as actual measurements then at least from predictive software. Solubility in non-aqueous solvents is a really useful property for doing organic chemistry that is still hard to find.
In a very real sense our solubility data are "peer-reviewed" via the judges of the ONSsolubility project. If we find a value that does not make sense we flag it in the GoogleSpreadsheet as "DONOTUSE" and it is removed - with an explanation.
As for the data that are left there - enough information is provided in the lab notebook pages to assess if you want to use the measurements or not. But if the measurements are consistent from different people and methods it is likely that you can use them. For example: http://oru.edu/cccda/sl/solubility/ugidata.php?solute=benzoic%20acid&solvent=THF
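The flag-and-check convention described above could be sketched roughly as follows. The column names and the numeric values are placeholders invented for illustration; they are not the actual layout or contents of the ONS solubility spreadsheet.

```python
from statistics import mean

# Hypothetical rows; the values are placeholders, not real measurements.
ROWS = [
    {"solute": "solute-A", "solvent": "solvent-B", "concentration_M": 1.0, "flag": ""},
    {"solute": "solute-A", "solvent": "solvent-B", "concentration_M": 1.2, "flag": ""},
    {"solute": "solute-A", "solvent": "solvent-B", "concentration_M": 9.9, "flag": "DONOTUSE"},
]


def usable(rows):
    """Drop any row that has been flagged as DONOTUSE."""
    return [r for r in rows if r["flag"].strip().upper() != "DONOTUSE"]


def summarize(rows):
    """Summarize the remaining values so their consistency can be judged."""
    values = [r["concentration_M"] for r in usable(rows)]
    return {
        "n": len(values),
        "mean": mean(values),
        "spread": max(values) - min(values),
    }


summary = summarize(ROWS)
```

A small spread across independent measurements is the kind of consistency the comment refers to; what counts as "small" is left to the reader's judgment here.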
There are many ways to find the solubility data but the most convenient is probably via Rajarshi's web query tool. From a practical standpoint the use of a GoogleSpreadsheet with the Google Visualization API does work fairly well at this point.
As for peer review in traditional journals we're about to submit a paper with the results to date - so we're not sacrificing one system for another.
Another option for the other types of data (spectra, m.p., etc.) is to upload to ChemSpider. Or consider ChemSpider Journal for peer-reviewed publication.
@Jean-Claude,
I like the term "Micropublication" to describe this concept - it draws an analogy to "microblogging" through services like Twitter, which was widely criticized when first introduced (and still is to some extent) as too simple to be useful.
I'm not so sure that the compound properties on the list are all that commonly available. Some of them can, for example, be found in peer-reviewed publications, but there's a vast body of data that never makes it into that form because it's not part of an MPU. There are a lot more of those kinds of data and, for all intents and purposes, they're invisible. Those are the kinds of data I'm wondering about capturing.
Although general-purpose tools might fill an immediate need, I'm wondering about the possibility that services built with the sole purpose of capturing, organizing, and distributing micropublication-quality chemical data might have a role.
Sort of like Twitter has the sole purpose of capturing and distributing microblogging events.
Rich, I can usually find properties like density, b.p., m.p., etc. of a compound from the Sigma-Aldrich site or - if I have to - look up its predicted b.p. or density on ChemSpider. But the solubility of benzoic acid in THF isn't there - predicted or measured.
I do see what you mean about a system that would be customized for micropublication. Rajarshi and Andy have essentially built a workable distributed system but it could be interesting to experiment with other approaches. The beauty of Open Data is we can test out as many systems as we want in public.
Many minds think alike: Rich, I read this post some time ago, but didn't see the comments. I just found them while Googling for an essay I wrote 18 months ago about "micropublication", at http://michaelnielsen.org/blog/?p=257 . The recent work on the Polymath Project (at gowers.wordpress.com) shows the power of micropublication very vividly, I think, at least in mathematics.