“Avalanche control” in scientific literature: A role for informatics

In “We Must Stop the Avalanche of Low-Quality Research,” the authors claim that much of the scientific literature published recently is “redundant, inconsequential, and outright poor” and that “research has swelled in recent decades, filling countless pages in journals and monographs.” “Countless” is clearly meant pejoratively here. No argument is offered for the first claim, unless the figures on citation frequency (generally very low for any paper, if it is cited at all) are meant to show that recent literature is poor in quality. The authors plainly believe that there is too much literature, and it seems to me that their case on that point would be just as strong if it were not paired with the claim that the literature is generally of low quality.

Taking a larger view, the problem is probably worse than the “Avalanche” authors suggest. A prominent case in point is the Biodiversity Heritage Library: its holdings currently amount to 30,512,292 pages in 80,976 volumes, it grows daily, and more and more libraries are joining the project, including libraries in Europe and the Pacific Rim. (Perhaps the “Avalanche” authors would find this reassuring. Back in the good old days, when men were real men and women didn’t do science, only what was worth reading was published, and everyone read it.) Even so, finding works relevant to a given topic is difficult and will only become more so.

From my point of view as a librarian and ontology designer, the “Avalanche” authors’ proposal is backwards. The right response to the growing volume of literature is to develop better tools for discovering information resources in databases of bibliographic records and full texts (“text-bases”). This can be achieved by combining semantic markup with natural language processing. Changes in reading habits, such as scanning abstracts to judge whether a paper is relevant to a researcher’s interests, are probably also necessary. A similar problem has arisen with the deluge of data in genetics and other molecular-level fields, and it has been addressed largely through computational methods, such as those employed in systems biology.

However exciting this challenge is to me and others in this field, I would like to call attention to a point that may seem backward-looking to some but lies at the heart of research in the sciences: the purpose of the literature, and of the informatics tools needed to explore it successfully, is to present a person with information that, when interpreted and understood, can form the basis for insight. Hypotheses are formulated. They are tested by experiment and observation. The results are assessed. At each stage, people analyze, conjecture, argue, evaluate, and speculate. The extent to which a contribution to the literature promotes this process is the ultimate measure of its usefulness.

Suppose that a paper is read only once and never cited. If that solitary reader takes something of even the slightest value from the paper, it is a worthwhile contribution to the literature. Although the case I report below concerns a contribution that was current at the time, I believe that much work published in the last 150 years has had a readership in the dozens yet would be of great importance if discovered today. Scientists of the 19th and early 20th centuries were no less astute as observers, and no less competent as experimenters, than those of today. Much of the data collected in the natural history disciplines during this period is invaluable today because it records biogeographical patterns and ecological conditions that have since changed, offering a long-range longitudinal view. At the same time, reluctance to publish null results, then as now, has deprived us of a wealth of potentially useful information. Today’s failed experiment may be tomorrow’s success once just one more independent variable is accounted for.

The current periodicals shelves are among the oldest and most reliable informatics tools, simple though they are. There the researcher confronts the bleeding edge across many disciplines, from many places, in many languages. Rhodes [1, p. 145] recounts how E. O. Lawrence arrived at the idea for the cyclotron:

Lawrence pursued more promising studies but kept the high-energy problem in mind. The essential vision came to him in the spring of 1929, four months before Oppenheimer arrived. “In his early bachelor days at Berkeley,” writes Alvarez, “Lawrence spent many of his evenings in the library, reading widely.” Although he passed his French and German requirements for the doctor’s degree by the slimmest of margins, and consequently had almost no facility with either language, he faithfully leafed through the back issues of the foreign periodicals, night after night. Such was the extent of Lawrence’s compulsion. It paid.

While Lawrence “was skimming the German Archiv für Elektrotechnik, an electrical-engineering journal physicists seldom read,” he came across the critical insight.

[Lawrence] happened upon a report by a Norwegian engineer named Rolf Wideröe, “Über ein neues Prinzip zur Herstellung hoher Spannungen:” “On a new principle for the production of higher voltages.” The title arrested him. He studied the accompanying photographs and diagrams. They explained enough to set Lawrence off and he did not bother to struggle through the text.


[1] Rhodes, Richard. The Making of the Atomic Bomb. New York: Simon and Schuster, 1986.
