Tag Archives: bibliography

“Avalanche control” in scientific literature: A role for informatics

In “We Must Stop the Avalanche of Low-Quality Research,” the authors claim that much of the recently published scientific literature is “redundant, inconsequential, and outright poor” and that “research has swelled in recent decades, filling countless pages in journals and monographs.” “Countless” is intended in a negative sense here. No argument is provided for the first claim, unless the observations about citation frequency (generally very low, if a paper is cited at all) are to be taken as an argument that recent literature is poor in quality. It does seem clear that the authors believe there is too much literature, and it seems to me that their argument on that point might be just as strong if it weren’t paired with the argument that the literature is generally low in quality.

Taking a larger view, the problem is probably worse than the “Avalanche” authors suggest. A prominent case in point: the Biodiversity Heritage Library, whose holdings at present amount to 30,512,292 pages in 80,976 volumes, is growing daily, and more and more libraries are joining the project, including libraries in Europe and on the Pacific Rim. (Perhaps the “Avalanche” authors would find this reassuring. Back in the good old days, when men were real men (and women didn’t do science), only what was worth reading was published, and everyone read it.) Nonetheless, finding works relevant to a given topic is difficult and will become more so.

Creating a local copy of PubMed

I just updated a previous post about my plans to create a local copy of PubMed with a few more remarks about how I intend to proceed. I hope to have that work completed in the next two or three weeks. Check it out here: https://shiftingbalance.org/?p=244. It will be a challenge to integrate the various data sources with one another: PubMed, our existing BibTeX database, and records in MARC format. Semantic interoperability will be the greatest challenge.
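To make the interoperability problem a little more concrete, here is a minimal sketch of one way to map fields from the three sources onto a shared vocabulary. Everything in it is an assumption made for illustration: the `FIELD_MAPS` table, the `normalize` function, and the shape of the input records are hypothetical, and real records would need far more care (repeated fields, author name forms, and so on). The only outside facts it relies on are that MARC field 245 subfield a holds the title and 260 subfield c holds the publication date.

```python
# Hypothetical mapping from source-specific field names to a shared vocabulary.
FIELD_MAPS = {
    "pubmed": {"ArticleTitle": "title", "Year": "year", "Journal": "journal"},
    "bibtex": {"title": "title", "year": "year", "journal": "journal"},
    "marc": {"245a": "title", "260c": "year"},
}


def normalize(source, record):
    """Return a copy of `record` keyed by the shared field names.

    Fields with no entry in the mapping are dropped. Trailing cataloguing
    punctuation (common in MARC subfields) is stripped from each value.
    """
    mapping = FIELD_MAPS[source]
    out = {"source": source}
    for key, value in record.items():
        if key in mapping:
            out[mapping[key]] = value.strip().rstrip(" /:.,")
    return out


marc_record = {"245a": "On the origin of species /", "260c": "1859."}
bibtex_record = {"title": "On the origin of species", "year": "1859"}

print(normalize("marc", marc_record))
print(normalize("bibtex", bibtex_record))
```

Once both records are in the same shape, comparing them (for instance, to spot duplicates across sources) becomes a matter of comparing ordinary dictionaries. Agreeing on what the shared fields mean is the hard part, and a lookup table like this only hides that work.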

I have to learn a little about Python and MySQL...

Lindsay Cowell, who works on the Infectious Disease Ontology, was kind enough to share a script her group used to create a local MySQL database of PubMed records. It’s in Python, and Changqing Li wrote it. In order to use the script, I have to learn about Python and also more about SQL databases. I found a Python tutorial, written by the creator of Python himself! Most likely I will rely on the kindness of my friends to coach me through the Python scripting, and once the data is in MySQL form, I can take it from there. There’s still the question of where the data is going to go: Changqing tells me that the whole thing requires at least 60 GB. I hope it doesn’t come to this, but the best (cheapest) course of action might be to create the database on a local disk. The records I’ll eventually want to incorporate into the Literature of Evolution database will not, I imagine, require so much disk space. I have a few GB at geekisp, and maybe the Darwin Manuscripts Project can spare a few.
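For anyone wondering what loading PubMed records into a local database might look like, here is a rough sketch. It is not Changqing Li's script, which I have not reproduced here. It also uses Python's built-in `sqlite3` module rather than MySQL, so that it runs without a database server. The table layout, the function names, and the sample record are all made up for illustration; the element names (`PubmedArticle`, `MedlineCitation`, `PMID`, `ArticleTitle`, and so on) follow the PubMed XML export format.

```python
import sqlite3
import xml.etree.ElementTree as ET

# A made-up record in the shape of a PubMed XML export.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article>
        <Journal>
          <Title>Hypothetical Journal</Title>
          <JournalIssue><PubDate><Year>2009</Year></PubDate></JournalIssue>
        </Journal>
        <ArticleTitle>A made-up title about <i>evolution</i></ArticleTitle>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def text_of(parent, path):
    """Return all text under `path` (including inline markup), or None."""
    element = parent.find(path)
    if element is None:
        return None
    return "".join(element.itertext()).strip()


def parse_records(xml_text):
    """Yield (pmid, title, journal, year) for each article in the XML."""
    root = ET.fromstring(xml_text)
    for article in root.iter("PubmedArticle"):
        citation = article.find("MedlineCitation")
        yield (
            int(text_of(citation, "PMID")),
            text_of(citation, "Article/ArticleTitle"),
            text_of(citation, "Article/Journal/Title"),
            text_of(citation, "Article/Journal/JournalIssue/PubDate/Year"),
        )


def load(conn, xml_text):
    """Create the table if needed, insert the records, return the count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS citation ("
        "pmid INTEGER PRIMARY KEY, title TEXT, journal TEXT, year TEXT)"
    )
    rows = list(parse_records(xml_text))
    conn.executemany("INSERT OR REPLACE INTO citation VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # swap in a file path to keep the data
print(load(conn, SAMPLE_XML))
print(conn.execute("SELECT pmid, title, year FROM citation").fetchall())
```

A real load of the full data set would read the files in a streaming fashion (for example with `ET.iterparse`) rather than holding each one in memory, and would need many more columns and tables than this.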