
Creating a local copy of PubMed

I just updated a previous post about my plans to create a local copy of PubMed, adding a few more remarks about how I intend to proceed. I hope to finish that work in the next two or three weeks. Check it out here: https://shiftingbalance.org/?p=244. Integrating the various data sources with one another (PubMed, our existing BibTeX database, and records in MARC format) will be a challenge, and semantic interoperability will be the greatest challenge of all.
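One common way to approach that interoperability problem is a "crosswalk" that maps each source's fields onto a shared set of fields. The sketch below is a hypothetical illustration, not the mapping this project will actually use. It assumes records have already been flattened into Python dicts, uses MARC 245$a (title), 100$a (main personal name) and 260$c (date of publication), and invents the PubMed keys `FirstAuthor` and `PubYear` purely for illustration.

```python
# Hypothetical crosswalk from three source formats to one common schema.
# Field choices are illustrative assumptions, not the project's real mapping.

COMMON_FIELDS = ("title", "creator", "year")

# common field -> key used by each (hypothetical, already flattened) record
CROSSWALK = {
    "bibtex": {"title": "title", "creator": "author", "year": "year"},
    # MARC: 245$a title, 100$a main personal name, 260$c date of publication
    "marc": {"title": "245a", "creator": "100a", "year": "260c"},
    # PubMed: hypothetical keys for an XML record flattened into a dict
    "pubmed": {"title": "ArticleTitle", "creator": "FirstAuthor", "year": "PubYear"},
}

# Trailing punctuation that MARC subfields often carry (" /", ",", ".")
MARC_TRAILING = " /.,:;"


def to_common(source: str, record: dict) -> dict:
    """Return `record` re-keyed to COMMON_FIELDS; absent values become None."""
    mapping = CROSSWALK[source]
    out = {field: record.get(mapping[field]) for field in COMMON_FIELDS}
    if source == "marc":
        # Strip MARC punctuation so values can be compared with the other sources.
        out = {f: v.rstrip(MARC_TRAILING) if isinstance(v, str) else v
               for f, v in out.items()}
    return out


if __name__ == "__main__":
    marc = {"245a": "On the origin of species /", "100a": "Darwin, Charles,",
            "260c": "1859."}
    bib = {"title": "On the Origin of Species", "author": "Darwin, Charles",
           "year": "1859"}
    a, b = to_common("marc", marc), to_common("bibtex", bib)
    print(a["year"] == b["year"], a["creator"] == b["creator"])  # True True
    # Titles still differ in capitalization: renaming fields is not enough.
    print(a["title"] == b["title"])  # False
```

The last comparison is the point: renaming fields gets the data into one shape, but deciding that two differently capitalized titles describe the same work is where the semantic part of the problem begins.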

BibTeX at the Darwin Manuscripts Project and BHL

Poking around the Biodiversity Heritage Library Tools page, I came across this question from the FAQ:

Question: What is the BibTex format that I see as a download option?

BibTex (http://www.bibtex.org/) is a common format for citations/references and is supported by all the major software vendors (EndNote, RefWorks, Zotero, Biblio). This functionality lets a user view & export a BibTex file for any title, including its items, from the bibliography page, as in this example:
http://beta.biodiversitylibrary.org/bibliography/1102

BHL is also going to make this format available for download alongside our custom data exports, such that users can download a BibTex file that contains 1) all the *titles* in BHL including links to each, and 2) all the *items* in BHL (each volume) along with links. We need this export to move title-level metadata from the BHL portal to the article repository, so thought we might as well make the file available for others to use.
In effect, this would put BHL titles & volumes in a format easily understood by existing reference management applications.

When deciding whether our big database of works about evolution at the Darwin Manuscripts Project would use EndNote or BibTeX managed by way of BibDesk, I opted for BibTeX, a smart decision if I do say so myself. It’s served us well in the many years we’ve been using it, and it looks like it will continue to be useful. Nelson Beebe is developing (or may have finished developing) scripts that represent BibTeX databases as MySQL tables, and he provides useful links to related software tools that his scripts need as adjuncts. In a paper in TUG (forthcoming? in vol. 30, issue 1, Nov. 2009), he explains a little about BibTeX, relational databases, and what’s involved in representing a .bib file as a relational database.
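To make the idea of "a .bib file as a relational database" concrete, here is a minimal sketch. It is not Beebe's schema or scripts, which I haven't reproduced here; the two-table layout (`entries` for cite keys and entry types, `fields` with one row per field) is my own assumption, chosen because BibTeX entries carry open-ended sets of fields. Python's built-in `sqlite3` stands in for MySQL so the example runs anywhere; with MySQL you would use the driver's `%s` placeholders, `REPLACE INTO` (or `INSERT ... ON DUPLICATE KEY UPDATE`) instead of SQLite's `INSERT OR REPLACE`, and `VARCHAR` key columns, since MySQL won't index a bare `TEXT` column without a prefix length. The parser is deliberately small and skips `@string` macros and `#` concatenation.

```python
import re
import sqlite3

ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")  # "@type{key,"
FIELD_NAME = re.compile(r"([A-Za-z][\w-]*)\s*=\s*")       # "name ="
BARE_VALUE = re.compile(r"[^,}\s]+")                      # e.g. year = 1859


def _read_value(text, i):
    """Read a {braced}, "quoted" or bare value at text[i]; return (value, next index)."""
    if text[i] == "{":
        depth = 0
        for j in range(i, len(text)):  # track nesting so {O}rigin stays intact
            if text[j] == "{":
                depth += 1
            elif text[j] == "}":
                depth -= 1
                if depth == 0:
                    return text[i + 1:j], j + 1
        raise ValueError("unbalanced braces")
    if text[i] == '"':
        j = text.index('"', i + 1)
        return text[i + 1:j], j + 1
    m = BARE_VALUE.match(text, i)
    if m is None:
        raise ValueError(f"empty value at offset {i}")
    return m.group(0), m.end()


def parse_bib(text):
    """Return [(entry_type, cite_key, {field: value})] for each entry.

    @string macros are not expanded; '#' concatenation is not supported.
    """
    entries, pos = [], 0
    while (m := ENTRY_START.search(text, pos)) is not None:
        entry_type, key = m.group(1).lower(), m.group(2)
        fields, i = {}, m.end()
        while True:
            while i < len(text) and text[i] in " \t\r\n,":
                i += 1  # skip separators between fields
            if i >= len(text) or text[i] == "}":
                break  # end of this entry
            name = FIELD_NAME.match(text, i)
            if name is None:
                raise ValueError(f"cannot parse a field in entry {key!r}")
            value, i = _read_value(text, name.end())
            fields[name.group(1).lower()] = " ".join(value.split())
        entries.append((entry_type, key, fields))
        pos = i + 1
    return entries


SCHEMA = """
CREATE TABLE IF NOT EXISTS entries (
    cite_key TEXT PRIMARY KEY, entry_type TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS fields (
    cite_key TEXT NOT NULL REFERENCES entries (cite_key),
    name TEXT NOT NULL, value TEXT,
    PRIMARY KEY (cite_key, name));
"""


def load_bib(conn, text):
    """Create the tables if needed, load every entry, return the entry count."""
    conn.executescript(SCHEMA)
    entries = parse_bib(text)
    with conn:  # one transaction for the whole file
        for entry_type, key, fields in entries:
            conn.execute("INSERT OR REPLACE INTO entries VALUES (?, ?)",
                         (key, entry_type))
            conn.executemany("INSERT OR REPLACE INTO fields VALUES (?, ?, ?)",
                             [(key, n, v) for n, v in fields.items()])
    return len(entries)


if __name__ == "__main__":
    sample = """@book{darwin1859,
      author = {Darwin, Charles},
      title  = {On the {O}rigin of {S}pecies},
      address = "London", publisher = {John Murray}, year = 1859
    }"""
    conn = sqlite3.connect(":memory:")
    print(load_bib(conn, sample))  # 1
    print(conn.execute("SELECT value FROM fields WHERE name = 'title'").fetchone())
    # ('On the {O}rigin of {S}pecies',)
```

Storing each field as its own row keeps the schema fixed no matter which fields an entry uses; the trade-off is that a query such as "all books published after 1850" needs a join or pivot rather than a simple `WHERE` clause.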

If anyone out there has experience creating relational databases from .bib files, please comment on this post or let me know how to contact you; I’d like to ask some questions and hear any tips, warnings, etc. you might have.

I have to learn a little about Python and MySQL. . .

Lindsay Cowell, who works on the Infectious Disease Ontology, was kind enough to share a script her group used to create a local MySQL database of PubMed records. It’s in Python, and Changqing Li wrote it. To use the script, I have to learn Python and also more about SQL databases. I found a Python tutorial written by the creator of Python himself! Most likely I will rely on the kindness of my friends to coach me through the Python scripting; once the data is in MySQL, I can take it from there. There is also the question of where the data will go, since Changqing tells me the full local copy requires at least 60 GB. I hope it doesn’t come to this, but the best (cheapest) course of action might be to create the database on a local disk. The records I’ll eventually want to incorporate into the Literature of Evolution database will not, I imagine, require nearly so much disk space. I have a few GB at geekisp, and maybe the Darwin Manuscripts Project can spare a few.
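For anyone curious what such a script involves, here is a rough sketch of the general approach: stream PubMed XML, pull out a few fields per article, and insert them in batches. It is not Changqing Li's script, which I'm not reproducing here. The element paths (`MedlineCitation/PMID`, `Article/ArticleTitle`, `Article/Journal/Title`, `Journal/JournalIssue/PubDate/Year`) follow PubMed's XML format; the one-table `citations` layout, the batch size, and the sample record are hypothetical. Again, `sqlite3` stands in for MySQL so the example runs without a server; a MySQL driver would use `%s` placeholders instead of `?`. Parsing incrementally with `iterparse` matters at the scale Changqing mentions, because the whole file never has to sit in memory at once, and if the records arrive gzipped, passing `gzip.open(path)` as the source should work the same way.

```python
import io
import itertools
import sqlite3
import xml.etree.ElementTree as ET


def iter_citations(source):
    """Yield (pmid, title, journal, year) for each PubmedArticle in a PubMed
    XML file name or binary file object, parsing incrementally."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "PubmedArticle":
            continue
        citation = elem.find("MedlineCitation")
        article = citation.find("Article")
        title_el = article.find("ArticleTitle")
        # itertext() keeps text inside inline markup such as <i>...</i>
        title = "".join(title_el.itertext()) if title_el is not None else None
        journal = article.findtext("Journal/Title")
        # Some records give <MedlineDate> instead of <Year>; those get None.
        year = article.findtext("Journal/JournalIssue/PubDate/Year")
        yield (int(citation.findtext("PMID")), title, journal,
               int(year) if year and year.isdigit() else None)
        elem.clear()  # release this article's subtree once it is consumed


def load_pubmed(conn, source, batch_size=1000):
    """Insert citations in batches of `batch_size`; return the number loaded."""
    conn.execute("""CREATE TABLE IF NOT EXISTS citations (
        pmid INTEGER PRIMARY KEY, title TEXT, journal TEXT, year INTEGER)""")
    rows, total = iter_citations(source), 0
    while batch := list(itertools.islice(rows, batch_size)):
        with conn:  # commit once per batch
            conn.executemany(
                "INSERT OR REPLACE INTO citations VALUES (?, ?, ?, ?)", batch)
        total += len(batch)
    return total


# A made-up record for illustration only (hypothetical PMID and journal).
SAMPLE_XML = b"""<PubmedArticleSet><PubmedArticle><MedlineCitation>
<PMID Version="1">12345678</PMID><Article>
<Journal><Title>Hypothetical Journal</Title>
<JournalIssue><PubDate><Year>2009</Year></PubDate></JournalIssue></Journal>
<ArticleTitle>A made-up <i>example</i> record</ArticleTitle>
</Article></MedlineCitation></PubmedArticle></PubmedArticleSet>"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(load_pubmed(conn, io.BytesIO(SAMPLE_XML)))  # 1
    print(conn.execute("SELECT * FROM citations").fetchone())
    # (12345678, 'A made-up example record', 'Hypothetical Journal', 2009)
```

Only a handful of fields are kept here; a real local copy would add tables for authors, MeSH headings, and so on, which is a large part of why the full database runs to tens of gigabytes.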