DHHS, NIH News  
National Library of Medicine (NLM)

Tuesday, June 10, 2003

Robert Mehnert
or Kathy Cravedi
NLM Public Information
(301) 496-6308

The National Library of Medicine Defines Standard Content Model for Electronic Archiving and Publishing of Journal Articles

Bethesda, Maryland - The National Library of Medicine (NLM) announces the creation and free availability of a standard model for the electronic archiving and exchange of journal articles.

Since the mid-1990s, scholarly journals have been striving to make their content available on the web for greater distribution, ease of searching and retrieval, or just to have a web presence. "These electronic files are created to meet the needs of the Internet — usually without much thought given to long-term archiving of the content," says Dr. David Lipman, Director of the Library's National Center for Biotechnology Information (NCBI). "Today we release two Document Type Definitions (DTDs) that will simplify journal publishing and increase the accuracy of the archiving and exchange of scholarly journal articles."

NCBI created the Journal Publishing DTD to define a common format for the creation of journal content in XML. The advantages of a common format are portability, reusability, and the creation and use of standard tools. Although the Publishing DTD was created for electronic production, the structures are robust enough to support print publication as well.

Built using the same set of elements, the Archiving and Interchange DTD also defines journal articles, but it was created to provide a common format in which publishers, aggregators, and archives can exchange journal content.

These DTDs and the Tagset from which they were created are in the public domain. Complete information and documentation can be found at http://dtd.nlm.nih.gov.

To keep the DTD relevant to the publishing and archiving communities, NLM is creating an XML Interchange Structure Working Group under the PubMed Central Advisory Committee to advise on recommended changes and additions to the Tagset.

NCBI will encourage the use of the Publishing DTD to define the incoming data for PubMed Central (PMC; http://www.pubmedcentral.gov) for journals that do not already have content in SGML or XML. PMC is NLM's digital archive of life sciences journal literature.

"We didn't start out to create a standardized archiving format for articles," says Jeff Beck of the NCBI. "We were starting a major revision to our DTD at the same time that a company — Inera, Inc., of Newton, Mass. — was working on the 'E-Journal Archival DTD Feasibility Study' for the Harvard University E-Journal Archiving Project. That study concluded that a common format for archiving was possible, but that it hadn't been defined yet. We shared our revised DTD with Inera, and it seemed like we almost had it."

In April 2002, representatives from NCBI, Inera, Mulberry Technologies, Inc. (Rockville, Md.), the Harvard University E-Journal Archiving Project, and the Mellon Foundation (supporting the Harvard project and Inera) met in Bethesda to discuss what changes the PMC DTD needed in order to become a common DTD format for archiving.

The conclusion was that a Tagset should be created, from which archiving, interchange, and authoring (publishing) DTDs could then be derived. Mulberry Technologies and Inera examined thousands of articles from hundreds of journals, along with dozens of journal DTDs, to be sure that the content models being defined by the Tagset were comprehensive. After this extensive modeling exercise, the consultants worked with NCBI to create the Archiving and Interchange DTD. NCBI and Mulberry then created the Journal Publishing DTD to help publishers who had not yet selected a format for their electronic content.

NLM is planning to create other DTDs from the Tagset, including one for textbooks and one for online documentation. Because all of these types of publications will be tagged using the same elements and attributes, publishing tools created for the Tagset will be applicable to all of these document types. This confluence of tagging models will greatly simplify the publication and archiving of content at the National Library of Medicine and in the journal publishing industry in general.

Inquiries about the DTD may be directed to Jeff Beck by telephone at (301) 435-5992 or by fax at (301) 480-0109.

The National Library of Medicine is a part of the National Institutes of Health, an agency of the U.S. Department of Health and Human Services.
