GCG-Lite: A Web Interface to the GCG Sequence Analysis Package

by Peter FitzGerald, Ph.D., DCRT
(e-mail: pcf@helix.nih.gov)

The past year has seen the coming of age of the electronic information era with the exponential rise in the popularity and functionality of the World Wide Web. Although accompanied by considerably less fanfare, the ability of the Web to provide a universal interface to many computational tasks has ignited a revolution in scientific computing. One area that has been affected dramatically by this revolution is the field of DNA- and protein-sequence analysis.

Taking advantage of this new-found power to connect many types of computers across many types of computing platforms, I have recently developed a Web interface to give NIH researchers better access to what many regard as the industry standard for sequence-analysis software, Genetics Computer Group Inc.'s GCG Wisconsin Sequence Analysis Package. This interface, called GCG-Lite, provides intramural scientists with rapid, easy access to a powerful set of computational tools running on centrally maintained, high-performance computers.

Pre-Web Options

In the "pre-Web" world, a scientist had two main choices when it came to sequence analysis: local computing or central computing. Both options had pros and cons.

The local computing approach typically involved evaluating, purchasing, installing, and running a sequence-analysis package on a local desktop computer, usually a personal computer (PC) or Macintosh. The main attraction of the local computing option was the relative ease of use of such programs. However, this benefit was often offset by the limited computational power available on desktop computers, combined with the need to continually maintain and update the software and associated databases. Additionally, because of significant costs and restrictive licensing, such software has generally been installed on only a few of the computers available to research staff, resulting in bottlenecks and competition for the "analysis computer."

The second approach - the central computing option - has traditionally presented a less user-friendly environment than the desktop computing model, requiring operational knowledge of a telecommunication package, the UNIX operating environment, and the GCG software itself. However, this model has proven adequate in addressing the needs of many in the NIH sequence-analysis community. The attractiveness of central management of software and databases, the functionality of the software, and the computational power of a large UNIX-based machine have generally offset the hurdles presented by the user interface. In fact, during the past year, more than 650 intramural researchers have used the GCG sequence-analysis software running on the DCRT-maintained, UNIX-based Helix system.

The local and central computing options are not mutually exclusive, however, and many NIH labs have opted for some combination of both.

Best of Both Worlds

With the introduction of the Web interface to sequence-analysis software, biomedical scientists may now more easily avail themselves of the "best of both worlds." A brief tour of the Internet, starting at the Web page found at the uniform resource locator (URL) http://molbio.info.nih.gov/molbio/ leads to a wide array of sequence-analysis tools, including NIH's own GCG-Lite.

Before delving into the details of GCG-Lite, let's briefly review the features of the complete GCG Wisconsin Sequence Analysis Package. Consisting of more than 120 individual analysis programs, the full GCG software package is typically operated from the command line on a central system or via the X-Windows graphical interface. Additionally, each GCG analysis program comes with an extensive array of optional command parameters that, although very powerful in the hands of an expert, are daunting to the less-experienced user.

To create a Web interface to a subset of GCG's impressive suite of sequence-analysis programs [see box], I wrote a collection of programs and Hypertext Markup Language (HTML) forms, dubbed GCG-Lite. The new interface can be reached by using a Web browser program to access the NIH Home Page, which is located at the URL http://www.nih.gov/ and then sequentially clicking on the links to Scientific Resources and Molecular Biology. For a more direct route to GCG-Lite, go straight to the following URL: http://molbio.info.molbio/gcglite/


Perhaps the foremost of GCG-Lite's features is its ease of access for NIH researchers. As a Web-based application, GCG-Lite provides a uniform interface to anyone with network access, regardless of the type of computer they use, be it Mac, PC, or UNIX workstation. In developing GCG-Lite, I took into account the way NIH researchers have used the full GCG package on the Helix system over the past few years. For example, feedback from the scientific community prompted the creation of both novice and expert modes for all analyses. In the novice mode, the user simply provides a sequence, selects an analysis function, and launches the analysis. In the expert mode, the user has more control over certain parameters that may affect the analysis. By incorporating the sequence-format translator, "readseq," developed by D.G. Gilbert of Indiana University in Bloomington, GCG-Lite is capable of accepting a wide variety of input sequence formats. Additionally, because sequence input and formatting are inherently error-prone procedures, all GCG-Lite output includes a copy of the sequence analyzed, thus providing a check of data integrity.
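The novice and expert modes, and the practice of echoing the analyzed sequence back to the user, can be pictured with a minimal Python sketch. This is not GCG-Lite's actual code: the function names, the parameter names (gap_weight, length_weight), and the default values are all hypothetical and chosen only for illustration.

```python
# Hypothetical sketch: none of these names or values come from GCG-Lite itself.
from typing import Optional

# Assumed default parameters for one analysis (illustrative values only).
NOVICE_DEFAULTS = {"gap_weight": 5.0, "length_weight": 0.3}


def clean_sequence(raw: str) -> str:
    """Drop whitespace, digits, and punctuation that pasted sequences often carry."""
    return "".join(ch for ch in raw if ch.isalpha()).upper()


def build_request(raw_sequence: str, analysis: str,
                  expert_params: Optional[dict] = None) -> dict:
    """Combine the sequence, the chosen analysis, and its parameters.

    Novice mode: expert_params is None, so the defaults are used unchanged.
    Expert mode: supplied values override the matching defaults.
    """
    params = dict(NOVICE_DEFAULTS)
    if expert_params:
        unknown = set(expert_params) - set(NOVICE_DEFAULTS)
        if unknown:
            raise ValueError(f"unknown parameters: {sorted(unknown)}")
        params.update(expert_params)
    return {
        "analysis": analysis,
        "sequence": clean_sequence(raw_sequence),
        "params": params,
    }


def format_report(request: dict, result_text: str) -> str:
    """Place a copy of the analyzed sequence ahead of the result text."""
    seq = request["sequence"]
    lines = [seq[i:i + 60] for i in range(0, len(seq), 60)]
    header = f"Analysis: {request['analysis']} ({len(seq)} residues)"
    return "\n".join([header, "Sequence analyzed:", *lines, "", result_text])
```

A novice-style call would be build_request(pasted_text, "gap"), while an expert-style call might add expert_params={"gap_weight": 8.0}. In either case, format_report shows the user exactly which sequence was analyzed.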

Analysis functions that display data as graphs have been notoriously difficult for researchers to use effectively through the regular GCG command-line interface. In contrast, GCG-Lite takes advantage of the multimedia-handling capabilities of Web browsers to support the output of graphs in both GIF and PostScript formats.


GCG-Lite does not provide access to the complete set of GCG programs. Thus, scientists who require access to many of the less popular but powerful features of GCG are still best served by the full GCG package. In addition, it should be remembered that GCG-Lite's functionality is largely determined by the operation of the Web browser - and few of today's browsing programs operate correctly in every situation. That means researchers can expect GCG-Lite to function only as well as the Web browser they are using.

To Use...

GCG-Lite's set of sequence-analysis tools should be particularly attractive to researchers who have not learned to use the full GCG software on the NIH Helix computer or those who seldom use sequence-analysis programs and are thus likely to forget the appropriate syntax necessary for using GCG on Helix. In addition, all intramural scientists involved in DNA- and protein-sequence analysis should be attracted to GCG-Lite because of its ease of use, especially the way it enables a researcher to readily view the results of altered program parameters and to produce graphs.

...Or Not to Use?

Among the researchers who are the least likely to find GCG-Lite useful are those who are already familiar with the command-line interface of the full GCG software, who are comfortable in the UNIX operating environment, or who require access to the GCG analysis modules not incorporated into GCG-Lite. Furthermore, scientists who rely on the data-management and -integrity features provided by the full GCG software operating on Helix should be aware that GCG-Lite does not provide those features because it is purely an interface to analysis functions.

Unlike many Web applications, GCG-Lite is not generally accessible to the greater Internet community. To comply with GCG software-licensing restrictions and internal DCRT policy, access to GCG-Lite is restricted to computer users on the NIH network. In the future, access to this software may be further restricted to researchers with Helix accounts.

Looking Ahead

Future enhancements to GCG-Lite will include the incorporation of additional GCG analysis modules and the expansion of the analysis functions to include software outside the GCG suite. And that's not all. With the predicted improvements in the functionality of Web browsers and extensions to the basic Web protocol, it's reasonable to expect that in the near future, biomedical researchers may be doing most - if not all - of their data analysis via World Wide Web interfaces such as GCG-Lite.

What Can GCG-Lite Do?

Text-word database searches for DNA and protein sequences

Restriction-enzyme-site identification

DNA-to-protein translation

PCR-primer prediction

Protein-to-DNA backtranslation

Protein isoelectric-point (pI) prediction

Identification of protein motifs within a protein sequence

Protein-structure prediction

Prediction of protease-digestion patterns

Graphical dot-plot comparison of two DNA or protein sequences

Local or global homology comparison of two DNA or protein sequences
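
To give a flavor of one function in the list above, restriction-enzyme-site identification, here is a minimal Python sketch. It is not the GCG program and makes no claim about how GCG works internally. It simply scans a DNA sequence for the recognition sequences of three common enzymes; the function name and the choice of enzymes are assumptions made for illustration.

```python
# Illustrative sketch only; this is not how the GCG package is implemented.

# Recognition sequences for three common enzymes. All three are palindromic,
# so scanning one strand also finds the sites on the complementary strand.
SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def find_sites(sequence: str, sites: dict = SITES) -> dict:
    """Return the 1-based start positions of each recognition sequence."""
    seq = sequence.upper()
    hits = {}
    for name, site in sites.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start + 1)        # report 1-based coordinates
            start = seq.find(site, start + 1)  # keep scanning past this hit
        hits[name] = positions
    return hits


if __name__ == "__main__":
    for enzyme, found in find_sites("aaGAATTCttGGATCCgaattc").items():
        print(enzyme, found)
```

A real analysis would draw on a full enzyme database and handle ambiguous bases and cut positions, which this sketch leaves out.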
