
GenBank LOCUS bugs GCG 2BIT token

gmartin gmartin at MendelBio.COM
Fri May 19 17:18:02 EST 2000

We have found an interesting bug in some GCG software that is 
triggered by a certain "LOCUS" identifier recently created in GenBank.
The GCG bug was interfering with our ability to integrate cumulative
GenBank sequence updates into our GCG datasets, so we devised the
workaround given below.

On May 12, 2000, the LOCUS identifier "CNS02BIT" was introduced into
the GenBank nucleotide database.  By coincidence, past GCG programmers
devised a DNA sequence compression scheme that they named "2BIT".  
The presence of the sub-string "2BIT" in the LOCUS "CNS02BIT" is
misinterpreted by some GCG programs and causes them to switch into "2BIT"
mode.  Because the sequence was not actually formatted in 2BIT mode,
GCG programs that have this bug (such as "seqcat" and "tofasta")
obtain erroneous base counts when they attempt to read the sequence.
These GCG programs terminate at that point with the error message:

        *** ERROR in SQNext. Sequence reading is out of synch.

As a workaround, until GCG can fix the problem in their software library
and redistribute the programs, we now scan our GenBank cumulative update
files (e.g., gbcu.flat) for LOCUS records that have the "2BIT" sub-string.  
Where found, it is replaced with a string of four Z's, i.e., "ZZZZ".
This is simply an expedient to prevent GCG programs from failing.
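
Before masking anything, it can be useful to see which records would be
affected. The following is a minimal sketch, assuming a POSIX shell and
grep; the function name find_2bit_locus is hypothetical and is not part
of GCG or GenBank tooling.

```shell
# Hypothetical helper: list LOCUS lines that contain "2BIT" (any case).
# Takes the path of a GenBank flat file as its only argument.
find_2bit_locus() {
    # First grep keeps only LOCUS lines, prefixed with their line numbers;
    # second grep keeps those that contain the offending sub-string.
    grep -n '^LOCUS' "$1" | grep -i '2bit'
}
```

It could be run as "find_2bit_locus gbcu.flat"; no output means no LOCUS
line in that file contains the sub-string.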

Here are two examples of the workaround that a GCG system administrator
can use during processing of GenBank update files:

  A conservative approach (the one we use):

    cat gbcu.flat \
    | perl -e 'while(<>){if(m/^LOCUS/ && m/2bit/i){s/2bit/ZZZZ/ig}print}' \
    | genbanktogcg

  An easy-to-implement but less stringent approach:

    cat gbcu.flat \
    | sed -e 's/2BIT/ZZZZ/' \
    | genbanktogcg
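
A possible middle ground is to let sed restrict the substitution to LOCUS
lines by using an address. This is only a sketch: the function name
mask_2bit_locus is hypothetical, and unlike the perl version it matches
the uppercase string "2BIT" only.

```shell
# Hypothetical filter: reads a GenBank flat file on stdin, writes to stdout.
mask_2bit_locus() {
    # The /^LOCUS/ address limits the substitution to LOCUS lines;
    # all other lines pass through unchanged.
    sed -e '/^LOCUS/s/2BIT/ZZZZ/g'
}
```

Under these assumptions it would sit in the pipeline in place of the sed
command, e.g. "mask_2bit_locus < gbcu.flat | genbanktogcg".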

There are other ways to do this -- but the idea should be clear from
the above.  We hope this information is of some help to you and your
user community.

Garry Martin
Mendel Biotechnology, Inc.
gmartin at mendelbio.com

