[Genbank-bb] Rel150.fsa_aa missing sequence data

Garry W. Martin gmartin at MendelBio.COM
Wed Nov 16 19:31:50 EST 2005


The protein sequence file, rel150.fsa_aa.gz, dated Oct 14, 2005,
for the current GenBank release 150 contains several hundred 
fasta entries that have fasta headers but no peptide residue data.

For example, see any of these entries in the rel150.fsa_aa.gz file:

  >gi|263833|gb|AAB24990.1| No definition line found
  >gi|263835|gb|AAB24992.1| No definition line found
  >gi|263837|gb|AAB24994.1| No definition line found
  >gi|263839|gb|AAB24996.1| No definition line found
  >gi|263841|gb|AAB24998.1| No definition line found

When we attempted to process the file with existing GCG v10.3 fasta 
file handling utilities (e.g., fastatogcg), those programs became 
confused because they assume that there will be at least one line of 
sequence data following each sequence header.  We had to remove the
null sequence entries with a preprocessing step in order to complete
the installation of release 150.
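
A preprocessing step of this kind could be sketched in Python as follows.
This is an illustration only, not the actual script used at the time. The
function names and the output file name are hypothetical, and the sketch
assumes that a "null" entry is a header line followed by no non-blank
sequence lines before the next header or the end of the file.

```python
import gzip


def drop_empty_fasta_records(lines):
    """Yield only the lines of FASTA records that carry sequence data.

    A record is a '>' header plus the non-blank lines that follow it.
    Records with no such lines are dropped. Blank lines and any text
    before the first header are dropped as well.
    """
    header = None   # header line of the record currently being read
    body = []       # non-blank sequence lines collected for that record
    for line in lines:
        if line.startswith(">"):
            # A new header closes the previous record; emit it if non-empty.
            if header is not None and body:
                yield header
                yield from body
            header, body = line, []
        elif line.strip() and header is not None:
            body.append(line)
    # Emit the last record in the file if it has sequence data.
    if header is not None and body:
        yield header
        yield from body


def filter_file(src_path, dst_path):
    """Read a gzipped FASTA file and write a filtered plain-text copy."""
    # latin-1 maps every byte to a character, so decoding cannot fail.
    with gzip.open(src_path, "rt", encoding="latin-1") as src, \
            open(dst_path, "w", encoding="latin-1") as dst:
        dst.writelines(drop_empty_fasta_records(src))
```

A call such as filter_file("rel150.fsa_aa.gz", "rel150.fsa_aa.filtered")
would then write a copy without the header-only entries, which can be
passed on to fastatogcg. Each record is buffered in memory before it is
written, which should be acceptable for protein entries.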

We have been processing each GenBank protein release in this way
for about seven years, and this is the first time we have seen this
problem.

Cordially, 

Garry Martin
Mendel Biotechnology, Inc.
