On Mon, 17 Jul 2006 20:34:07 -0700, Kevin Karplus wrote:
> On 2006-07-17, Brannon <brannonking at yahoo.com> wrote:
>> What I really want to know is the file format of the stage two files --
>> the output of the BLAST tools before they do the sequence alignment.
>> Where do I get that information?
>
> There are two different versions of BLAST, with two different file
> structures. There is "wu-blast" from Washington University, and NCBI Blast
> from NCBI.
If you are looking for information regarding the internals of the formatdb
output, this page <http://blast.wustl.edu/blast/dbfmts.html> contains
pointers to both the NCBI and WU formats. As noted in Kevin's response,
the formats are subject to change and may differ slightly from what is
listed there.

The formatdb command takes FASTA input and turns it into the binary input
files for blast alignment with bl2seq or blastall (or one of several
other tools that take blast binary format input). It produces a set of
database-like files that include a sequence file, a header (sequence name)
file, and an index file. The index provides guidance when accessing the
sequence and header files. The indicated file format documentation
does a pretty good job of describing the index and sequence files, but
falls a bit short in documenting the header file. If you are interested:
at the offset position indicated by the index, the header data looks
something like 0x(30 80 30 80), followed by 0x1a (perhaps with some other
data), then the name length in one or two bytes, the ASCII name, and
finally some extra (fill?) bytes.
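
To make the header layout described above concrete, here is a minimal Python sketch that pulls the name out of one header record. It is only an illustration of the layout as observed, not a reference parser. Several things are assumptions on my part: the function names (read_length, read_name) are made up; the name length is assumed to be encoded BER-style (a single byte below 0x80, otherwise a 0x81 or 0x82 prefix followed by the length bytes); and the "other data" between the marker and 0x1a is simply skipped by searching for the next 0x1a byte. The sample record is synthetic, built by hand rather than taken from real formatdb output.

```python
MARKER = bytes([0x30, 0x80, 0x30, 0x80])  # bytes seen at the indexed offset
STRING_TAG = 0x1A                          # byte seen just before the name length


def read_length(buf, pos):
    """Return (length, next_pos) for a length field starting at pos.

    Assumption: BER-style encoding. Values below 0x80 are the length itself;
    otherwise the low 7 bits give the number of length bytes that follow.
    """
    first = buf[pos]
    if first < 0x80:
        return first, pos + 1
    count = first & 0x7F
    if count == 0:
        raise ValueError("indefinite length not expected for the name")
    value = int.from_bytes(buf[pos + 1:pos + 1 + count], "big")
    return value, pos + 1 + count


def read_name(buf, offset):
    """Return the ASCII name of the header record starting at offset."""
    if buf[offset:offset + 4] != MARKER:
        raise ValueError("marker 30 80 30 80 not found at offset")
    # Skip any "other data" by looking for the next 0x1a byte (a guess).
    tag_pos = buf.find(bytes([STRING_TAG]), offset + 4)
    if tag_pos < 0:
        raise ValueError("no 0x1a byte after marker")
    length, start = read_length(buf, tag_pos + 1)
    return buf[start:start + length].decode("ascii")


# Synthetic record: marker, two made-up filler bytes, 0x1a, length, name, fill.
name = b"gi|12345 example sequence"
record = MARKER + b"\xa0\x80" + bytes([STRING_TAG, len(name)]) + name + b"\x00\x00"
print(read_name(record, 0))
```

In practice the offset passed to read_name would come from the index file, whose layout is covered by the format documentation linked above. Searching for the next 0x1a byte is fragile if the intervening data happens to contain that value, so treat the result with suspicion on real files.
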
. Dr. Scott Harper
. Adaptive Genomics Corp.
. 620 N. Main St, Suite 103
. Blacksburg, VA 24060
. Scott.Harper at AdaptiveGenomics.com, 540-552-2700