[Bio-software] Re: please help me understand BLAST tool structure and files

Des Higgins dazzhiggins at hotmail.com
Mon Jul 24 10:50:04 EST 2006


"Kevin Karplus" <karplus at cheep.cse.ucsc.edu> wrote in message 
news:mailman.270.1153223259.20007.bio-soft at net.bio.net...
>
> On 2006-07-17, Brannon <brannonking at yahoo.com> wrote:
>> I'm confused on BLAST file formats and somewhat on the BLAST tool
>> structure itself. I have no experience with BLAST, but I recognize
>> BLAST can read several input formats including FASTA.
>>
>> Assume I'm using the latest version of BLAST. It seems to me there
>> would be three file stages. First would be the input files to be
>> processed with some heuristical program. Second would be the output
>> files from that tool; these output files would also be the input files
>> to a tool that would produce the exact alignment. So the third stage
>> files would be the alignment files themselves. Is that even remotely
>> close to reality?
>>
>> What I really want to know is the file format of the stage two files --
>> the output of the BLAST tools before they do the sequence alignment.
>> Where do I get that information?
>
> There are two different versions of BLAST, with two different file
> structures.  There is "wu-blast" from Washington University, and NCBI
> Blast from NCBI.
>
> I believe that the NCBI version handles bigger databases and has been
> upgraded more assiduously than the wu-blast version.  I used to use
> both, but have switched to using exclusively NCBI blast.
>
> The formatdb command converts fasta files to a bunch of files
> (different formats for nucleic acids and proteins).
>
> I found formatdb.html on the web with the following information:
>
>    DISCLAIMER: The internal structure of the BLAST databases is
>    subject to change with little or no notice.  The readdb API should
>    be used to extract data from the BLAST databases.  Readdb is part
>    of the the NCBI toolkit
>    (ftp://ncbi.nlm.nih.gov/toolbox/ncbi_tools/), readdb.h contains a
>    list of supported function calls.
>
> (the double "the" is in the original)
>
> ------------------------------------------------------------
> Kevin Karplus karplus at soe.ucsc.edu http://www.soe.ucsc.edu/~karplus
> Professor of Biomolecular Engineering, University of California, Santa Cruz
> Undergraduate and Graduate Director, Bioinformatics
> (Senior member, IEEE) (Board of Directors & Chair of Education Committee, ISCB)
> life member (LAB, Adventure Cycling, American Youth Hostels)
> Effective Cycling Instructor #218-ck (lapsed)
> Affiliations for identification only.
>

I am not sure I can trust information posted on Usenet by a lapsed
Effective Cycling Instructor.

Des Higgins
lapsed taxonomist and lapsed thin person




