Version 1.4 of the Basic Local Alignment Search Tool (BLAST) family of sequence
database search programs is now available via anonymous FTP on ncbi.nlm.nih.gov
(IP address 126.96.36.199) beneath the /blast directory. A copy of the README
for the version 1.4 software is attached below.
National Center for Biotechnology Information / National Library of Medicine
gish at ncbi.nlm.nih.gov
Come the beginning of November, my new home will be:
Department of Genetics
4444 Forest Park Ave., Box 8501
St. Louis, MO 63108
[e-mail will be forwarded]
This is a final copy release of version 1.4 of the BLAST application programs.
This software is UNIX compatible only.
To build the BLASTP, BLASTN, BLASTX, TBLASTN, and TBLASTX programs, the
following compressed UNIX tar archives should first be downloaded and built in
the order shown:
ncbi.tar.Z -- NCBI Toolbox for UNIX (see /toolbox/ncbi_tools/ncbi.tar.Z)
gish.tar.Z -- personal function library
dfa.tar.Z -- deterministic finite-state automata function library
blast.tar.Z -- blast function library
blastapp.tar.Z -- development version 1.4 blast application programs
Note: major portions of the NCBI software Toolbox are now required for
building the BLAST 1.4 software, not just a few of the .h header files as was
the case with the 1.3 software. An ANSI C compiler is also required (gcc
should do, but it hasn't been tested). The BLAST database format is unchanged,
however -- the same old pressdb and setdb programs are used to create blastable
databases from FASTA-format input files.
Features of the BLAST 1.4 distribution:
o Karlin and Altschul (1993) "Sum" statistics are now the default method used
to evaluate the statistical significance of sets of HSPs, rather than Poisson
statistics. Poisson statistics remain an option, but Sum statistics produce a
relative ordering of the database matches that makes more intuitive sense and,
in many cases, are more sensitive.
o Fewer false positive reports are anticipated, through the use of more
stringent HSP consistency rules than before. This has permitted HSP score
thresholds (S2 parameter) to be lowered somewhat for improved sensitivity
without too adversely affecting selectivity. The amount of overlap permitted
between consistent HSPs can be adjusted with the -olfraction parameter (current
default value 0.125, i.e., 12.5% of the length of each HSP).
o The wordlength (W parameter) in BLASTN can now be safely adjusted from its
new default value of 11 down to as low as 1, to increase sensitivity but at the
expense of speed. Many users may find no need to go below W=6.
o BLASTN 1.4 uses a real scoring matrix to score alignments, instead of the
simple match-mismatch scoring done by BLASTN 1.3, so that partial matching can
be scored for ambiguous nucleotide codes (e.g., A vs. R). Of less utility,
BLASTN 1.4 also has the capacity to generate "neighborhoods" on the W-mer
words, an option that is invoked by using the T parameter. The price paid for
using matrices and shorter word lengths is that BLASTN 1.4 uses more memory and
is 30% slower or more. Be careful when using the T parameter to generate
neighborhood words -- this parameter can cause the program's memory use to
skyrocket with little trouble.
o BLASTN 1.4 uses the E2 and S2 parameters in the same way these parameters
had been used by previous versions of the other blast programs.
o A new program, TBLASTX, is included that takes a nucleotide query and a
nucleotide database and translates both in all 6 reading frames prior to
comparison. With the 6 x
6 = 36 combinations for comparison, TBLASTX is considerably slower than BLASTX
or TBLASTN, but it may be put to good use in searching databases such as dbEST
and dbSTS with other anonymous sequences.
o BLASTP can search with multiple scoring matrices in parallel. This feature
is invoked by specifying multiple -matrix options on one BLASTP command line.
Searching with multiple PAM matrices, for example, may provide better
sensitivity in detecting similarity between proteins having domains that
evolved at different rates. Note: the p-values and expectations reported by
the program when using more than one matrix may be unduly low.
o Combinations of two or more -sort_by... options can now be used together.
o The "score vs. frequency of occurrence" histogram of the version 1.3 programs
has been replaced with an "expected frequency of occurrence vs. actual
frequency of occurrence" histogram.
o New "-asn1" and "-asn1bin" options to all of the programs cause them to
produce ASN.1 structured output ("print value" and binary encoded,
respectively).
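As a rough illustration of the six-frame translation described for TBLASTX in
the feature list above, the following Python sketch translates a nucleotide
sequence in all 6 reading frames and enumerates the 6 x 6 = 36 frame pairings
for a query and a database sequence. It is not taken from the BLAST source:
the function names (six_frames, frame_pairs), the use of the standard genetic
code, and the choice to write "X" for codons containing ambiguity codes are
all assumptions made for illustration only.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
# Assumption: the standard code is used; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Map frame number (+1..+3, -1..-3) to its conceptual translation."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(seq[offset:])    # forward strand
        frames[-(offset + 1)] = translate(rc[offset:])  # reverse strand
    return frames


def frame_pairs(query, subject):
    """List every (query frame, subject frame) pairing to be compared."""
    q_frames = six_frames(query)
    s_frames = six_frames(subject)
    return [(qf, sf) for qf in q_frames for sf in s_frames]


if __name__ == "__main__":
    for frame, protein in sorted(six_frames("ATGGCCTAA").items()):
        print(f"{frame:+d} {protein}")
    print(len(frame_pairs("ATGGCCTAA", "TTAGGCCAT")), "frame pairings")
```

Each of the 36 pairings amounts to a separate protein-level comparison, which
is consistent with the note above that TBLASTX is considerably slower than
BLASTX or TBLASTN.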
While it is recommended that users switch to using the version 1.4 programs for
the improved sensitivity and selectivity, reasons not to switch to version 1.4
might include the following:
o automated parsers may break on the version 1.4 output because it is somewhat
different from version 1.3 output. Perhaps the easiest way to test this is to
run a query through the NCBI BLAST E-mail server (blast at ncbi.nlm.nih.gov) and
see if the output returned can be successfully parsed.
o if reproducibility of results from the version 1.3 programs is highly
important, the different statistics, cutoff scores, and re-written functions of
version 1.4 can yield different results from the version 1.3 programs. The
-compat1.3 option goes a long way towards making the new programs behave like
the old ones, but not the whole way.
o overall, the version 1.4 programs are about 10% slower than the version 1.3
programs.
o the version 1.4 programs do not support parallel processing (threads) under
DEC OSF/1 yet, only under SGI IRIX and Sun Solaris 2.3 and higher. Parallel
processing under OSF/1 is supported by the version 1.3 programs and should be
available soon in version 1.4.
o BLAST3 is not included in the new distribution. This program must still be
obtained from the 1.3 distribution. Eventually, the source code for BLAST3 may
be folded into the same distribution as the 1.4 programs.
o BLASTN 1.4 does not support the "noclean" option of previous versions.