A collection of programs based on Minoru Kanehisa's SEQF library search
codes is now available for public use. Directions for retrieval from the
UH Gene-Server are given below; the programs will also appear at IUBIO,
the EMBL File Server, and FUNET shortly.
The codes have been worked over and now run on most Unix boxes, Crays,
and VMS. Much work has been put into making the code as portable as
possible. This does not (yet) extend to DOS compilers, though. Don't
ask about the Mac until, say, 5 years after System 7 comes out...
This package consists of four programs for searching genetic sequence
libraries:
SN  - Search Nucleotide, DNA/RNA query sequence against a nucleotide
      sequence library;
SP  - Search Protein, amino acid query sequence against a protein
      sequence library;
ST  - Search Translated, amino acid query sequence against a nucleotide
      sequence library with 3-frame translation;
SPR - Search Protein Reduced, amino acid query sequence against a
      protein sequence library, with the 20 aa alphabet reduced to 6
      letters based on charge, hydrophobicity, and size characteristics.
SU  - Search Unformatted, a version of SN with I/O specially hacked for
      the Cray. It requires some care and feeding, which is partially
      documented in the code. It is about 55% faster than SN on the
      same problems.
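As a rough illustration of what "3-frame translation" means for ST, here is a minimal Python sketch. It is not taken from the SEQF sources: the function names are made up for this example, it uses the standard genetic code, and it assumes the three frames are the three forward reading frames of the library sequence.

```python
# Standard genetic code, with codons enumerated in base order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, frame):
    """Translate seq from offset frame (0, 1 or 2); unknown codons give 'X'."""
    seq = seq.upper().replace("U", "T")  # treat RNA like DNA
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(frame, len(seq) - 2, 3)
    )


def three_frame(seq):
    """Return the translations of the three forward reading frames."""
    return [translate(seq, frame) for frame in range(3)]
```

For example, three_frame("ATGGCCTAA") gives ["MA*", "WP", "GL"]; a protein query would then be compared against each of the three translations in turn.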
These codes can be used to compare two sequences against each other;
the underlying algorithm is the Needleman-Wunsch-Sellers metric
alignment, in distance mode.
[Yes, that's 5, but SU is usable on non-Crays only with some effort.]
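For readers who have not met alignment in distance mode, the idea can be sketched in a few lines of Python. This is an illustration only, not the SEQF implementation: the function name and the unit mismatch and gap costs are assumptions chosen for the example, and the real codes may weight things differently.

```python
def alignment_distance(a, b, mismatch=1, gap=1):
    """Minimum total cost of aligning a against b end to end.

    Identical residues cost 0, substitutions cost `mismatch`, and each
    inserted or deleted residue costs `gap` (illustrative values only).
    """
    # prev[j] holds the distance between the current prefix of a and b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j, cb in enumerate(b, start=1):
            substitute = prev[j - 1] + (0 if ca == cb else mismatch)
            delete = prev[j] + gap       # residue of a aligned to a gap
            insert = curr[j - 1] + gap   # residue of b aligned to a gap
            curr.append(min(substitute, delete, insert))
        prev = curr
    return prev[-1]
```

A smaller distance means a closer match, so identical sequences score 0 and alignment_distance("GATTACA", "GACTACA") is 1 with these costs.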
SEQUENCE FILE FORMATS
This code is designed to use most common formats; if you have a format
you want included, contact dbd at one of the addresses below.
Supported formats include GenBank, EMBL/SwissProt,
Bionet/IntelliGenetics/Stanford, and straight ASCII. The code should
automatically detect the proper type. Note that GCG format and the
Staden code and format are NOT supported at present. If you have GCG
files, try TOEMBL in the GCG package for sequence file format conversion.
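The detection logic itself is not described here, but the general idea of sniffing a format from the first line of an entry can be sketched as follows. This is a guess at the approach, not the SEQF code: the function name and the returned labels are hypothetical, and it relies only on the usual conventions that GenBank entries start with a LOCUS line, EMBL/SwissProt entries with an ID line, and IntelliGenetics entries with ";" comment lines.

```python
def guess_format(text):
    """Guess a sequence file format from its first non-blank line."""
    for line in text.splitlines():
        if not line.strip():
            continue  # skip leading blank lines
        if line.startswith("LOCUS"):
            return "genbank"
        if line.startswith("ID   "):
            return "embl/swissprot"
        if line.startswith(";"):
            return "intelligenetics"
        return "ascii"  # anything else is treated as bare sequence
    return "ascii"  # empty input
```

A real detector would need to be more careful, for instance with library files that carry a release header before the first entry.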
The code was written by Minoru Kanehisa while with the Theoretical
Biology and Biophysics Group, Theoretical Division, Los Alamos National
Laboratory, with I/O and other modifications by Dan Davison while at LANL
and the University of Houston. Additional I/O improvements are due to
Hugh Nicholas of the Pittsburgh Supercomputing Center (thanks!), with some
last minute work by Ed Chen of the University of Houston. The reduced
protein code search came out of discussions with Jim Ostell, now at
the National Center for Biotechnology Information at the National
Library of Medicine (thanks, Jim!).
University of Houston Gene-Server retrieval info:
The files are available for e-mail retrieval in the Unix directory; the
command

    send unix seqf-shar.aa seqf-shar.ab seqf-shar.ac seqf-shar.ad seqf-shar.ae
              seqf-shar.af seqf-shar.ag seqf-shar.ah seqf-shar.ai seqf-shar.aj

will send all the files to you. Remove the mail headers, concatenate the
parts in order, and run "unshar" or just "/bin/sh filename", where
"filename" is the name of the concatenated file. Then read
"seqf.relnotes" for more info.
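If you would rather not trim the headers by hand, the header-stripping and concatenation step can be sketched in Python. This is only a sketch under assumptions: the function names are made up, each part is assumed to be one saved mail message whose headers end at the first blank line, the parts are assumed to be passed in order (.aa through .aj), and nothing such as a mailer signature is assumed to follow the body.

```python
def strip_mail_headers(message):
    """Return the body of a saved mail message (text after the first blank line)."""
    head, sep, body = message.partition("\n\n")
    return body if sep else message  # no blank line found: leave text as-is


def reassemble(parts):
    """Concatenate the bodies of the mailed parts, in the order given."""
    return "".join(strip_mail_headers(part) for part in parts)
```

Write the result of reassemble() to a file and unpack that file with "unshar" or "/bin/sh" as described above.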
The shar file is available for anonymous FTP in menudo.uh.edu (126.96.36.199):
~ftp/pub/genbank-server/unix/seqf.shar and as split files
If you have questions, comments, flames, or even kind words about the
code, direct them all to:
Dr. Dan Davison
Dept. of Biochemical and Biophysical Sciences
University of Houston
Houston, TX 77204-5500
e-mail: davison at uh.edu (Internet)
DAVISON at UHOU (BITNET)
davison at uhnix1.UUCP (Usenet, new style)
uhnix1!davison (Usenet, old style)
74065,41 Compu$erve (rarely!)