In article <9209232213.AA05660 at violet.berkeley.edu> bosborne at VIOLET.BERKELEY.EDU writes:
>To the Group,
>>I am a C beginner learning how to program on my NeXT (!). I thought a
>useful program for me to attempt to make would find all restriction
>sites in a sequence and print it. What sort of methods (structures, hash
>tables, lookup tables, etc.) should I study? How does a program like
>(the great) DNA Strider for the Mac perform this operation so quickly?
>I donUt mean specifically, just the principles I should pay attention to.
>As you can guess, IUm much more biologist than programmer
>>Thank you for your attention to this matter,
The string search algorithm of Knuth, Morris and Pratt (SIAM J. Comput.
6(2):323-350 (1977)) works very well for restriction site searches. It
executes in O(m+n) units of time, where m is the length of the RE site and
n is the length of the sequence. If you just slid a pattern of m bp along
a sequence of n bp, (ie. performing m comparisons at each of n sites)
the time would be O(m*n). In essence, the m*n algorithm "backs up" and
searches the same region of sequence several times over. The KMP algorithm,
at the beginning of the search, creates a lookup table for a given pattern,
that makes it possible to avoid any backup.
This algorithm is implemented in the programs BACHREST and INTREST in
the FSAP package. The source code can be found in fsap.tar.Z at
pub/psgendb at ccu.umanitoba.ca. The names of variables used in the
program correspond precisely to the variables in the paper. However,
since KMP originally required matches at all positions in a pattern, the
algorithm had to be modified to allow for ambigous nucleotides (eg.
Brian Fristensky | Unintentionally funny qoute of the month:
Department of Plant Science |
University of Manitoba | Rev. Jerry Falwell denounces Bill Clinton,
Winnipeg, MB R3T 2N2 CANADA | saying that his "misquoting and manipulating
frist at ccu.umanitoba.ca | holy scripture for political purposes should
Office phone: 204-474-6085 | be offensive to millions of Americans."
FAX: 204-261-5732 |