SeqApp does restriction site mapping in this way:
a) read in a table of restriction enzymes with their recognition sites
(see Rich Roberts' REBASE, which comes in various formats)
b) read the sequence from a file
c) for each restriction enzyme, find its cut points on the sequence.
This is basic pattern matching, common to much of
biosequence analysis software.
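
As a rough illustration of steps a) to c), here is a minimal Python
sketch. It is not SeqApp's code. The two-column table layout
("name site", with ^ marking the cut position inside the site) is an
assumption made for the example and is not one of the actual REBASE
file formats, and the function names are made up. It only does exact
matching on the given strand; ambiguous bases and the reverse strand
are left out.

```python
def parse_enzyme_table(text):
    """Parse lines like 'EcoRI G^AATTC' into (name, site, cut_offset) tuples.

    The layout is hypothetical. cut_offset is the number of site bases
    that lie to the left of the cut; 0 is used if no '^' is present.
    """
    enzymes = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, marked_site = line.split()
        cut_offset = max(marked_site.find("^"), 0)
        site = marked_site.replace("^", "").upper()
        enzymes.append((name, site, cut_offset))
    return enzymes


def cut_points(sequence, site, cut_offset):
    """Return 0-based cut positions for every exact occurrence of site."""
    sequence = sequence.upper()
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = sequence.find(site, pos + 1)  # allow overlapping sites
    return cuts


def map_sites(sequence, enzymes):
    """Step c): for each enzyme, collect its cut points on the sequence."""
    return {name: cut_points(sequence, site, offset)
            for name, site, offset in enzymes}


table = "EcoRI G^AATTC\nBamHI G^GATCC\n"
print(map_sites("AAGAATTCGGGGATCCAA", parse_enzyme_table(table)))
```

A cut position p here means the cut falls just before base p of the
sequence, counting from 0.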
The particular algorithm I use is derived from general
string matching, as found in text editors and many other
programs. The nucleotide pattern of the restriction enzyme
site is slid along the sequence and each matching position
is recorded. Some finesse is needed to deal with
ambiguous bases, reverse complements, etc.
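
The sliding match, with ambiguous bases and reverse complements,
might look something like the following Python sketch. This is an
illustration, not the SeqApp source. It assumes the sequence itself
contains only A, C, G and T, while the pattern may use the standard
IUPAC ambiguity codes; the function names are made up for the example.
It reports the 0-based start of each matching site, not the cut
position within it.

```python
# IUPAC nucleotide codes: each code maps to the bases it stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

# Complement of every IUPAC code (e.g. R = A/G pairs with Y = T/C).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(pattern):
    """Return the reverse complement of a (possibly ambiguous) pattern."""
    return pattern.upper().translate(COMPLEMENT)[::-1]


def find_sites(sequence, pattern):
    """Slide pattern along sequence; return 0-based starts of all matches."""
    sequence = sequence.upper()
    pattern = pattern.upper()
    hits = []
    for start in range(len(sequence) - len(pattern) + 1):
        # A position matches if every sequence base is allowed by the
        # pattern code aligned with it.
        if all(sequence[start + i] in IUPAC[code]
               for i, code in enumerate(pattern)):
            hits.append(start)
    return hits


def map_enzyme(sequence, site):
    """Return (forward_hits, reverse_hits) for one recognition site.

    Palindromic sites equal their own reverse complement, so the
    second search is skipped to avoid reporting each site twice.
    """
    site = site.upper()
    rc = reverse_complement(site)
    forward = find_sites(sequence, site)
    reverse = [] if rc == site else find_sites(sequence, rc)
    return forward, reverse


print(map_enzyme("GTCGACGTTAAC", "GTYRAC"))   # ambiguous, palindromic site
print(map_enzyme("TTGAGACCTT", "GGTCTC"))     # non-palindromic site
```

The work grows with sequence length times pattern length, which is
small for recognition sites of four to eight bases.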
Hash tables are a way to do this more quickly if need be,
at the cost of added complexity. See, for instance, the FastA
source* by Pearson and Lipman for hash table use in matching
a sequence to a library of sequences. This complexity is
probably not needed for restriction enzyme mapping.
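
To give an idea of the hash table approach, here is a small Python
sketch of a generic word index. It is not taken from the FastA source
and does not reproduce its method. It assumes all sites looked up are
unambiguous and have the same length k as the indexed words; the
function names are made up for the example.

```python
from collections import defaultdict


def build_word_index(sequence, k):
    """Map every k-letter word in the sequence to its 0-based start positions."""
    sequence = sequence.upper()
    index = defaultdict(list)
    for start in range(len(sequence) - k + 1):
        index[sequence[start:start + k]].append(start)
    return index


def lookup_sites(index, sites):
    """Look up each unambiguous site (name -> pattern) in the word index."""
    return {name: list(index.get(pattern.upper(), []))
            for name, pattern in sites.items()}


index = build_word_index("GGATCCAAGGATCC", 6)
print(lookup_sites(index, {"BamHI": "GGATCC", "EcoRI": "GAATTC"}))
```

The sequence is scanned once to build the index, after which each
enzyme costs a single table lookup. Ambiguous sites would have to be
expanded into all the concrete words they stand for, which is part of
the added complexity.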
The basics here, and in a lot of gene sequence analysis,
are those of sliding one sequence of letters against
another and recording the matches.
* One distribution site is anonymous ftp to ftp.bio.indiana.edu,
as /molbio/search/fasta16c2.shar.Z. You may want to poke around
this archive looking for other program source examples. This
archive started as my personal library of examples to
help me learn to write biocomputing software.
Don Gilbert gilbert at bio.indiana.edu
biocomputing office, biology dept., indiana univ., bloomington, in 47405