Searching GenBank for nucleotide repeats

Keith Robinson keith at bones.biochem.ualberta.ca
Tue Mar 1 16:52:54 EST 1994

A staff member here wants to search all of the rodent cDNA sequences in
GenBank to get an idea of the frequency of occurrence of:

 - single-base repeats (e.g. GGGGGG) of length 6 to 20 inclusive,
    for all 4 possible bases

 - double-base repeats (e.g. GAGAGAGAGAGA) of length 6 to 20, for
    all 16 possible combinations

 - triple-base repeats (e.g. GATGATGATGATGATGAT) of length 6 to 20,
    for all 64 possible combinations

We use GCG here, and it is possible to perform this search with GCG's
"findpatterns" command (e.g. searching for G repeats can be done with
the pattern ~GG{6,20}~G), but this is time-consuming (human and computer),
and processing the resulting output file is rather tedious. Before 
attempting to write our own program, does anyone know of any software 
which would make setting up and interpreting results of these searches
easier?
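
For the write-our-own-program option, a minimal Python sketch of the
counting step might look like the following. It is an illustration only
and has nothing to do with how GCG works. It makes several assumptions:
the sequences have already been extracted from GenBank as plain strings,
"length 6 to 20" means 6 to 20 consecutive copies of the repeat unit (as
in the examples above), and runs longer than 20 copies are skipped rather
than truncated. The name count_repeats and its parameters are
hypothetical.

```python
import re
from collections import Counter
from itertools import product


def count_repeats(seq, unit_len, min_units=6, max_units=20):
    """Tally tandem repeats in seq for every unit of unit_len bases.

    Returns a Counter keyed by (unit, number_of_copies).
    """
    seq = seq.upper()
    counts = Counter()
    # 4, 16 or 64 candidate units for unit_len = 1, 2 or 3.
    for unit in map("".join, product("ACGT", repeat=unit_len)):
        # Greedy match: at least min_units consecutive copies of the unit.
        pattern = re.compile("(?:%s){%d,}" % (unit, min_units))
        for match in pattern.finditer(seq):
            copies = len(match.group()) // unit_len
            # Assumption: runs longer than max_units are ignored, not truncated.
            if copies <= max_units:
                counts[(unit, copies)] += 1
    return counts
```

Calling count_repeats(seq, 1), count_repeats(seq, 2) and
count_repeats(seq, 3) on each sequence and adding the resulting Counters
together would give one frequency table per repeat class. Two caveats:
rotations of the same repeat (GA and AG, say) are tallied separately, and
a long single-base run is also reported under the matching double-base
and triple-base units (GG, GGG), since all 16 and 64 combinations are
searched.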

 Keith Robinson             Dept. of Biochemistry
 The University of Alberta  Edmonton, Alberta Canada
 "The information highway is like teenagers and sex -
  all talk, but no action."             overheard

