Will Fischer (wfischer at bio.indiana.edu) wrote:
> I need to extract given pieces of sequence from a set of EMBL/GenBank
> entries, using ranges defined in the features table. For example, I'd
> like to be able to extract, from a set of entries, the DNA sequence for
> each exon, or again for every complete CDS feature (all exons
> Surely not everyone does this manually?
> What software exists that can actually parse the (eminently parsible)
> joint features table format? Please post reviews of programs you have
> used, or mail me directly and I will summarize.
> -- Will Fischer

I have a whole series of routines written in Pascal, currently running
under ULTRIX, to provide this function. The routines are used in an
"intelligent" extraction program which can splice regions from entries
in accordance with feature table entries. They have further been used
in the analysis of termination codons and contributed to:

    "The Signal for the Termination of Protein Synthesis in
    Procaryotes", C.M. Brown, P.A. Stockwell, C.N.A. Trotman and
    W.P. Tate. Nucleic Acids Research, 1990, 18, 2079-2085.

    "Sequence analysis suggests that tetra-nucleotides signal the
    termination of protein synthesis in eukaryotes", C.M. Brown,
    P.A. Stockwell, C.N.A. Trotman and W.P. Tate. Nucleic Acids
    Research, 1990, 18, 6339-6345.

    "The Translational Termination Signal Database", C.M. Brown, M.E.
    Dalphin, P.A. Stockwell and W.P. Tate. Nucleic Acids Research,
    1993, Supplement, In press.

The routines handle the current forms of feature table entry and
can even splice chloroplast genomic sequences
where some exons have been given in the opposite sense from their
I am happy to make this code available as required.
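
For anyone who wants to see the idea before asking for the code, the
kind of splicing described above can be illustrated with a short
Python sketch. This is not the Pascal code mentioned here, only an
illustration written under stated assumptions: it handles plain
ranges, join(...) and complement(...) from the feature table location
syntax, and ignores partial markers (< and >), between-base locations
and references to other entries. The function names are hypothetical.

```python
import re

# Complement table for unambiguous bases only (an assumption of this sketch).
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_top_level(text):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return parts


def extract(location, seq):
    """Extract the sequence named by a feature location string.

    Positions are 1-based and inclusive, as in the feature table.
    """
    location = location.strip()
    if location.startswith("join(") and location.endswith(")"):
        inner = location[len("join("):-1]
        # Concatenate the pieces in the order they are listed.
        return "".join(extract(part, seq) for part in split_top_level(inner))
    if location.startswith("complement(") and location.endswith(")"):
        inner = location[len("complement("):-1]
        return reverse_complement(extract(inner, seq))
    match = re.fullmatch(r"(\d+)\.\.(\d+)", location)
    if match:
        start, end = int(match.group(1)), int(match.group(2))
        return seq[start - 1:end]
    if re.fullmatch(r"\d+", location):
        return seq[int(location) - 1]
    raise ValueError("unsupported location: " + location)


if __name__ == "__main__":
    dna = "ATGCCGTTAGGA"
    # Second exon is annotated on the opposite strand.
    print(extract("join(1..3,complement(7..9))", dna))  # ATGTAA
```

The recursion mirrors the nesting of the location string, so an exon
given on the opposite strand inside a join is reverse-complemented
before being concatenated with its neighbours.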
Peter A. Stockwell
Dept of Biochemistry,
University of Otago,
Dunedin, New Zealand.