In article <msalminen-0610951433170001 at hivgenome.hjf.org>,
Mika Salminen <msalminen at hiv.hjf.org> wrote:
>We are thinking of porting a large collection of GenBank-formatted viral
>sequences into an ACEDB-based database. We are concerned with trying to
>retain as much of the information in the fields of the GenBank entries as
>possible, and specifically wonder if someone out there has tools developed
>for this purpose already (filters, I mean).
>
>We would especially appreciate advice from someone who might have done
>this with another virus, since that would probably be really helpful for
>the design of the database model.
>
>Thanks in advance for your replies,
I think you will find that any feature used in GenBank/EMBL is
available in ACEDB. There are a variety of converters available.
Below is a short list as a starting point.
Be aware that each of these converters may be specific to a particular
database project. That is, they will take what they want from GenBank and
discard the rest.
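
To make "take what they want and discard the rest" concrete, here is a
minimal sketch in Python of what such a filter does. It is not any of the
converters listed below. The function names and the .ace tag names (Title,
DNA, Feature) are assumptions made for illustration, and the tags would have
to match whatever ?Sequence model your database actually uses. Qualifier
lines and multi-line locations are ignored.

```python
import re

def parse_genbank(text):
    """Parse one GenBank flat-file record into a small dict (sketch only)."""
    record = {"locus": None, "definition": "", "features": [], "sequence": ""}
    section = None
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            record["locus"] = line.split()[1]
            section = None
        elif line.startswith("DEFINITION"):
            record["definition"] = line[12:].strip()
            section = "definition"
        elif line.startswith("FEATURES"):
            section = "features"
        elif line.startswith("ORIGIN"):
            section = "origin"
        elif line.startswith("//"):
            break  # end of record
        elif section == "definition" and line.startswith(" "):
            # Continuation line of a multi-line DEFINITION.
            record["definition"] += " " + line.strip()
        elif section == "features":
            # Feature lines carry the key after five spaces; qualifier
            # lines (deeper indent, starting with "/") do not match.
            m = re.match(r"^ {5}(\S+)\s+(\S+)", line)
            if m:
                record["features"].append((m.group(1), m.group(2)))
        elif section == "origin":
            # Drop position numbers and spaces, keep the bases.
            record["sequence"] += re.sub(r"[^a-zA-Z]", "", line)
        else:
            section = None  # some other top-level field we do not keep
    return record

def to_ace(record, wanted=("CDS",)):
    """Emit .ace text, keeping only the feature keys listed in `wanted`.

    Tag names here are hypothetical; adjust them to your own models.
    """
    name = record["locus"]
    lines = [
        f'Sequence : "{name}"',
        f'Title "{record["definition"]}"',
        f'DNA "{name}"',
    ]
    for key, location in record["features"]:
        if key in wanted:  # everything else is silently discarded
            lines.append(f'Feature "{key}" "{location}"')
    lines += ["", f'DNA : "{name}"', record["sequence"], ""]
    return "\n".join(lines)
```

Calling to_ace(parse_genbank(text)) on one record returns the .ace text as a
string. Anything not named in `wanted` is lost, which is exactly the
behaviour to check for before choosing a converter.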
----- My converter -----
Anonymous ftp from weeds.mgh.harvard.edu look in the
There are a few others in that directory too.
----- A recently announced Perl script -----
I don't know how much people want something like this, but...
I've written a GenBank to .ace conversion script in Perl. It handles
the latest version of GenBank flat format files and outputs into ACEDB
version 4 type ?Sequence and ?Paper models. (The models we use have been
slightly changed from the distribution - a copy is enclosed.) Most of the
GenBank features are handled, and the user can set an option to generically
handle the features for which I did not write specific code. The feature
handling routines are in a separate file, so it's easy to extend them.
The tar file is available at:
Enjoy. Also, please send me your comments (or requests, even).
/s/ Martin Ferguson
mferguso at klab.agsci.colostate.edu
P.S. If anyone knows of some Perl libraries to handle ASN.1 data, please
let me know.
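
The announcement above describes per-feature handling routines plus an
option to handle the remaining features generically. One common way to
organize that is a dispatch table with a fallback. The sketch below is in
Python rather than Perl and is not taken from the script itself. All of the
names (handle_cds, handle_generic, HANDLERS, convert_feature) and the tag
names it emits are hypothetical.

```python
def handle_cds(location, qualifiers):
    # Hypothetical specific handler: emit a CDS tag, plus the gene name
    # if the feature carries a /gene qualifier.
    lines = [f'CDS "{location}"']
    if "gene" in qualifiers:
        lines.append(f'Gene "{qualifiers["gene"]}"')
    return lines

def handle_generic(key, location, qualifiers):
    # Fallback: record the feature key and location without interpretation.
    return [f'Feature "{key}" "{location}"']

# Feature key -> specific handler. Extending the converter means adding
# an entry here.
HANDLERS = {"CDS": handle_cds}

def convert_feature(key, location, qualifiers, generic=False):
    """Return .ace lines for one feature, or [] if it is discarded."""
    handler = HANDLERS.get(key)
    if handler is not None:
        return handler(location, qualifiers)
    if generic:
        return handle_generic(key, location, qualifiers)
    return []
```

With generic=False, a feature without a specific handler is dropped. With
generic=True it is kept in a lowest-common-denominator form.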
----- GenBank ASN.1 to ACE -----
We've written such a beast also, for the binary representation (ASN.1) as
well as for text files. It is written in C, uses the NCBI toolkit (see
http://www.ncbi.nlm.nih.gov/), and is able to decode Entrez data with the
help of an additional program. It can be found at