I am interested in testing several ideas about the organization of genomic
Could you please send me references about:
1. Sequence management in relational databases.
Databases, I know, store data in tables and rows, but sequences seem to
be stored in flat files (e.g. in FASTA format). Is it a good idea to chop
the sequences and transfer them into a relational database? Some kinds of
sequences are well suited for storage in a relational database (e.g.
protein and cDNA sequences), but genomic sequences are not. Is it a good
idea to cut genomic sequences into fragments containing ORFs with
their upstream and downstream sequences, and with some positioning
information (e.g. the IDs of the upstream and downstream ORFs)? With each
ORF in the database it would be possible to store additional information
(computed or taken from the literature) like:
-IDs of known amino acid motifs,
-IDs of known conserved structural domains,
-IDs of interacting proteins,
-precomputed information about structural, sequence, and functional
homologies (similar to "neighbors" in NCBI databases),
-all other information (especially raw experimental data).
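
To make the proposed layout concrete, here is a minimal sketch of one
ORF-centred schema, assuming SQLite through Python's sqlite3 module. All
table names, column names, and IDs (orf, orf_annotation, "M001", "D042")
are hypothetical placeholders chosen for illustration, not taken from
any existing database.

```python
import sqlite3

# Hypothetical schema: one row per ORF, with its flanking sequence and
# links to the neighbouring ORFs, plus a generic annotation table.
SCHEMA = """
CREATE TABLE orf (
    orf_id            TEXT PRIMARY KEY,
    sequence          TEXT NOT NULL,
    upstream_seq      TEXT,
    downstream_seq    TEXT,
    upstream_orf_id   TEXT REFERENCES orf(orf_id),
    downstream_orf_id TEXT REFERENCES orf(orf_id)
);
CREATE TABLE orf_annotation (
    orf_id TEXT NOT NULL REFERENCES orf(orf_id),
    kind   TEXT NOT NULL,
    ref_id TEXT NOT NULL,
    PRIMARY KEY (orf_id, kind, ref_id)
);
"""

def open_db(path=":memory:"):
    """Open a database and create the illustrative tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def annotations_for(conn, orf_id, kind):
    """Return the sorted reference IDs of one annotation kind for an ORF."""
    rows = conn.execute(
        "SELECT ref_id FROM orf_annotation "
        "WHERE orf_id = ? AND kind = ? ORDER BY ref_id",
        (orf_id, kind),
    )
    return [row[0] for row in rows]

conn = open_db()

# Two made-up neighbouring ORFs that point at each other.
conn.executemany(
    "INSERT INTO orf VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("orf1", "ATGAAATAA", None, "GGCC", None, "orf2"),
        ("orf2", "ATGCCCTGA", "GGCC", None, "orf1", None),
    ],
)

# Made-up annotation IDs; 'kind' could equally be 'interactor' or 'neighbor'.
conn.executemany(
    "INSERT INTO orf_annotation VALUES (?, ?, ?)",
    [("orf1", "motif", "M001"), ("orf1", "domain", "D042")],
)
conn.commit()

print(annotations_for(conn, "orf1", "motif"))
```

Keeping all annotation types in one generic table is only one possible
choice; separate tables per annotation type would allow type-specific
columns at the cost of more joins.
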
2. There are lots of tools for sequence analysis written in Perl, C,
How should the interface between the database and these tools be designed?
Are there any examples?
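
One possible shape for such an interface, sketched here under
assumptions, is to have the database export FASTA text, since most
existing tools already read that format. The orf table, its columns,
and the function name to_fasta are hypothetical names used only for
this example.

```python
import sqlite3

def to_fasta(records, width=60):
    """Format (id, sequence) pairs as FASTA text, wrapping at `width`."""
    lines = []
    for seq_id, seq in records:
        lines.append(f">{seq_id}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in lines)

# Small in-memory table standing in for the real sequence database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orf (orf_id TEXT PRIMARY KEY, sequence TEXT)")
conn.executemany(
    "INSERT INTO orf VALUES (?, ?)",
    [("orf1", "ATGAAATAA"), ("orf2", "ATGCCCTGA")],
)

# The query result is fed straight into the formatter.
fasta_text = to_fasta(
    conn.execute("SELECT orf_id, sequence FROM orf ORDER BY orf_id")
)
print(fasta_text, end="")
```

The resulting text could be written to a temporary file or piped to an
external program, so the analysis tools themselves would not need to
know anything about the database.
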