Over the past year, I've been looking at methods for building
distributed directories of biology data objects (from gene
sequences to genome and proteome features to biomedical
literature, phylogenetic and ontological classifications, and
so on). Directories are an important part of bio-grid computing:
they let one computationally search, retrieve, replicate and
use more effectively the wealth of bio-data we now have.
Features of object directories need to be usable by many
bioinformatics projects and centers, and robust to
the growing volumes and changing nature of biology data. Such
features include:
-- build on existing, practical technology for finding and organizing
-- efficient and quick at handling millions of bio objects, by the
gigabyte and terabyte, that we want to use.
-- provide for queries that are distributed across directories
of collaborating services
-- support existing and new data access mechanisms used in bioinformatics,
including relational databases, object and XML databases, and
bioinformatics-specific methods such as SRS, Entrez and AceDB.
-- provide simple methods for programmable access to directories, in
a range of programming languages
-- use flexible, common schema for describing objects in directories
-- able to replicate directories and data objects among bioinformatics
-- able to build peer-to-peer data access systems for collaborative
-- include current authentication and security technology for
appropriate data access
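
To make a few of the goals above concrete (a common schema, simple
programmable access, and queries distributed across collaborating
directories), here is a minimal Python sketch. Everything in it is an
assumption for illustration: the BioObject fields, the Directory class
and federated_search are hypothetical names, and the in-memory lists
stand in for real back ends such as SRS or a relational database.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BioObject:
    """Hypothetical common schema for one directory entry."""
    object_id: str   # identifier within the source databank
    databank: str    # name of the source databank
    kind: str        # "sequence", "feature", "literature", ...
    organism: str
    url: str         # where the full object can be retrieved


@dataclass
class Directory:
    """One directory service; a list stands in for a real back end."""
    name: str
    entries: list = field(default_factory=list)

    def search(self, **criteria):
        """Return entries whose fields equal all the given criteria."""
        return [
            e for e in self.entries
            if all(getattr(e, k) == v for k, v in criteria.items())
        ]


def federated_search(directories, **criteria):
    """Query every collaborating directory and merge the hits.

    An object replicated in several directories (same databank and
    id) is reported once, from the first directory that returns it.
    """
    seen = set()
    hits = []
    for d in directories:
        for e in d.search(**criteria):
            key = (e.databank, e.object_id)
            if key not in seen:
                seen.add(key)
                hits.append((d.name, e))
    return hits
```

A call such as federated_search([site_a, site_b], kind="sequence")
would return (directory name, entry) pairs. A real system would replace
the loop over local lists with network requests to each service.
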
Recent work has focused on comparing the Lightweight Directory Access
Protocol (LDAP) and XML Web Services (SOAP, WSDL, UDDI and others). I'm
currently using SRS as the back-end data access system, since it provides
good access to millions of objects in hundreds of gigabytes among
hundreds of bio-databanks.
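
As a small illustration of the LDAP side of that comparison: an LDAP
lookup is expressed as a search filter string (RFC 4515). The Python
sketch below only builds such a string; no LDAP server or client
library is involved, and the attribute names used in the example call
(objectClass=bioObject, organism) are hypothetical, not an existing
schema.

```python
def escape_filter_value(value: str) -> str:
    """Escape characters that are special in an LDAP filter (RFC 4515)."""
    special = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\0": r"\00",
    }
    return "".join(special.get(ch, ch) for ch in value)


def and_filter(**attrs: str) -> str:
    """Build an AND filter of the form (&(attr1=value1)(attr2=value2))."""
    parts = "".join(
        f"({name}={escape_filter_value(val)})" for name, val in attrs.items()
    )
    return f"(&{parts})"


if __name__ == "__main__":
    # Hypothetical attribute names, for illustration only.
    print(and_filter(objectClass="bioObject",
                     organism="Drosophila melanogaster"))
```
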
Find more of this work at
If you are interested in hooking any kind of bio-data access system
(relational databases, SRS, etc.) to such distributed directory methods,
drop me a line.
For those of you interested in joining our team in developing
this next generation of bio-data access and bio-grid
computing, we have bioinformatics postdoctoral positions
-- Don Gilbert
-- gilbertd at bio.indiana.edu