These bio-data directory construction tools use the Lightweight Directory
Access Protocol (LDAP) to provide computable, networked search and retrieval
of bio-data objects (as opposed to flat-file databanks or web pages). This
test sample includes an experimental LDAP-SRS gateway for querying and
retrieving bulk biodata over LDAP.
LDAP has several features favorable for data-grid computing, including:
efficiency with large numbers and volumes of objects; a public standard query
syntax based on shared schemas; and mature software protocols and systems
that provide security, binary-encoded transport, and flexibility in employing
multiple back-end data servers (RDBMS, Berkeley DB, Perl and shell tools),
to which new ones (such as the bioinformatics data search standard SRS) can be
added with relative ease.
Directory Services
The SRS-LDAP service is available at ldap://iubio.bio.indiana.edu:3895/srv=srs
(via OpenLDAP) or ldap://iubio.bio.indiana.edu:10389/srv=srs (via JavaLDAP).
Many, if not most, Unix computers today include an ldapsearch tool; these calls
list the base directory entry on each server:
ldapsearch -x -H ldap://iubio.bio.indiana.edu:3895/ -b 'srv=srs' -s base
ldapsearch -x -H ldap://iubio.bio.indiana.edu:10389/ -b 'srv=srs' -s base
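To look one level below the base entry rather than at the base itself, the
search scope can be changed. The following is a minimal sketch, not part of
the distributed software: srs_ldap_cmd is a hypothetical helper name, and what
entries (if any) sit below srv=srs on these servers is not described here. The
-s one (one-level scope) and -LLL (plain LDIF output) options are standard
OpenLDAP ldapsearch options. The helper only prints the command line, so it
can be inspected before anything is sent to the server.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) an ldapsearch command line for
# the SRS-LDAP base entry.
#   $1 = port  (3895 = OpenLDAP server, 10389 = JavaLDAP server)
#   $2 = scope (base, one, or sub; defaults to base)
srs_ldap_cmd() {
    port=$1
    scope=${2:-base}
    printf "ldapsearch -x -LLL -H ldap://iubio.bio.indiana.edu:%s/ -b 'srv=srs' -s %s\n" \
        "$port" "$scope"
}

# Show the one-level listing command for each server.
srs_ldap_cmd 3895 one
srs_ldap_cmd 10389 one
```

To actually run a printed command, paste it into a shell or pipe it, for
example: srs_ldap_cmd 3895 one | sh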
These interface with the current SRS server data set at
http://iubio.bio.indiana.edu/srs/
Experimental Web access to bio-directories via LDAP and Web Services
(Oct. 2002). The software behind this is all collected in iubio-srs.tar.gz
(Note: this may be offline at times).
Gridlet client for directory access from distributed compute nodes. Use it on
each grid node to select and retrieve data subsets from directory servers.
Talks
srsldap-speed3.pdf is a chart of the efficiencies of various methods (LDAP,
SOAP, Wgetz, FTP) for biodata directory search and retrieval (an older chart
is in srsldap-speed2.pdf).
srs-gnoinf-dirs-talk (MS PowerPoint or PDF) is a slide show presented at the
SRS User Group meeting at ISMB 2002 (Aug. 2002, Edmonton AB). It touches on
the use of SRS with genome information systems like FlyBase and euGenes, and
for grid data directory systems.
iubio-srs.tar.gz is an archive of this new test software for linking SRS with
an OpenLDAP server as a back-end search system for bio-data. iubio-srs.readme
describes its contents. Tests with Web-XML (SOAP) are also included
(Oct. 2002).
The simple client programs, ldapsearch.java and ldapsearch.pl (Perl), should
be usable from most computer systems and are straightforward programming
examples of using LDAP to search and retrieve data. They include test examples
for IUBio Archive's LDAP bio-data services (which may change).
Tests have focused mainly on the efficiency of this approach for use in
distributed (grid) computing. The general idea is that, for useful grid
computing with biodata, one needs to be able to select and move quantities on
the order of 100K - 1000K records, 1 GB - 10 GB in size, to many computers
quickly enough to make it better than running the computation on one central
server. Though work remains to make this widely usable, those who are
interested in testing SRS over LDAP can use it as a starting point.
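The trade-off in the paragraph above can be put into rough numbers. This is a
back-of-envelope sketch only, under assumptions that are not measurements from
these tests: every node fetches its data in parallel at the same sustained
rate, the computation divides evenly across nodes, and 1 GB is taken as 8000
megabits. grid_estimate is a hypothetical helper name, and the figures passed
to it are illustrative.

```shell
#!/bin/sh
# Hypothetical estimate: compare a grid run (data transfer + divided compute)
# with a run on one central server.
#   $1 = data size per node in GB
#   $2 = sustained transfer rate per node in megabits per second
#   $3 = compute time on one central server, in hours
#   $4 = number of grid nodes
grid_estimate() {
    awk -v gb="$1" -v mbps="$2" -v central_h="$3" -v nodes="$4" 'BEGIN {
        # GB -> megabits, divide by rate for seconds, then seconds -> hours
        transfer_h = gb * 8 * 1000 / mbps / 3600
        # grid time = time to receive the data + an even share of the compute
        grid_h = transfer_h + central_h / nodes
        printf "transfer %.2f h, grid %.2f h, central %.2f h\n", transfer_h, grid_h, central_h
    }'
}

# Illustrative case: 10 GB per node at 100 Mbit/s, 8 h central job, 16 nodes.
grid_estimate 10 100 8 16
```

With these illustrative inputs the call prints
"transfer 0.22 h, grid 0.72 h, central 8.00 h". The grid run only pays off
when the transfer term stays small relative to the compute time saved, which
is why retrieval speed is the focus of the tests.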
The software here is experimental, a work in progress (July 2002).
For further information, please contact Don Gilbert (gilbertd@bio.indiana.edu)