Don Gilbert writes:
>I think Genbank & EMBL sequences via WAIS, with keyword/author/accession/
>species searching, makes a great deal of sense to the biology community.
>WAIS software seems to be well designed for the general task of indexing &
>serving information, and the client programs are available for a
>variety of platforms, and there is a ready advertising/distribution
>scheme in the directory-of-servers. It is pretty easy for non-programmers
>to set up a WAIS server of any general text databank.
>
>All that it would take to get EMBL & GenBank working well thru WAIS,
>without an extra 300% disk space, is to modify the waisindex program
>to teach it about the sequence file formats and how to index efficiently
>without adding in all the sequence data.
>
>I'm not volunteering for that task, but it looks like a fairly simple
You're right, and I did exactly that some time ago while evaluating the
usefulness of WAIS as a database retrieval system. I'm happy to provide the
sources to Rob Harper if he wants to maintain a WAIS source for new GB/EMBL
sequences. My modifications exclude the sequence data from the index, but the
sequences are of course still present in the entries when they are downloaded.
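
To illustrate the idea (this is not my actual waisindex patch, which is C
code and not reproduced here), the following is a minimal Python sketch of
the filtering step. It assumes the standard flat-file layouts, where the
sequence block starts at the "SQ" line (EMBL) or the "ORIGIN" line (GenBank)
and ends at "//". The function name indexable_lines and the sample entry are
made up for the example.

```python
def indexable_lines(entry_text):
    """Yield the lines of one flat-file entry that should go into the index.

    Everything from the start of the sequence block ("SQ" in EMBL,
    "ORIGIN" in GenBank) up to the "//" terminator is skipped.
    """
    in_sequence = False
    for line in entry_text.splitlines():
        if line.startswith("//"):
            # End of entry: leave sequence mode, do not index the terminator.
            in_sequence = False
            continue
        if line.startswith("SQ ") or line.startswith("ORIGIN"):
            # Start of the sequence block: stop indexing from here on.
            in_sequence = True
            continue
        if not in_sequence:
            yield line


# Invented sample entry in EMBL-like layout, for illustration only.
SAMPLE_ENTRY = "\n".join([
    "ID   X00001; SV 1; linear; mRNA",
    "AC   X00001;",
    "KW   globin.",
    "OS   Homo sapiens",
    "SQ   Sequence 12 BP;",
    "     acgtacgtac gt        12",
    "//",
])

kept = list(indexable_lines(SAMPLE_ENTRY))
```

Only the annotation lines in kept would be handed to the indexer; the stored
entry itself is left untouched, so a retrieved document still contains the
full sequence.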
After playing with WAIS for a while I was not convinced that it is the
optimal solution for the particular problem of finding an entry in a sequence
database. It seems like overkill, because in most cases you want to limit
your search to certain fields such as Authors, Keywords, Species,
Accession#, etc. This is exactly what you *cannot* do with WAIS (actually,
this sort of query is supported by the underlying Z39.50 protocol but not by
the WAIS implementation of this protocol). WAIS seems to be a very good tool
for searching *unstructured* databases, but there must be something better
for *structured* databases.
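
As a rough sketch of what a field-restricted search could look like (an
illustration only, not an existing WAIS or EMBL tool), the Python code below
builds an inverted index keyed by (field, term) instead of by term alone. It
assumes EMBL-style two-letter line codes (AC, KW, OS, RA); the field names,
function names and sample entries are invented for the example.

```python
import re
from collections import defaultdict

# Assumed mapping from EMBL line codes to searchable field names.
FIELDS = {"AC": "accession", "KW": "keyword", "OS": "species", "RA": "author"}


def build_field_index(entries):
    """Map (field, term) -> set of entry ids, for annotation lines only."""
    index = defaultdict(set)
    for entry_id, text in entries.items():
        for line in text.splitlines():
            field = FIELDS.get(line[:2])
            if field is None:
                continue  # sequence lines and unlisted line codes are ignored
            for term in re.findall(r"[a-z0-9]+", line[2:].lower()):
                index[(field, term)].add(entry_id)
    return index


def search(index, field, term):
    """Return the sorted ids of entries whose given field contains the term."""
    return sorted(index.get((field, term.lower()), set()))


# Invented sample entries, for illustration only.
SAMPLE_ENTRIES = {
    "E1": "AC   X00001;\nRA   Smith J.;\nOS   Homo sapiens\nKW   globin.",
    "E2": "AC   X00002;\nRA   Jones A.;\nOS   Mus musculus\nKW   smith.",
}

field_index = build_field_index(SAMPLE_ENTRIES)
```

With this layout, search(field_index, "author", "smith") and
search(field_index, "keyword", "smith") hit different entries, which is the
distinction a plain full-text index cannot make.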
It is unlikely that EMBL will offer a service based on WAIS in the
foreseeable future, but if Rob is willing to do so, I'm more than happy to
send him my modifications.
EMBL Data Library
Fuchs at EMBL-Heidelberg.DE