harper at nic.funet.fi (Rob Harper) wrote:
> I thought about doing it that way and going for xembl.seq and
> xembl.ref, but I think that people would want to retrieve the
> sequences and the references together.
>
> I think that think.com did GB_BA: as a source but it only had
> the references in it. When you have the sequences there as well
> every segment of ten bases gets treated as a new word... and gets
> entered in the dct and the idx. This means that the indexing takes
> a L O N G time. Anyway as an amusing sideline you can enter
> sequences in as keywords in blocks of 10 and do "homology"
> searches of your sequence against the xembl update entries.
> Dr. Pearson would roll his eyes in horror!!!
I think serving GenBank & EMBL sequences via WAIS, with keyword/author/
accession/species searching, makes a great deal of sense for the biology
community.
WAIS software seems to be well designed for the general task of indexing &
serving information, client programs are available for a variety of
platforms, and the directory-of-servers provides a ready advertising/
distribution scheme. It is pretty easy for non-programmers to set up a
WAIS server for any general text databank.
All that it would take to get EMBL & GenBank working well thru WAIS,
without an extra 300% disk space, is to modify the waisindex program
to teach it about the sequence file formats and how to index efficiently
without adding in all the sequence data.
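
As a rough illustration of the filtering such a modified indexer would need, here is a minimal Python sketch. It is not waisindex code, and the function name annotation_words is made up for this example. It assumes EMBL flat-file entries, where each annotation line starts with a two-letter line code, the sequence block starts at the "SQ" line, and the entry ends with "//". GenBank entries would need the same treatment keyed on the "ORIGIN" line, which is not handled here.

```python
import re

def annotation_words(entry_text):
    """Return lowercase words from the annotation lines of one EMBL entry.

    Everything from the "SQ" line up to the terminating "//" is skipped,
    so the 10-base sequence blocks never reach the dictionary or index.
    (Hypothetical helper for illustration only.)
    """
    words = []
    in_sequence = False
    for line in entry_text.splitlines():
        if line.startswith("//"):
            # End of entry: leave the sequence block.
            in_sequence = False
            continue
        if line.startswith("SQ"):
            # Sequence header: skip it and everything until "//".
            in_sequence = True
            continue
        if in_sequence:
            continue
        # Drop the two-letter line code and its padding (first 5 columns),
        # then split the rest into alphanumeric words.
        words.extend(w.lower() for w in re.findall(r"[A-Za-z0-9]+", line[5:]))
    return words
```

Only the words returned by this function would be handed to the indexer. The full entry, sequence included, can still be what the server returns to the client, so references and sequences stay together on retrieval.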
I'm not volunteering for that task, but it looks like a fairly simple
Don Gilbert gilbert at bio.indiana.edu
biocomputing office, biology dept., indiana univ., bloomington, in 47405