In <1992Feb8.195018.9902 at urz.unibas.ch> doelz at urz.unibas.ch writes:
*>Thanks for the effort, Rob. May I suggest the following:
*>If you separate the references from the sequence, it would be easier
I thought about doing it that way and going for xembl.seq and
xembl.ref, but I think that people would want to retrieve the
sequences and the references together.
I think that think.com did GB_BA: as a source but it only had
the references in it. When you have the sequences there as well
every segment of ten bases gets treated as a new word... and gets
entered in the dct and the idx. This means that the indexing takes
a L O N G time. Anyway, as an amusing sideline, you can enter
sequences as keywords in blocks of 10 and do "homology"
searches of your sequence against the xembl update entries.
Dr. Pearson would roll his eyes in horror!!!
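
For anyone wondering why the blocks of ten matter, here is a rough
sketch in Python. It is not the WAIS indexer itself, only an
illustration that assumes sequence lines are laid out as
space-separated blocks of 10 bases followed by a running position
count, and that the indexer treats every whitespace-delimited token as
a word. The function names and the toy entries X1 and X2 are made up
for the example.

```python
from collections import defaultdict


def sequence_words(line):
    """Return the alphabetic tokens of one sequence line, lower-cased.

    The trailing position count is numeric, so isalpha() drops it.
    """
    return [tok.lower() for tok in line.split() if tok.isalpha()]


def build_index(entries):
    """Map each word to the set of entry ids that contain it.

    `entries` is a dict of entry id -> list of sequence lines
    (a hypothetical in-memory stand-in for the flat file).
    """
    index = defaultdict(set)
    for entry_id, lines in entries.items():
        for line in lines:
            for word in sequence_words(line):
                index[word].add(entry_id)
    return index


def query_blocks(sequence, size=10):
    """Cut a query sequence into consecutive blocks of `size` bases.

    A trailing partial block is discarded.
    """
    seq = sequence.lower()
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def search(index, sequence):
    """Count, per entry, how many query blocks occur as indexed words."""
    hits = defaultdict(int)
    for block in query_blocks(sequence):
        for entry_id in index.get(block, ()):
            hits[entry_id] += 1
    return dict(hits)


if __name__ == "__main__":
    entries = {
        "X1": ["     acgtacgtac ggttccaagg ttttaaaacc        30"],
        "X2": ["     ggttccaagg cccccccccc        20"],
    }
    index = build_index(entries)
    print(len(index), "distinct words in the dictionary")
    print(search(index, "ACGTACGTACGGTTCCAAGG"))
```

Two things fall out of this. Nearly every block of ten is a new word,
so the dictionary grows with the sequence data rather than with the
vocabulary of the references. And a query only hits when its blocks
line up exactly with the blocks in the entry, which is why "homology"
is in quotes.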
*>What about just making an index
*>for this and let people use ftp's or file servers for sequence retrieval?
The present xembl.src will allow you to search for a sequence
and retrieve it.
*>On the other hand I think that it is very nice to have such a service
*>available, so, provided that you'd still got enough disk space, keep on.
Well, the mechanics of the thing have to be worked out.
I have been thinking of transferring the xembl.flat file
on a weekly basis and reindexing it after it arrives.
Then there are lots of fiddly little bits to edit so that
the source can be accessed over the network...
I really need to hear from people that xembl.src is useful
before I invest time and disk space on it. Comments please
to the newsgroup or the address below.
Rob Harper / E-mail: harper at convex.csc.fi
Finnish State Computer Centre / Molbio/software: harper at nic.funet.fi
P.O. Box 40, SF-02101 Espoo / Telephone: +358 0 457 2076
Finland / Fax: +358 0 457 2302