
Indexing EMBL 51

etzold etzold at ebi.ac.uk
Thu Jul 10 08:54:25 EST 1997

Andrew T. Lloyd wrote:

> Indexing EMBL 51 took 32 HOURS.  During this time, my
> users get disturbingly wrong info, presumably because the
> partly built indexes point to the old parts of the flatfiles ?

I can't say why it took so long to build the indices. Memory
usage is much lower than with SRS4. Since index building is
I/O-intensive, make sure that the index files are created on a
local disk rather than over NFS.
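
A quick check of where the indices will land could look like the
following. It is a hypothetical helper, not part of SRS, and it
assumes GNU coreutils `stat` (`-f` reports on the filesystem holding
the path, `%T` prints its type name); other Unix systems may need
`df` or `mount` output instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an SRS tool): warn if a directory sits on NFS.
# Assumes GNU coreutils stat: -f describes the filesystem containing
# the path, and the %T format prints the filesystem type name.
# Returns 0 for a non-NFS filesystem, 1 for NFS, 2 if stat fails.
is_local_fs() {
  local dir=$1 fstype
  fstype=$(stat -f -c %T -- "$dir") || return 2
  case $fstype in
    nfs*)
      echo "warning: $dir is on $fstype; build indices on a local disk" >&2
      return 1 ;;
    *)
      return 0 ;;
  esac
}
```

Usage, with a hypothetical index directory:
`is_local_fs /srs/index || exit 1` before starting the indexing run.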

> Q1.  Is it possible to dump the new indexes to a temporary
> directory, maintaining full functionality with the old DBs
> and when the indexing is finished switch it all over in say
> 32 seconds ?

We did this at the EBI for the last release of EMBL.
There are two problems:
1) For the duration of the indexing you need to keep two copies
   of the EMBL flatfiles, but this extra disk space is well
   worth paying for!
2) This procedure must be done by hand; srscheck is not clever
   enough for this kind of thing.
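
One common way to make the final switch take seconds is to reach the
live copy through a symbolic link and swap that link in a single
rename. The sketch below is a generic Unix technique, not the EBI
procedure; it assumes you can point SRS at a symlinked directory, and
the names (`current`, `embl.51`) are hypothetical. It relies on GNU
`ln -n` and `mv -T`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: atomically repoint the symlink `link` at
# `new_target`. Assumes SRS reads flatfiles/indices via this symlink.
# The indexing itself (building into new_target) happens beforehand.
switch_index_dir() {
  local link=$1 new_target=$2 tmp_link
  if [ ! -d "$new_target" ]; then
    echo "no such directory: $new_target" >&2
    return 1
  fi
  tmp_link="$link.tmp.$$"
  # Create the new link under a temporary name first ...
  ln -sfn -- "$new_target" "$tmp_link" || return 1
  # ... then rename it over the old link. rename(2) is atomic, so
  # readers see either the old or the new target, never neither.
  # GNU mv -T treats "$link" as a plain name instead of descending
  # into the directory the old link points to.
  mv -T -- "$tmp_link" "$link"
}
```

For example: keep serving from `/srs/embl.50` via `/srs/current`,
copy the new flatfiles and build the indices in `/srs/embl.51`, then
run `switch_index_dir /srs/current /srs/embl.51` and delete the old
copy once nobody is using it.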

> At vast (and I mean VAST) expense, I bought another 256MB
> memory last year.
> Q2. Is it possible to rejig the indexing protocol
> so that it uses more than, say, 90MB and thus goes through
> faster ?

What you could try is to increase .cachesize (in embl.i), but
it is already quite high. There is no point in increasing the
.partitionSize of $library, since that only makes merging
of the indices faster (and not by much).

> And finally on the real noddy level:
> Q3.  Once I've clagged all my SW:reca_* into one list, how do
> I extract them as, say, a concatenated fasta file which I can
> zing through clustalW.


getz '[swissprot-id:reca*]' -f seq -sf fasta

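Before handing the output to ClustalW it can help to confirm how many
entries the query actually returned. The helper below is hypothetical;
the usage line assumes getz writes FASTA with `-sf fasta` and that
your ClustalW build accepts `-INFILE=`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count FASTA records (lines starting with '>')
# and refuse to continue with fewer than two, since an alignment
# needs at least two sequences. Prints the count on success.
check_fasta_for_alignment() {
  local n
  # grep -c prints 0 and exits 1 on no match; exits 2 on a missing file.
  n=$(grep -c -e '^>' -- "$1") || n=0
  if [ "$n" -lt 2 ]; then
    echo "$1: only $n sequence(s); nothing to align" >&2
    return 1
  fi
  echo "$n"
}

# Usage (assumed file name reca.fasta):
#   getz '[swissprot-id:reca*]' -f seq -sf fasta > reca.fasta
#   check_fasta_for_alignment reca.fasta && clustalw -INFILE=reca.fasta
```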

More information about the Bio-srs mailing list
