Dave Matthews matthews at greengenes.cit.cornell.edu
Tue Apr 18 04:32:40 EST 2000

Apologies for forwarding a raw dialog, but there are several interesting
issues scattered in it.  ZmDB is at www.zmdb.iastate.edu/.

Date: Mon, 17 Apr 2000 14:37:51 -0500
From: xgai <xgai at iastate.edu>
Subject: Re: slow loading ACEDB

I checked the memory usage, and xace uses only 8% of the memory. Our server 
has 0.5 GB of memory. As you noticed, the Sequence model that I built is 
very simple; it is much less complicated than the model used by 
ACeDB. I actually wonder how the people who maintain ACeDB manage to load 
such a large amount of data. Their database is much larger and much more 

I tried dividing the big .ace file into several smaller files. I also tried 
loading the database in phases, saving the data, quitting, and reopening it 
in between. That did not solve the problem. The total number of Sequence 
objects seems to be the magic number. I realize that updating the index file 
should take longer and longer as the index grows. 
However, it is the precipitous drop in performance that puzzles me.
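
For anyone who wants to repeat the split-and-load-in-phases experiment, here is a minimal sketch in Python. It assumes the usual .ace layout in which each object is a paragraph and objects are separated by blank lines, and it assumes the file contains no LongText objects (whose bodies may contain blank lines). The function names, the chunk file names, and the default of 5000 objects per file are made up for illustration.

```python
from itertools import islice
from pathlib import Path


def iter_objects(lines):
    """Yield one object (a list of lines) per blank-line-separated paragraph."""
    current = []
    for line in lines:
        if line.strip():
            current.append(line.rstrip("\n"))
        elif current:
            yield current
            current = []
    if current:
        yield current


def split_ace(src, out_dir, per_file=5000):
    """Write src into numbered chunk files holding at most per_file objects."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(src) as handle:
        objects = iter_objects(handle)
        index = 0
        while True:
            batch = list(islice(objects, per_file))
            if not batch:
                break
            index += 1
            path = out_dir / f"chunk_{index:03d}.ace"
            # Re-join objects with one blank line between them.
            path.write_text("\n\n".join("\n".join(obj) for obj in batch) + "\n")
            written.append(path)
    return written
```

Timing the load of each chunk separately should show whether the slowdown starts at a particular object count, whichever chunk happens to contain it.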

In fact, this time the sequences that I am trying to load do not have "EST" 
as the keyword (I found it is not that useful anyway since all my sequences 
are ESTs).

One more question: Do you know of any good ACEDB documentation that is more 
like an ACEDB database manager's guide? It has been very frustrating to try 
to figure so many things out on my own, and I cannot find such a document 
on any of the ACEDB links or web sites that I know of.

I am really thankful that you spent time answering my questions. I greatly 
appreciate it.

>Interesting question.  Looks like a good one for the ACEDB newsgroup.
>Lots of folks are building bigger databases nowadays, and no doubt starting
>to hit performance problems.  I have two thoughts/suggestions:
>1. machine
>The symptoms sound like you're running out of RAM and starting to use the
>swap space (virtual memory).  This would always slow things down hugely.
>Even worse if your swap space is on the same disk with database/block*wrm
>and/or the .ace file.  You can monitor memory usage while the database is
>loading with "ps" and "vmstat".  I like "top" even better; it's usually
>included with Linux.
>I noticed this with GrainGenes when the number of Sequence records went
>past about 30K.  On my old Sparc2 with 64 MB RAM it now takes over two hours
>to load the whole database.  On a newer machine with 1 GB RAM (Intel Solaris)
>it's 4 minutes.
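
The memory check suggested above can also be scripted if watching "top" by hand is inconvenient. This is only a sketch, assuming a Linux system whose /proc files report values as "Name: 123 kB" lines; the function names are made up for illustration, and the process id of xace has to be looked up separately (for example with "ps").

```python
def parse_kb_fields(text):
    """Parse 'Name:   123 kB' lines (as in /proc/meminfo) into {name: kB}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only lines of the form "<name>: <digits> kB".
        if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def memory_snapshot(pid):
    """Return the resident size of one process and the swap in use, in kB."""
    with open(f"/proc/{pid}/status") as handle:
        status = parse_kb_fields(handle.read())
    with open("/proc/meminfo") as handle:
        meminfo = parse_kb_fields(handle.read())
    return {
        "rss_kb": status.get("VmRSS", 0),
        "swap_used_kb": meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0),
    }
```

Calling memory_snapshot every few seconds during a load and printing the result would show whether swap usage starts climbing at the point where parsing slows down.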
>One thing ACEDB doesn't handle very well is records that have huge numbers
>of links from them, tens of thousands.  I just looked at the Sequence model
>for the online ZmDB, and it looks like you've cleaned it of XREFs that
>might cause this kind of trouble.  But I notice many of your Sequences have
>Keyword "EST".  The ?Keyword class is usually automatically XREF'd; I don't
>know how to prevent it.  Also, querying or browsing ZmDB for Keyword
>EST, or running 'find Sequence keyword=est', fails every time.
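
One way to spot values that would end up with tens of thousands of links is to count them in the .ace file before loading. The sketch below assumes each data line has the form: tag, then value, with optional double quotes around the value. The function name is made up, and "Keyword" is just the tag discussed above.

```python
import shlex
from collections import Counter


def tally_tag(lines, tag="Keyword"):
    """Count how many lines assign each value to the given tag."""
    counts = Counter()
    for line in lines:
        try:
            tokens = shlex.split(line)
        except ValueError:
            # Unbalanced quotes (e.g. an apostrophe in free text); skip the line.
            continue
        if len(tokens) >= 2 and tokens[0] == tag:
            counts[tokens[1]] += 1
    return counts
```

Passing an open .ace file to tally_tag and looking at counts.most_common(10) would list the values shared by the most objects. Those are the candidates for the oversized cross-referenced objects described above.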
>Did you look in database/log.wrm for complaints?  As you may know, this one
>is common:
>2000-04-14_10:20:10 genome 16041        Class Text, object 93.09 has 8729 > 3000 cells.
>  This is just a warning, acedb has no hard limits on the mumber of cells
>  per object, but the performances degrade on very large objects
>  Either, you are cross referencing many entries into a single object,
>  it may not be useful, and you could drop the XREF in the model and get the
>  same info via an occasional query or, continually, via a subclass,
>  or This object is Class:?Text, and you should rather use plain Text in the model
>  or define a controlled vocabulary by giving an explicit list of tags
>  This message is issued once per offending class and per code run
>I'd be interested to hear if you identify any smoking guns.
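
To pull every such warning out of database/log.wrm at once, something like the following sketch may help. It assumes the warning line always has the shape shown in the example above ("Class X, object Y has N > M cells"); the pattern and function name are made up for illustration, not taken from the ACEDB sources.

```python
import re

# Matches e.g. "Class Text, object 93.09 has 8729 > 3000 cells"
CELL_WARNING = re.compile(
    r"Class\s+(?P<cls>\S+),\s+object\s+(?P<obj>.+?)\s+has\s+"
    r"(?P<cells>\d+)\s*>\s*(?P<limit>\d+)\s+cells"
)


def find_cell_warnings(log_text):
    """Return (class, object, cells, limit) for each oversized-object warning."""
    return [
        (m["cls"], m["obj"], int(m["cells"]), int(m["limit"]))
        for m in CELL_WARNING.finditer(log_text)
    ]
```

Since the message is issued only once per offending class and per run, the list will be short, but it names the classes worth checking for XREFs in the model.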
> > I need your help. Our database, ZmDB, is an ACEDB database. I found an
> > interesting problem with ACEDB that confuses and frustrates me. When I used
> > the Ace parser to load a large .ace file, it was flying at first (relatively)
> > and parsed thousands of records in a few minutes. However, after a while,
> > it seemed to reach a critical point where it became terribly slow (about
> > one record per minute). I am very frustrated and puzzled. I cannot find
> > anything in the ACEDB documents that says anything about it. Can you shed
> > some light on it? Thank you.
> >
> > Some details: we are running ACEDB on a Linux box (Red Hat 5.2). I tried
> > to load about 50,000 records, all of the same type (sequence data).

Xiaowu Gai
2104 Molecular Biology Building
Department of Zoology & Genetics
Iowa State University
Ames, IA 50010
Tel: (515)-2940022

