The public databases maintained by NCBI undergo changes in both structure and data. The dbSNP database, for example, exposes its data and schema as files shared on an FTP site. Every time there is a major change, a new build is generated and published on the FTP site. To maintain a local copy of the database, we currently have to manually monitor NCBI for new builds and then recreate the entire database from the files shared in the new build.
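One way to reduce the manual monitoring would be to poll the FTP directory where builds appear and compare its listing against the builds already seen. The sketch below uses Python's ftplib; the host/path and the build directory names are assumptions for illustration, not the actual dbSNP layout, which would need to be confirmed.

```python
from ftplib import FTP

def new_builds(known_builds, current_listing):
    """Return names present in the current FTP listing but not seen before."""
    return sorted(set(current_listing) - set(known_builds))

def list_dbsnp_builds(host="ftp.ncbi.nih.gov", path="/snp/organisms"):
    # Hypothetical host/path -- adjust to wherever new builds are published.
    ftp = FTP(host)
    ftp.login()          # anonymous login
    ftp.cwd(path)
    names = ftp.nlst()   # directory names, one per build
    ftp.quit()
    return names

# Comparing a saved listing against a freshly fetched one:
known = ["human_9606_b150", "human_9606_b151"]
current = ["human_9606_b150", "human_9606_b151", "human_9606_b152"]
print(new_builds(known, current))  # ['human_9606_b152']
```

Run on a schedule (cron or similar), this would flag a new build as soon as its directory shows up, which could then trigger the download and comparison steps.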
Anyone who has maintained a local copy of this database has likely faced the problem of dealing with schema changes. The usual solution is to create the schema afresh using the scripts from the new build, populate it with the new data, and then use the new database as the local dbSNP database. We are looking for a programmatic technique to automatically identify structural changes, so that only those changes need to be applied to the local database to obtain the new schema, without having to discard the existing database and create a new one. Is it also possible to identify the data changes between builds? It would be an added advantage if we could go a step further and detect the release of new builds.
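Since the schema is distributed as SQL scripts, one approach to identifying structural changes is to parse the CREATE TABLE statements from the old and new builds and diff the resulting table/column sets. This is a minimal sketch; the regex-based parse is deliberately crude (it will mistake column types containing parentheses, for instance) and would need hardening or a proper SQL parser for real dbSNP scripts. The table and column names below are made up for illustration.

```python
import re

def parse_schema(sql):
    """Map table name -> set of column names from CREATE TABLE statements.
    Crude regex parse: assumes simple 'CREATE TABLE name (col type, ...)' form."""
    tables = {}
    for name, body in re.findall(
            r"CREATE\s+TABLE\s+\[?(\w+)\]?\s*\((.*?)\)\s*;?",
            sql, re.IGNORECASE | re.DOTALL):
        cols = set()
        for line in body.split(","):
            m = re.match(r"\s*\[?(\w+)\]?", line)
            if m:
                cols.add(m.group(1))
        tables[name] = cols
    return tables

def diff_schemas(old_sql, new_sql):
    """Report tables and columns added or dropped between two schema scripts."""
    old, new = parse_schema(old_sql), parse_schema(new_sql)
    changes = {
        "added_tables": sorted(set(new) - set(old)),
        "dropped_tables": sorted(set(old) - set(new)),
        "altered_tables": {},
    }
    for t in set(old) & set(new):
        added, dropped = sorted(new[t] - old[t]), sorted(old[t] - new[t])
        if added or dropped:
            changes["altered_tables"][t] = {
                "added_cols": added, "dropped_cols": dropped}
    return changes

old = "CREATE TABLE SNP (snp_id int, avg_het real);"
new = ("CREATE TABLE SNP (snp_id int, avg_het real, validated bit); "
       "CREATE TABLE Allele (allele_id int);")
print(diff_schemas(old, new))
```

From such a diff one could generate ALTER TABLE / CREATE TABLE statements to migrate the local database in place. For the data side, a similar set-difference idea might work at the row level, e.g. comparing per-row checksums between the old and new data files, though at dbSNP's scale that would need care.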
It would be helpful if you could share any ideas, thoughts, code, or design suggestions.