NRSUB: A NON-REDUNDANT DATA BASE FOR THE
BACILLUS SUBTILIS GENOME
I- Presentation
NRSub (which stands for "Non Redundant Subtilis") is a database
containing a "clean" set of Bacillus subtilis sequences taken
from the SubtiList collection. By "clean" we mean that all
duplications have been removed from its nucleotide sequences.
Additional data on gene mapping and codon usage are also included,
as are cross-references to the EMBL, Swiss-Prot and Enzyme
collections.
NRSub release 4 contains a total of 248 contigs (61 composite).
All these sequences are chromosomal (plasmid sequences have been
removed) and total 1,251,557 bp. This represents approximately
30% of the entire Bacillus subtilis chromosome, which consists
of about 4,165 kbp. These sequences contain 1053 CDS (358 ORF),
72 tRNA and 27 rRNA. Finally, a total of 423 bibliographic
references can be accessed.
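
The coverage figure can be checked with a short calculation. The
sketch below uses only the two numbers quoted above; the variable
names are chosen for illustration.

```python
# Figures quoted in the text above.
nrsub_bp = 1_251_557          # total length of the NRSub release 4 contigs
chromosome_bp = 4_165 * 1000  # approximate chromosome size (4,165 kbp)

# Fraction of the chromosome covered by NRSub, as a percentage.
coverage_percent = 100 * nrsub_bp / chromosome_bp
print(f"NRSub covers about {coverage_percent:.1f}% of the chromosome")
```

The result is very close to the 30% stated in the text.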
II- System requirements
NRSub is provided either in EMBL flat file format or structured
under the ACNUC data base model. The flat file can be used on
any kind of computer. The ACNUC version, on the other hand,
needs the retrieval program query. We provide executables
of query for the following architectures: Sun Sparc (under SunOS
4.1.x or Solaris 2.x), IBM RISC, SGI, and DEC Alpha. Sources of
the line-mode version of query (in Fortran and C) are also
included in the distribution. This line-mode version may be
compiled and run on almost any UNIX system (BSD or SysV). To do
so, you need to have a Fortran compiler and a C compiler
installed on your computer.
Detailed instructions for set-up and use are given in the INSTALL
file of the package.
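
Because the flat file is plain text in EMBL format, it can be
inspected without the query program. The following is a minimal
sketch, not part of the NRSub distribution: it relies only on the
general EMBL conventions that each entry starts with an "ID" line
and that feature keys appear on "FT" lines in columns 6 to 20. The
function names are hypothetical.

```python
from typing import Dict, Iterable


def summarize_embl(lines: Iterable[str]) -> Dict[str, int]:
    """Count entries and CDS features in the lines of an EMBL flat file."""
    counts = {"entries": 0, "cds": 0}
    for line in lines:
        if line.startswith("ID "):
            # Each entry begins with exactly one ID line.
            counts["entries"] += 1
        elif line.startswith("FT") and line[5:20].strip() == "CDS":
            # Feature keys occupy columns 6-20; qualifier lines leave them blank.
            counts["cds"] += 1
    return counts


def summarize_file(path: str) -> Dict[str, int]:
    """Apply summarize_embl to a file on disk (e.g. an uncompressed NRSub.dat)."""
    with open(path, encoding="latin-1") as handle:
        return summarize_embl(handle)
```

Calling summarize_file("NRSub.dat") on the uncompressed flat file
should give counts that can be compared with the figures quoted in
the Presentation section.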
III- Distribution
Release 4 of NRSub is available from the NIG anonymous FTP server
(ftp.nig.ac.jp or 133.39.3.6) in the directory /pub/db/nrsub.
It is also possible to access NRSub through a WWW server at URL:
http://ddbjs4h.genes.nig.ac.jp/
The distribution includes:
- The NRSub data base under ACNUC and the sources of the query
program for line mode use (file NRSub.r4.tar.Z).
- The flat version of the NRSub data base in EMBL format (file
NRSub.dat or NRSub.dat.Z).
- The binaries of the graphical version of the query program
(files query_win.*.Z). The query_win.SUN file is for Sun Sparc
under SunOS, query_win.SOL is for Sun Sparc under Solaris,
query_win.RS6000 is for IBM RISC, query_win.ALPHA is for
DEC Alpha, and query_win.SGI is for Silicon Graphics.
All the *.Z files are compressed using the UNIX command
'compress'. To uncompress them, you must use 'uncompress'.
The flat file version of NRSub is distributed either
compressed (NRSub.dat.Z) or in plain text (NRSub.dat) for
people not working on a UNIX machine.
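
As an illustration of retrieving the files, here is a minimal
sketch using Python's standard ftplib module. The host and
directory are the ones given above, and the server is assumed to
still be reachable at that address. The platform labels and the
helper names (query_win_filename, fetch) are hypothetical. The
".Z" file names are derived from the "query_win.*.Z" pattern in
the list above.

```python
from ftplib import FTP

HOST = "ftp.nig.ac.jp"        # anonymous FTP server named in the text
DIRECTORY = "/pub/db/nrsub"   # directory named in the text

# Platform label (illustrative) -> suffix of the graphical query binary.
QUERY_WIN_SUFFIX = {
    "SunOS": "SUN",
    "Solaris": "SOL",
    "IBM RISC": "RS6000",
    "DEC Alpha": "ALPHA",
    "SGI": "SGI",
}


def query_win_filename(platform: str) -> str:
    """Return the compressed binary name for a platform, e.g. query_win.SOL.Z."""
    return f"query_win.{QUERY_WIN_SUFFIX[platform]}.Z"


def fetch(filename: str, host: str = HOST, directory: str = DIRECTORY) -> None:
    """Download one file into the current directory over anonymous FTP."""
    with FTP(host) as ftp:
        ftp.login()            # no arguments means an anonymous login
        ftp.cwd(directory)
        with open(filename, "wb") as out:
            # Binary mode is required for the compressed *.Z files.
            ftp.retrbinary(f"RETR {filename}", out.write)
```

For example, fetch("NRSub.dat.Z") would retrieve the compressed
flat file, which can then be expanded with 'uncompress'.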
If you have any problems or questions, feel free to contact me at
the following address:
Guy Perriere
National Institute of Genetics
Shizuoka-ken 411, Mishima
Japan
Email: gperrier at ddbj.nig.ac.jp