Greetings GenBank Users,
GenBank Release 121.0 is now available via ftp from the National Center
for Biotechnology Information:
  Ftp Site           Directory   Contents
  ----------------   ---------   ---------------------------------------
  ncbi.nlm.nih.gov   genbank     GenBank Release 121.0 flatfiles
                     ncbi-asn1   ASN.1 data used to create Release 121.0
Uncompressed, the Release 121.0 flatfiles require roughly 39960 MB
(sequence files only) or 44492 MB (including the 'index' files). The
ASN.1 version requires roughly 35672 MB. From the release notes:
  Release   Date       Base Pairs    Entries
    120     Oct 2000   10335692655    9102634
    121     Dec 2000   11101066288   10106023
Close-of-data was 12/20/2000. Four days were required to prepare this
release. In the nine-week period between close-of-data for GenBank 120.0
and GenBank 121.0, GenBank grew by 0.765 billion basepairs and 1,003,389
sequence records, to exceed the 11 Gbp and 10 million record thresholds.
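The growth figures quoted above can be re-derived from the two rows of the release table. The following is a minimal sketch in Python; the dictionary layout and the function name `growth` are illustrative assumptions and are not part of the release notes.

```python
# Totals copied from the release table above (Releases 120 and 121).
RELEASES = {
    120: {"base_pairs": 10_335_692_655, "entries": 9_102_634},
    121: {"base_pairs": 11_101_066_288, "entries": 10_106_023},
}


def growth(old: int, new: int) -> dict:
    """Return the absolute increase in base pairs and entries between two releases."""
    return {
        key: RELEASES[new][key] - RELEASES[old][key]
        for key in ("base_pairs", "entries")
    }


delta = growth(120, 121)
print(f"{delta['base_pairs'] / 1e9:.3f} billion basepairs")  # 0.765 billion basepairs
print(f"{delta['entries']:,} sequence records")              # 1,003,389 sequence records
```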
For additional release information, see the README files in either of the
directories mentioned above, and the release notes (gbrel.txt) in the
genbank directory. Sections 1.3 and 1.4 of the release notes (Changes in
Release 121.0 and Upcoming Changes) have been appended below.
Release 121.0 data are currently available via NCBI's Entrez and Blast
servers, and the 'query' email server.
New GenBank cumulative update files (gbcu.flat.Z and gbcu.aso.Z), containing
only those entries new/updated since the Release 121.0 close-of-data, should be
available by 10:00am EST, December 25. Please note that the new CUs will be
smaller than previous versions you might have obtained after Release 120.0 was
posted.
If you encounter problems while ftp'ing or uncompressing Release 121.0,
please send email outlining your difficulties to info at ncbi.nlm.nih.gov .
Mark Cavanaugh, Vladimir Alekseyev, Anton Butanaev
GenBank
NCBI/NLM/NIH
1.3 Important Changes in Release 121.0
1.3.1 Organizational changes
Due to database growth, the EST division is now being split into ninety-eight
pieces.
Due to database growth, the GSS division is now being split into thirty-two
pieces.
Due to database growth, the PRI division is now being split into nine pieces.
Due to database growth, the AUT index is now being split into eight pieces.
1.3.2 Alternative GenBank FTP site
A mirror of the GenBank FTP site at the NCBI is now available from the San Diego
Supercomputer Center:
ftp://genbank.sdsc.edu/pub
Some users who experience slow FTP transfers of large files (entire releases, the
GenBank Cumulative Update, etc.) might find an improvement in transfer rates from
this alternate site when traffic at the NCBI is high.
1.3.3 ROD division now split into two parts
Since the gbrod.seq datafile exceeded 250MB in size for this release, it is now
being split into two parts: gbrod1.seq and gbrod2.seq. This wasn't noticed in
time for inclusion in the Upcoming Changes for GenBank 120.0; our apologies for this
unannounced organizational change.
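Scripts that fetch divisions by file name may need to account for such splits. The sketch below builds the expected file names from a piece count. It assumes that every split division follows the gbrod1.seq / gbrod2.seq pattern shown above, which these notes confirm only for ROD; the piece counts for EST, GSS, and PRI come from section 1.3.1, and the helper name `division_files` is hypothetical.

```python
from typing import List

# Piece counts for Release 121.0, taken from sections 1.3.1 and 1.3.3.
PIECES = {"est": 98, "gss": 32, "pri": 9, "rod": 2}


def division_files(division: str, pieces: int) -> List[str]:
    """List the sequence files for one division.

    Assumption: a split division is numbered gb<div>1.seq .. gb<div>N.seq,
    as shown for ROD, and an unsplit division is a single gb<div>.seq file.
    """
    if pieces < 1:
        raise ValueError("pieces must be at least 1")
    if pieces == 1:
        return [f"gb{division}.seq"]
    return [f"gb{division}{i}.seq" for i in range(1, pieces + 1)]


print(division_files("rod", PIECES["rod"]))  # ['gbrod1.seq', 'gbrod2.seq']
```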
1.4 Upcoming Changes
1.4.1 New HTC division to be introduced
A new GenBank division for unfinished high-throughput cDNA sequencing (HTC)
will be included in GenBank releases in early 2001. HTC sequences may have 5'UTR
and 3'UTR at their ends, partial coding regions, and introns. A keyword of
"HTC" will be present, in addition to division code "HTC". After finishing,
an HTC sequence will move to the appropriate taxonomic GenBank division and
the "HTC" keyword will be removed. Further details about the nature of HTC
sequencing projects and the scope of the new division will soon be made available
via these release notes and the GenBank newsgroup. The HTC division will not
be introduced before April of 2001.
1.4.2 Minor change to REFERENCE line
The REFERENCE keyword for the literature citations associated with a GenBank
record currently requires a parenthetical component indicating either the
basepair span to which the citation applies, or "sites" for citations providing
annotation rather than sequence data. Here are some examples:
REFERENCE   1  (bases 1 to 262290)
REFERENCE   2  (sites)
REFERENCE   3  (bases 1 to 456; bases 700 to 2334)
In some cases, sequence updates provided by submitters can involve a large
number of changes, and sometimes a submitter does not wish to indicate
exactly _which_ basepair spans are involved. As a result, we will change the
definition of the REFERENCE line to make the parenthetical component an
optional element as of GenBank Release 123.0 (April 2001).
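Parsers that currently require the parenthetical component will need to treat it as optional. The following is a minimal sketch of one way to do so in Python. The regular expressions and the function name `parse_reference` are illustrative assumptions, not an official grammar for the REFERENCE line.

```python
import re
from typing import List, Tuple, Union

# REFERENCE, a citation number, then an optional "( ... )" component.
REFERENCE_RE = re.compile(r"^REFERENCE\s+(\d+)(?:\s+\((.*)\))?\s*$")
SPAN_RE = re.compile(r"bases (\d+) to (\d+)")

Spans = Union[None, str, List[Tuple[int, int]]]


def parse_reference(line: str) -> Tuple[int, Spans]:
    """Parse a REFERENCE line into (number, spans).

    spans is None when the parenthetical is absent, the string "sites"
    for annotation-only citations, or a list of (start, end) pairs.
    """
    match = REFERENCE_RE.match(line)
    if match is None:
        raise ValueError(f"not a REFERENCE line: {line!r}")
    number = int(match.group(1))
    inner = match.group(2)
    if inner is None:
        return number, None
    if inner.strip() == "sites":
        return number, "sites"
    spans = [(int(start), int(end)) for start, end in SPAN_RE.findall(inner)]
    if not spans:
        raise ValueError(f"unrecognized parenthetical: {inner!r}")
    return number, spans


print(parse_reference("REFERENCE 3 (bases 1 to 456; bases 700 to 2334)"))
print(parse_reference("REFERENCE 4"))
```

Returning None for a missing parenthetical keeps "no span given" distinct from "sites", so downstream code can decide how to handle each case.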
1.4.3 NCBI's ftp address will be changed
At some point in the near future NCBI's ftp address will be changed.
The current address:
ncbi.nlm.nih.gov
will become:
ftp.ncbi.nih.gov
Additional details about this change will be made available via these
release notes and the GenBank newsgroup (bionet.molbio.genbank) as they
become available.
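Scripts with the current address hard-coded can isolate the host name so that it is easy to switch once the change takes effect. The sketch below rewrites stored FTP URLs from the current host to the new one. The function name `migrate_ftp_url` is hypothetical, and since the date of the change has not been announced, the rewrite should only be applied once the new address is live.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "ncbi.nlm.nih.gov"   # current address, per the notes above
NEW_HOST = "ftp.ncbi.nih.gov"   # announced replacement


def migrate_ftp_url(url: str) -> str:
    """Return url with the old NCBI FTP host replaced by the new one.

    URLs that are not ftp:// or that point at another host (for example
    the SDSC mirror or the NCBI web server) are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme != "ftp" or parts.hostname != OLD_HOST:
        return url
    # Preserve any user info and port around the host name.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    _, colon, port = hostport.partition(":")
    netloc = f"{userinfo}{at}{NEW_HOST}{colon}{port}"
    return urlunsplit(parts._replace(netloc=netloc))


print(migrate_ftp_url("ftp://ncbi.nlm.nih.gov/genbank/gbrel.txt"))
```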
1.4.4 Selenocysteine representation
Selenocysteine residues within the protein translations of coding
region features have been represented in GenBank via the letter 'X'
and a /transl_except qualifier. At the May 1999 DDBJ/EMBL/GenBank
collaborative meeting, it was learned that IUPAC plans to adopt the
letter 'U' for selenocysteine.
DDBJ, EMBL, and GenBank will thus use this new amino acid abbreviation
for their /translation qualifiers. Although a timetable for its appearance
has not been finalized, we are mentioning this now because the introduction
of a new residue abbreviation is a fairly fundamental change.
Details about the use of 'U' will be made available via these release
notes and the GenBank newsgroup as they become available.
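Software that validates /translation strings against a fixed residue alphabet may reject 'U' once it appears. The sketch below shows one way to make that check tolerant of the new letter. The allowed alphabet (the twenty standard residues plus B, Z, and X) and the function name `unexpected_residues` are assumptions for illustration; these notes themselves mention only 'X' and 'U'.

```python
from typing import List

STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")
AMBIGUITY_CODES = set("BZX")   # assumed set of ambiguity letters
SELENOCYSTEINE = "U"           # letter IUPAC plans to adopt, per the notes above


def unexpected_residues(translation: str, allow_u: bool = True) -> List[str]:
    """Return the sorted residue letters in translation that are not allowed."""
    allowed = STANDARD_RESIDUES | AMBIGUITY_CODES
    if allow_u:
        allowed = allowed | {SELENOCYSTEINE}
    return sorted(set(translation.upper()) - allowed)


print(unexpected_residues("MKUXL"))                 # []
print(unexpected_residues("MKUXL", allow_u=False))  # ['U']
```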
1.4.5 New REFERENCE type for on-line journals
Agreement was reached at the May 1999 collaborative DDBJ/EMBL/GenBank
meeting that an effort should be made to accommodate references that are
published only on-line. Until specifications for such references are
available from library organizations, GenBank will present them in a manner
like this:
REFERENCE   1  (bases 1 to 2858)
  AUTHORS   Smith, J.
  TITLE     Cloning and expression of a phospholipase gene
  JOURNAL   Online Publication
  REMARK    Online-Journal-name; Article Identifier; URL
This format is still tentative; additional information about this new
reference type will be made available via these release notes.