The most up-to-date versions of the sequences on our ftp site are in the
FINISHED and UNFINISHED subdirectories of
ftp://ftp.sanger.ac.uk/pub/databases/C.elegans_sequences/. These are
updated nightly. They are consensus sequences from individual sequencing
projects, which are typically cosmids or (fragments of) YACs.
The Sanger site has copies of all the St Louis consensus sequences as well
as Sanger Centre sequences.
The file "allcmid" in the directory given above contains all the finished
and unfinished data from both sites in a single FASTA-format file. It
contains quite a lot of redundancy, mainly from unfinished sequence, and
currently holds around 135Mb.
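As a rough illustration of how a multi-record FASTA file such as "allcmid"
could be summarised, the sketch below counts the records and the total
sequence length. It assumes standard FASTA layout ('>' header lines followed
by sequence lines); the function name fasta_lengths and the record names in
the example are made up for illustration and are not taken from the site.

```python
import io

def fasta_lengths(handle):
    """Yield (name, length) for each record in a FASTA-format text stream."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Keep the first whitespace-delimited token of the header as the name
            parts = line[1:].split()
            name = parts[0] if parts else ""
            length = 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length

# Tiny invented example; for the real file, pass open("allcmid") instead,
# assuming it has already been downloaded to the current directory.
example = io.StringIO(">cosmidA\nACGT\nAC\n>cosmidB\nGGG\n")
records = list(fasta_lengths(example))
total = sum(n for _, n in records)
print(len(records), "records,", total, "bases")
```

Because the file is redundant, a total computed this way overstates the
amount of unique sequence.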
The directory CHROMOSOMES contains a set of 6 assembled sequences, one for
each of the 6 chromosomes, that we put together for the Science paper. These
contain all the finished sequence at the time plus large amounts of
essentially finished
material from unfinished projects. The total of ACGT bases is around
95.5Mb. There are gaps of arbitrary size, indicated with '-' characters
rather than 'N's, totalling around 1.75Mb. There were 4 errors in stitching
the sequences together, which led to short stretches of nonsense sequence in
the overlap regions (a few thousand bases in total).
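Since gaps in the chromosome sequences are marked with '-' rather than 'N',
tools that expect 'N' may need adjusting. The following is a minimal sketch,
not the procedure used for the paper, of how the ACGT and gap totals could be
tallied for one sequence string. The function names and the example string
are hypothetical.

```python
import re
from collections import Counter

def base_and_gap_totals(seq):
    """Return (number of A/C/G/T bases, number of '-' gap characters)."""
    counts = Counter(seq.upper())
    acgt = sum(counts[b] for b in "ACGT")
    gaps = counts["-"]
    return acgt, gaps

def gap_runs(seq):
    """Return (start, length) for each run of consecutive '-' characters."""
    return [(m.start(), m.end() - m.start()) for m in re.finditer(r"-+", seq)]

# Invented example string, not real chromosome data
example = "ACGT---GGCA--TT"
acgt, gaps = base_and_gap_totals(example)
runs = gap_runs(example)
print(acgt, gaps, runs)
```

Note that the gap sizes are arbitrary, so the run lengths say nothing about
the true size of the missing sequence.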
The current ACEDB version contains the same sequences as used for the
Science paper. It is several months out of date, but contains more than
would be available now as finished, because of the essentially finished
sections from unfinished projects that were dumped out and automatically
analysed for the Science paper.