A number of companies and institutes create and maintain an up-to-date
mirror of dbEST, dbGSS and other public resources. The sequences are fed
through a cleaning pipeline and then processed and mined for information.
Unfortunately, the resulting resource is rarely made available to the
public. I have such a database, in case anyone finds it useful, and of
course it's free:
http://www.estinformatics.org/
You can select by organism, e.g. Homo sapiens:
http://www.estinformatics.org/cgi-bin/dbEST_clean.pl?search=sapiens
and you can download the compressed files (>1 GB, so make sure you have a
fast connection). The defline contains the gb accession followed by the
library ID and the trimming coordinates, if trimming has occurred. Details
can be found here:
http://estinformatics.org/subpage.html
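
For anyone who wants to script against the downloaded files, here is a
minimal Python sketch of reading the deflines. It is only an illustration
and rests on assumptions not stated above: that the files are
gzip-compressed FASTA, and that the defline fields are whitespace-separated
in the order accession, library ID, then optional trimming start and end.
The function names and the example values are hypothetical; the actual
layout is described on the details page above and may differ.

```python
import gzip
from typing import Iterator, NamedTuple, Optional, TextIO


class Defline(NamedTuple):
    accession: str
    library_id: str
    trim_start: Optional[int]  # None when no trimming coordinates are present
    trim_end: Optional[int]


def parse_defline(line: str) -> Defline:
    """Parse one defline under the ASSUMED layout
    '>ACCESSION LIBRARY_ID [TRIM_START TRIM_END]' (not confirmed by the site)."""
    fields = line.lstrip(">").split()
    if len(fields) < 2:
        raise ValueError(f"unexpected defline: {line!r}")
    accession, library_id = fields[0], fields[1]
    if len(fields) >= 4:
        return Defline(accession, library_id, int(fields[2]), int(fields[3]))
    return Defline(accession, library_id, None, None)


def iter_deflines(handle: TextIO) -> Iterator[Defline]:
    """Yield a parsed Defline for every header line in an open FASTA stream."""
    for line in handle:
        if line.startswith(">"):
            yield parse_defline(line.strip())


def iter_deflines_gz(path: str) -> Iterator[Defline]:
    """Stream deflines from a gzip-compressed FASTA file (assumed format),
    without loading the whole >1 GB file into memory."""
    with gzip.open(path, "rt") as handle:
        yield from iter_deflines(handle)
```

For example, counting how many sequences in a downloaded file carry
trimming coordinates would be
`sum(1 for d in iter_deflines_gz(path) if d.trim_start is not None)`,
where `path` is wherever you saved the file.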
It gets updated every night, with a public release every weekend done via a
semi-automatic process. Each weekend I also provide statistics, including
the number of new sequences added during the previous week (5-7 days):
http://estinformatics.org/gpage9.html
I hope it will be a useful tool for the community; please leave comments on
the webpage if you find it useful. I would be happy to collaborate or
partner with other companies/institutes and am actively looking to do so.
This is an independent, personally funded project and is not associated
with any other institute/company.
Best Wishes