From gilbertd from net.bio.net Fri Jul 16 12:13:41 2010
From: gilbertd from net.bio.net (Don Gilbert)
Date: Fri Jul 16 12:14:12 2010
Subject: [Arthropod] Protocol to improve draft bug genomes with Illumina reads?
Message-ID: <201007161713.o6GHDfe23750@net.bio.net>

Dear all,

Does anyone have suggestions or experience with improving draft genome
assemblies using new short-read DNA? We are thinking about this for the
Daphnia pulex assembly, now 4 years old, with a pile of new 40x Illumina
paired-end genomic data for several related populations.

Some approaches I'm aware of:

Gap closing, e.g. http://genomebiology.com/2010/11/4/R41
which looks useful and straightforward, but may not fix mistakes in the
original assembly.

Or a complete new assembly, say with Celera assembler (mixing old Sanger
data + new Illumina), which would be much more work and would give us an
assembly that the old gene data cannot be easily mapped to. But it might be
a big improvement if the old assembly has more mistakes than we'd like.

-- Don

From gilbertd from net.bio.net Thu Jul 22 11:49:51 2010
From: gilbertd from net.bio.net (Don Gilbert)
Date: Thu Jul 22 11:51:36 2010
Subject: [Arthropod] Re: Protocol to improve draft bug genomes with Illumina reads?
Message-ID: <201007221649.o6MGnpV08157@net.bio.net>

Dear all,

I did get a couple of replies to my question:

Kim Worley at Baylor College of Med. genome center says
"We've worked with combining data in assemblies. Different data and
different assemblers or versions of assemblers work better in some
combinations and worse in others."
They also have an unpublished tool to scaffold and gap fill with this data.

Scott Emrich has experience in this with insect/arthropod genomes, in
collaboration with the Celera assembler group at U Maryland, and offered
to help us apply this to the Daphnia genome.

There is also this recent summary paper on uses and problems of genome
assembly with second/next-generation data:
  Assembly of large genomes using second-generation sequencing
  Michael C Schatz, Arthur L Delcher and Steven L. Salzberg
  Genome Res., May 2010
  doi:10.1101/gr.101360.109

-- Don