From gilbertd from net.bio.net Sun Dec 6 16:52:06 2009
From: gilbertd from net.bio.net (Don Gilbert)
Date: Sun Dec 6 16:52:55 2009
Subject: [Arthropod] Arthropod EST gene assemblies at insects.eugenes.org/arthropods/
Message-ID: <200912062152.nB6Lq6g05202@net.bio.net>

Dear scientists,

Find a comparison of six Arthropod species EST gene assemblies here
http://insects.eugenes.org/arthropods/summaries/PASA-EST-assemblies.html

EST assembly data are provided, along with gene model updates produced with PASA EST assembly software. EST mapping errors and duplicate locations are analyzed with respect to assembly unit size (scaffolds), showing an error increase on smaller scaffolds.

The species include arachnid tick Ixodes, the crustacean waterflea Daphnia, and 4 insects: Bombyx silk moth, Drosophila fruitfly, Nasonia wasp and Acyrthosiphon aphid. You can find some extreme results among these:

- Which of these have official gene models that missed over 50% of EST assemblies?

- Which of these have 3 times more ESTs split between scaffolds, and a perhaps corresponding missed coding exon per gene model, on average?

You can get answers to these questions on the bionet.Arthropod news group, just ask.

-- Don Gilbert
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd@indiana.edu--http://marmot.bio.indiana.edu/

From yannick.wurm from unil.ch Mon Dec 7 01:25:32 2009
From: yannick.wurm from unil.ch (Yannick Wurm)
Date: Mon Dec 7 09:29:26 2009
Subject: [Arthropod] EST assembly correction
Message-ID:

Hello all,

thanks Don for setting up this list.

I'm just finishing my PhD in Lausanne, Switzerland, working with some EST data from ants (the closest high quality genomes are the Nasonia wasp and the Honeybee).

One issue with 454 data is homopolymer errors (AAAAAAA may become AAAAAA or AAAAAAAA according to the 454 basecaller). When in a coding sequence, something like that leads to frameshifts and thus bad protein models.
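To make the frameshift concrete, here is a minimal Python sketch (not taken from any tool discussed on this list). The 24-base sequence is invented, and `translate`, `true_read` and `miscalled` are illustrative names; it only shows how losing one A from a run of seven changes every downstream codon.

```python
# Toy illustration: a one-base homopolymer undercall shifts the reading frame.
# The sequence is invented for this example, not taken from real EST data.
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate frame 0 of dna; '*' marks a stop, trailing bases are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


true_read = "ATGGCCAAAAAAAGGTCTGATTGA"               # contains a run of seven A's
miscalled = true_read.replace("AAAAAAA", "AAAAAA")  # basecaller reports only six

print(translate(true_read))   # MAKKRSD*
print(translate(miscalled))   # MAKKGLI
```

The first translation reaches a stop codon after seven residues. The second shares only the first four residues and never reaches the stop, which is the kind of garbled protein model described above.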
It should be possible to correct for this kind of error (and insertions/deletions in EST data in general) by using alignments obtained from blastx against a database of good proteins.

Have any list members done this?

Kind regards,

Yannick

--------------------------------------------
yannick . wurm @ unil . ch
Ant Genomics, Ecology & Evolution @ Lausanne
http://www.unil.ch/dee/page28685_fr.html

From mark.blaxter from ed.ac.uk Mon Dec 7 05:18:00 2009
From: mark.blaxter from ed.ac.uk (Mark Blaxter)
Date: Mon Dec 7 09:29:27 2009
Subject: [Arthropod] EST assemblies
Message-ID:

Hi all

we did several assemblies of arthropod ESTs a couple of years ago and posted them on the www at
http://www.nematodes.org/NeglectedGenomes/ARTHROPODA/

There are only 65 or so taxa in those databases, but we are adding 200 more at present...

If you'd like local copies of the db, they're freely available.

Mark Blaxter
mark.blaxter@ed.ac.uk
~ may all beings be happy ~

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

From mark.blaxter from ed.ac.uk Mon Dec 7 11:07:03 2009
From: mark.blaxter from ed.ac.uk (Mark Blaxter)
Date: Mon Dec 7 12:37:19 2009
Subject: [Arthropod] EST assembly correction
In-Reply-To:
References:
Message-ID: <41097702-9A5E-4716-9516-4A5BF1C14B71@ed.ac.uk>

Hi

Using a 'good' est translator such as prot4est will autocorrect for these errors, and will also autocorrect errors in 'novel' genes that have codon usage biases that can be recognised.
Prot4EST is about to be released in a new all-singing version... see
http://www.nematodes.org/bioinformatics/prot4EST/index.shtml
for version 2; but version 3 is better.

Mark

On 7 Dec 2009, at 06:25, Yannick Wurm wrote:

> Hello all,
>
> thanks Don for setting up this list.
>
> I'm just finishing my PhD in Lausanne, Switzerland, working with
> some EST data from ants (the closest high quality genomes are the
> Nasonia wasp and the Honeybee).
>
> One issue with 454 data is homopolymer errors (AAAAAAA may become
> AAAAAA or AAAAAAAA according to the 454 basecaller). When in a
> coding sequence, something like that leads to frameshifts and thus
> bad protein models. It should be possible to correct for this kind
> of error (and insertions/deletions in EST data in general) by using
> alignments obtained from blastx against a database of good proteins.
>
> Have any list members done this?
>
> Kind regards,
>
> Yannick
>
> --------------------------------------------
> yannick . wurm @ unil . ch
> Ant Genomics, Ecology & Evolution @ Lausanne
> http://www.unil.ch/dee/page28685_fr.html
>
> _______________________________________________
> Arthropod mailing list
> Arthropod@net.bio.net
> http://net.bio.net/biomail/listinfo/arthropod
>

Mark Blaxter
mark.blaxter@ed.ac.uk
~ may all beings be happy ~

From yannick.wurm from unil.ch Wed Dec 9 12:43:52 2009
From: yannick.wurm from unil.ch (Yannick Wurm)
Date: Wed Dec 9 13:20:59 2009
Subject: [Arthropod] Re: EST assembly correction
In-Reply-To: <200912081703.nB8H3FM11713@net.bio.net>
References: <200912081703.nB8H3FM11713@net.bio.net>
Message-ID: <32F32F34-62D6-43FD-BF73-915B5BE00DC2@unil.ch>

Thanks for the reply Mark, I'll look into that.

best,
yannick

On 8 Dec 2009, at 18:03, arthropod-request@oat.bio.indiana.edu wrote:

> Hi
>
> Using a 'good' est translator such as prot4est will autocorrect for
> these errors, and will also autocorrect errors in 'novel' genes that
> have codon usage biases that can be recognised.
> Prot4EST is about to be released in a new all-singing version... see
> http://www.nematodes.org/bioinformatics/prot4EST/index.shtml
> for version 2; but version 3 is better.
> Mark
>
> On 7 Dec 2009, at 06:25, Yannick Wurm wrote:
>
>> Hello all,
>>
>> thanks Don for setting up this list.
>>
>> I'm just finishing my PhD in Lausanne, Switzerland, working with
>> some EST data from ants (the closest high quality genomes are the
>> Nasonia wasp and the Honeybee).
>>
>> One issue with 454 data is homopolymer errors (AAAAAAA may become
>> AAAAAA or AAAAAAAA according to the 454 basecaller). When in a
>> coding sequence, something like that leads to frameshifts and thus
>> bad protein models. It should be possible to correct for this kind
>> of error (and insertions/deletions in EST data in general) by using
>> alignments obtained from blastx against a database of good proteins.
>>
>> Have any list members done this?
>>
>> Kind regards,
>>
>> Yannick
>>

From yannick.wurm from unil.ch Wed Dec 9 12:49:10 2009
From: yannick.wurm from unil.ch (Yannick Wurm)
Date: Wed Dec 9 13:21:00 2009
Subject: [Arthropod] EST assembly from Illumina
Message-ID:

Hello again,

have another question.
We're thinking of doing some gene expression analyses for a species on which we have no sequence data and the closest sequenced relative is 100+ million years away.

I'm thinking it may be possible to do everything in one shot:
1. RNAseq (using Illumina) on our 2 conditions of interest
2. Assembly of the RNAseq data to get good gene models -> annotate
3. "classic" RNAseq analysis to identify differential expression

The alternative would be to first perform some 454 of a normalized library to get a good overview of the transcriptome.

Any experience with this? Do you think it's feasible?

Kind regards,
Yannick

--------------------------------------------
yannick . wurm @ unil .
ch
Ant Genomics, Ecology & Evolution @ Lausanne
http://www.unil.ch/dee/page28685_fr.html

From gilbertd from cricket.bio.indiana.edu Wed Dec 9 13:58:17 2009
From: gilbertd from cricket.bio.indiana.edu (Don Gilbert)
Date: Wed Dec 9 13:59:09 2009
Subject: [Arthropod] EST assembly from Illumina
Message-ID: <200912091858.nB9IwHM21798@cricket.bio.indiana.edu>

Yannick,

Assembling short read RNA-Seq to full mRNA without a reference genome is harder. If you have longer (72+ bp) mate-paired reads it becomes easier than with shorter, single reads. Here is some discussion of this
http://seqanswers.com/forums/forumdisplay.php?f=27

> We're thinking of doing some gene expression analyses for a species on which we have no sequence data and the
> closest sequenced relative is 100+ million years away.
>
> I'm thinking it may be possible to do everything in one shot:
> 1. RNAseq (using Illumina) on our 2 conditions of interest
> 2. Assembly of the RNAseq data to get good gene models -> annotate
> 3. "classic" RNAseq analysis to identify differential expression
>
> The alternative would be to first perform some 454 of a normalized library to get a good overview of the transcriptome.

Software assemblers for this (which I've not used) include Velvet, newbler, SOAP, probably others.

Your expression analysis will depend on having large enough transcript assemblies to distinguish genes. And you can expect a large fraction of differential expression to be in species/clade-specific genes.

Another approach, you might ask folks on this list if anyone is sequencing your ant genome, and would like to collaborate w/ your EST/RNA-seq data.

- Don Gilbert
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd@indiana.edu--http://marmot.bio.indiana.edu/

From gilbertd from net.bio.net Wed Dec 9 14:08:46 2009
From: gilbertd from net.bio.net (Don Gilbert)
Date: Wed Dec 9 14:09:10 2009
Subject: [Arthropod] Arthropod genomes in progress?
Message-ID: <200912091908.nB9J8kw13251@net.bio.net>

Dear folks,

To follow up on Yannick's question about problems analyzing ant genes without an ant genome assembly, I also would like to learn what arthropod genomes are now being sequenced, with some prospect of public data access this coming year.

I use NCBI's Genomes table of genome projects in progress,
http://www.ncbi.nlm.nih.gov/genomes/leuks.cgi
But smaller groups are not often signing into this, so we now lack a good view of what genomes may be coming.

Do any of you have pointers to other arthropod genomes in progress? I've heard rumors of ant genomes being sequenced. The Daphnia genomicists are working on Daphnia magna (at around 8x cover now but not completely covering the genome), and a postdoc is sequencing a set of Daphnia populations (pulex or related species).

We are at the stage now where genome sequencing/assembly/analysis is something a postdoc or research lab does, as often or more often than the sequencing centers. Yet there is much effort needed to turn sequence from machines into a good genome assembly and analysis. Those groups who want benefits that collaborations can bring should let us know what is coming along.

Often when I think I find a phylogenetic or other relationship from available arthropod genomes, another one comes along to confuse or disprove that. For example, Ixodes tick has mostly long introns like vertebrates, unlike other bugs. But then I looked at the silk moth genome, and it also has a preponderance of long introns.
Find here a summary of arthropod gene structure statistics such as exon and coding sequence sizes, intron sizes and number of exons, showing this bit about intron size distribution
http://insects.eugenes.org/arthropods/summaries/arthropod-genestruc-table.pdf
(with frequency plots in arthropod-genestruc-hist.pdf)

-- Don Gilbert
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd@indiana.edu--http://marmot.bio.indiana.edu/

From mark.blaxter from ed.ac.uk Wed Dec 9 14:44:14 2009
From: mark.blaxter from ed.ac.uk (Mark Blaxter)
Date: Wed Dec 9 14:55:54 2009
Subject: [Arthropod] EST assembly from Illumina
In-Reply-To:
References:
Message-ID: <053BC19C-7B85-494E-99EC-BD9BFA214332@ed.ac.uk>

Hi

I would recommend (currently) the
(a) 454 transcriptome to build reference and
(b) SOLEXA RNASeq to refine reference and count transcripts
route

The technology for transcriptome assembly with illumina SOLEXA is still less-than-robust, but this will change as software such as ABySS gets better at it, Velvet starts to cope with different levels of depth across sequences and (most importantly) we start to be able to get good 100-base reads from paired end RNASeq.

An alternative to RNASeq for counting is deepSAGE (aka NlaIII tags), which is often good enough for non-model organisms where the initial questions are more coarse-grained than those asked of the human genome or model nonvertebrates. deepSAGE also requires fewer reads per sample/replicate (~1 million compared to ~5 million) and so one gets more 'bang for your buck' in sequencing. Mapping NlaIII tags to 454 transcriptomes works well as the cDNA template for 454 is usually prepared using polyA and thus includes a good representation of 3' ends.

Mark

On 9 Dec 2009, at 17:49, Yannick Wurm wrote:

> Hello again,
>
> have another question.
> We're thinking of doing some gene expression analyses for a species
> on which we have no sequence data and the closest sequenced relative
> is 100+ million years away.
>
> I'm thinking it may be possible to do everything in one shot:
> 1. RNAseq (using Illumina) on our 2 conditions of interest
> 2. Assembly of the RNAseq data to get good gene models -> annotate
> 3. "classic" RNAseq analysis to identify differential expression
>
> The alternative would be to first perform some 454 of a normalized
> library to get a good overview of the transcriptome.
>
> Any experience with this? Do you think it's feasible?
>
> Kind regards,
> Yannick
>
> --------------------------------------------
> yannick . wurm @ unil . ch
> Ant Genomics, Ecology & Evolution @ Lausanne
> http://www.unil.ch/dee/page28685_fr.html
>

From murphyte from ncbi.nlm.nih.gov Wed Dec 9 15:22:28 2009
From: murphyte from ncbi.nlm.nih.gov (Murphy, Terence (NIH/NLM/NCBI) [C])
Date: Wed Dec 9 15:30:09 2009
Subject: [Arthropod] Arthropod genomes in progress?
In-Reply-To: <200912091908.nB9J8kw13251@net.bio.net>
References: <200912091908.nB9J8kw13251@net.bio.net>
Message-ID:

I second the motion to try and accumulate a more complete list of genomes in progress. Don referred to a partial table at NCBI:
http://www.ncbi.nlm.nih.gov/genomes/leuks.cgi

This table is generated from the Genome Projects database (http://www.ncbi.nlm.nih.gov/genomeprj), which contains records describing various types of ongoing or finished genome projects. When whole genome, transcriptome, and other types of genome sequencing projects are submitted to GenBank, EMBL, or DDBJ, the submitter is required to also register their project in the Genome Projects database. Thus, the Genome Projects database should have a complete listing of all arthropod genomes with sequence in the public databases.
Curators at NCBI used to create genome project records when projects were first started based on information from NHGRI or other sources. Now, in the age of cheap sequencing, we no longer have knowledge of what projects are underway or the time to create projects proactively, so the table of "in progress" genomes is a poor representation of what is currently underway.

One option would be for those of you with projects that are nearing completion to go ahead and register your project at NCBI:
http://www.ncbi.nlm.nih.gov/genomes/mpfsubmission.cgi

This will give you a genome project ID that you will be needing for your submission anyway, and your genome should appear on the "in progress" table above in a few days.

I've heard of a lot of projects that are in the works, but I'm sure there are more coming and it would be useful to have a more complete listing of what to expect. These are exciting times!

Let me know if you have any questions about the Genome Projects database, or anything else for that matter, and I'd be glad to help.

Sincerely,

-Terence

-----
Terence Murphy, Ph.D.
RefSeq Project, Arthropod Genome Champion
NCBI/NLM/NIH/DHHS
45 Center Drive, Room 4AS.37D-82
Bethesda, MD 20892-6510
Phone: 00-1-301-402-0990
e-mail: murphyte@ncbi.nlm.nih.gov

From manoj.samanta from systemix.org Wed Dec 9 16:19:51 2009
From: manoj.samanta from systemix.org (manoj.samanta@systemix.org)
Date: Wed Dec 9 17:03:21 2009
Subject: [Arthropod] Arthropod genomes in progress?
In-Reply-To:
References: <200912091908.nB9J8kw13251@net.bio.net>
Message-ID: <55364.71.112.31.55.1260393591.squirrel@www.systemix.org>

Should I add any more?
http://www.manojlabs.com/content/insect-genomes

From mark.blaxter from ed.ac.uk Wed Dec 9 16:28:48 2009
From: mark.blaxter from ed.ac.uk (Mark Blaxter)
Date: Wed Dec 9 17:03:22 2009
Subject: [Arthropod] Arthropod genomes in progress?
In-Reply-To:
References: <200912091908.nB9J8kw13251@net.bio.net>
Message-ID:

Hi Terence

thanks for the info

A couple of comments and questions

- I have submitted many EST datasets in the past (standard Sanger ones) and haven't been 'required' by NCBI dbEST to 'register' the 'genome project' they derive from. Is this a new rule, or is it an aspiration?

- We have Illumina, Roche and AB instruments, and our user base is requesting 'whole' transcriptome and genome data generation ever more frequently, so it will be good to get a registry going. However, waiting till the data are submitted to GenBank/EMBL/DDBJ may not be what the community needs - I think I'd like to have a registry of genomes-in-progress and genomes-in-aspiration, so we can collaborate in data generation (someone might be doing a genome for the transcriptome I am generating, etc), data analysis (someone might be interested in the clade my genome is from) and thus copublication/web presentation. Is that what you meant Don?

- a question... the 'big' (or maybe that's 'vast' now!) genome centres are part of a global collaborative, and thus have their genomes-in-progress and genomes-in-waiting fasttracked to NCBI and other www sites. I've tried to contact them via http://www.intlgenome.org/ but had no reply. Now that we are able to generate >10 Gbase/week of raw data in our centre alone, it would be good to open the club a little, no?

Mark

On 9 Dec 2009, at 20:22, Murphy, Terence (NIH/NLM/NCBI) [C] wrote:

> I second the motion to try and accumulate a more complete list of
> genomes in progress. Don referred to a partial table at NCBI:
> http://www.ncbi.nlm.nih.gov/genomes/leuks.cgi
>
> This table is generated from the Genome Projects database
> (http://www.ncbi.nlm.nih.gov/genomeprj), which contains records
> describing various types of ongoing or finished genome projects.
> When whole genome, transcriptome, and other types of genome sequencing
> projects are submitted to GenBank, EMBL, or DDBJ, the submitter is
> required to also register their project in the Genome Projects database.
> Thus, the Genome Projects database should have a complete listing of all
> arthropod genomes with sequence in the public databases.
>
> Curators at NCBI used to create genome project records when projects
> were first started based on information from NHGRI or other sources.
> Now, in the age of cheap sequencing, we no longer have knowledge of what
> projects are underway or the time to create projects proactively, so the
> table of "in progress" genomes is a poor representation of what is
> currently underway.
>
> One option would be for those of you with projects that are nearing
> completion to go ahead and register your project at NCBI:
> http://www.ncbi.nlm.nih.gov/genomes/mpfsubmission.cgi
>
> This will give you a genome project ID that you will be needing for your
> submission anyway, and your genome should appear on the "in progress"
> table above in a few days.
>
> I've heard of a lot of projects that are in the works, but I'm sure
> there are more coming and it would be useful to have a more complete
> listing of what to expect. These are exciting times!
>
> Let me know if you have any questions about the Genome Projects
> database, or anything else for that matter, and I'd be glad to help.
>
> Sincerely,
>
> -Terence
>
> -----
> Terence Murphy, Ph.D.
> RefSeq Project, Arthropod Genome Champion
> NCBI/NLM/NIH/DHHS
> 45 Center Drive, Room 4AS.37D-82
> Bethesda, MD 20892-6510
> Phone: 00-1-301-402-0990
> e-mail: murphyte@ncbi.nlm.nih.gov
>

Professor Mark Blaxter
mark.blaxter@ed.ac.uk
http://www.nematodes.org/
Institute of Evolutionary Biology
University of Edinburgh
EH9 3JT

From murphyte from ncbi.nlm.nih.gov Thu Dec 10 08:47:18 2009
From: murphyte from ncbi.nlm.nih.gov (Murphy, Terence (NIH/NLM/NCBI) [C])
Date: Thu Dec 10 09:21:35 2009
Subject: [Arthropod] Arthropod genomes in progress?
In-Reply-To: <55364.71.112.31.55.1260393591.squirrel@www.systemix.org>
References: <200912091908.nB9J8kw13251@net.bio.net> <55364.71.112.31.55.1260393591.squirrel@www.systemix.org>
Message-ID:

Here's a list of finished and in progress genomes listed on either http://www.intlgenome.org/ or in NCBI Genome Projects:

Acyrthosiphon pisum           Aphid, Pea
Aedes aegypti                 Mosquito
Anopheles gambiae             Mosquito
Apis mellifera                Honey bee
Bombyx mori                   silkworm
Culex quinquefasciatus        Mosquito
Drosophila ananassae          Fruitfly
Drosophila erecta             Fruitfly
Drosophila grimshawi          Fruitfly
Drosophila melanogaster       Fruitfly
Drosophila mojavensis         Fruitfly
Drosophila persimilis         Fruitfly
Drosophila pseudoobscura      Fruitfly
Drosophila sechellia          Fruitfly
Drosophila simulans           Fruitfly
Drosophila virilis            Fruitfly
Drosophila willistoni         Fruitfly
Drosophila yakuba             Fruitfly
Ixodes scapularis             Tick
Nasonia giraulti              Wasp, Parasitoid
Nasonia longicornis           Wasp, Parasitoid
Nasonia vitripennis           Wasp, Parasitoid
Pediculus humanus corporis    body louse
Rhodnius prolixus             Kissing bug
Tribolium castaneum           Beetle, Red Flour
Apis florea                   Dwarf honey bee
Bicyclus anynana              squinting bush brown butterfly
Bombus terrestris             Bumble bee
Cochliomyia
hominivorax                   primary screw-worm
Daphnia pulex                 common water flea
Diaphorina citri              Asian citrus psyllid
Drosophila albomicans         Fruitfly
Drosophila mauritiana         Fruitfly
Exopalaemon modestus          freshwater shrimp
Haematobia irritans           horn fly
Helicoverpa armigera          Cotton bollworm
Jassa slatteryi               Amphipod crustacean
Lepeophtheirus salmonis       salmon louse
Limulus polyphemus            Atlantic horseshoe crab
Lutzomyia longipalpis         Sandfly
Mayetiola destructor          Hessian fly
Phlebotomus papatasi          Sandfly-Leishmania vector
Psammotermes                  termites
Strigamia maritima            Centipede
Tetranychus urticae           red spider mite
Varroa destructor             honeybee ectoparasitic mite

I know of at least 20 that aren't on these lists, but I'm not at liberty to list them without permission. But hopefully this is a helpful start.

-Terence

-----
Terence Murphy, Ph.D.
RefSeq Project, Arthropod Genome Champion
NCBI/NLM/NIH/DHHS

From murphyte from ncbi.nlm.nih.gov Thu Dec 10 09:08:31 2009
From: murphyte from ncbi.nlm.nih.gov (Murphy, Terence (NIH/NLM/NCBI) [C])
Date: Thu Dec 10 09:21:36 2009
Subject: [Arthropod] Arthropod genomes in progress?
In-Reply-To:
References: <200912091908.nB9J8kw13251@net.bio.net>
Message-ID:

Hi Mark,

Regarding EST datasets, transcriptome projects based on conventional Sanger-based EST sequencing aren't required to be in the Genome Projects database. I thought all SRA-based projects were required to be in Genome Projects, but upon further checking it appears that is also optional. Sorry for the confusion.

I'm not sure who maintains the http://www.intlgenome.org/ page, but it is likely to be quite out of date even for the big centers since the last species was added in Sept-2008. Maybe there's a suitable wiki page that could be used to maintain a user-editable table of ongoing projects?
For those of you interested in what Arthropod transcriptome datasets are currently in the NCBI SRA database, you can use this NCBI Entrez query:
arthropoda[orgn] AND "biomol transcript"[Properties]
http://tinyurl.com/yhkc3dw

10 species currently have transcript data in SRA:
Zygaena filipendulae, species, moths
Heliconius melpomene malleti, subspecies, butterflies
Heliconius melpomene cythera, subspecies, butterflies
Microctonus aethiopoides, species, wasps &c.
Melitaea cinxia (Glanville fritillary), species, butterflies
Anopheles stephensi (Asian malaria mosquito), species, flies
Glossina morsitans, species, flies
Drosophila melanogaster (fruit fly), species, flies
Manduca sexta (tobacco hornworm), species, moths
Locusta migratoria (migratory locust), species, grasshoppers

-Terence

-----Original Message-----
From: Mark Blaxter [mailto:mark.blaxter@ed.ac.uk]
Sent: Wednesday, December 09, 2009 4:29 PM
To: Murphy, Terence (NIH/NLM/NCBI) [C]
Cc: arthropod@magpie.bio.indiana.edu
Subject: Re: [Arthropod] Arthropod genomes in progress?

Hi Terence

thanks for the info

A couple of comments and questions

- I have submitted many EST datasets in the past (standard Sanger ones) and haven't been 'required' by NCBI dbEST to 'register' the 'genome project' they derive from. Is this a new rule, or is it an aspiration?

- We have Illumina, Roche and AB instruments, and our user base is requesting 'whole' transcriptome and genome data generation ever more frequently, so it will be good to get a registry going. However, waiting till the data are submitted to GenBank/EMBL/DDBJ may not be what the community needs - I think I'd like to have a registry of genomes-in-progress and genomes-in-aspiration, so we can collaborate in data generation (someone might be doing a genome for the transcriptome I am generating, etc), data analysis (someone might be interested in the clade my genome is from) and thus copublication/web presentation. Is that what you meant Don?

- a question...
the 'big' (or maybe that's 'vast' now!) genome centres are part of a global collaborative, and thus have their genomes-in-progress and genomes-in-waiting fasttracked to NCBI and other www sites. I've tried to contact them via http://www.intlgenome.org/ but had no reply. Now that we are able to generate >10 Gbase/week of raw data in our centre alone, it would be good to open the club a little, no?

Mark

From darren.obbard from ed.ac.uk Thu Dec 10 09:37:16 2009
From: darren.obbard from ed.ac.uk (Darren Obbard)
Date: Thu Dec 10 09:47:06 2009
Subject: [Arthropod] Arthropod genomes in progress? | more diptera
In-Reply-To:
References: <200912091908.nB9J8kw13251@net.bio.net>
Message-ID: <4B21079C.6070100@ed.ac.uk>

Hi,

I'm sure most people reading this list already know this but in case there are people who don't, another useful list (I think - I await correction) can be found here:
http://www.genome.gov/10002154

This might be useful because it lists approved targets that have not yet resulted in data. Notable dipterans are:

Drosophila biarmipes
Drosophila bipectinata
Drosophila elegans
Drosophila eugracilis
Drosophila ficusphila
Drosophila kikkawai
Drosophila rhopaloa
Drosophila takahashii

Anopheles arabiensis
An. arabiensis
An. merus
An. epiroticus
An. stephensi
An. maculatus
An. funestus
An. minimus s.s.
An. culicifacies
An. farauti
An. dirus s.s.
An. atroparvus
An. albimanus

--
Darren Obbard
Institute of Evolutionary Biology
University of Edinburgh, UK
darren.obbard@ed.ac.uk
http://www.biology.ed.ac.uk/research/groups/obbard/

From gilbertd from cricket.bio.indiana.edu Thu Dec 10 09:46:34 2009
From: gilbertd from cricket.bio.indiana.edu (Don Gilbert)
Date: Thu Dec 10 09:47:08 2009
Subject: [Arthropod] Arthropod genomes in progress?
Message-ID: <200912101446.nBAEkYx25305@cricket.bio.indiana.edu>

Here is some more news on coming arthropod genomes.
Nicole Gerardo says that there are ant genomes in progress, in her collaboration and others:
- fungus-growing ants and their associated microbes.
  draft assembly of Atta cephalotes, a leaf cutter ant
  some data from harvester ants and argentine ants.
- The fire ant is also being sequenced.
..it seems like multiple ant genome drafts could be available within a year.

Maybe Yannick would find that one of these helps with EST/RNA-seq analyses, and vice versa (having many transcripts to analyze your draft genome is a good idea).

Meg Allen says a group is organizing to sequence Diabrotica (corn rootworm), see the ESA symposium Tuesday in Indianapolis Convention Center, room 109-110.

Meg also reminds us that the Entomological Society meeting this weekend/next week in Indianapolis is a good place to pick up such information. I.e., see last month's post
http://www.bio.net/bionet/mm/arthropod/2009-November/000003.html
or http://www.entsoc.org/am/cm/index.htm

- Don Gilbert
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd@indiana.edu--http://marmot.bio.indiana.edu/

From dmerrill from ksu.edu Fri Dec 11 14:18:16 2009
From: dmerrill from ksu.edu (Doris Merrill)
Date: Fri Dec 11 14:22:39 2009
Subject: [Arthropod] Invitation to join the Arthropod Genomics Consortium
Message-ID:

Participants at the annual symposium sponsored by the Arthropod Genomics Center at Kansas State University formed the Arthropod Genomics Consortium to increase collaboration and information exchange among the community of scientists performing genomic studies on arthropods.

We are developing a wiki, at http://arthropodgenomes.org , to help the community self-assemble. Our goal is to provide a central location for information about arthropod genomics projects, bioinformatics tools, and people interested in arthropod genomics. We think this is a great way to "identify the community"; information from the wiki can be used to support white papers and grant proposals.
It will also serve as a central location for links to arthropod genomics resources, news releases and meeting announcements.

We have constructed the initial wiki pages, which include templates for entering your information. Please take a moment to visit the wiki, http://arthropodgenomes.org , register and enter information about yourself and the organism or problems you study. Even if you don't work directly on a genome project, we would like to know how you would benefit from genome level knowledge. If you do work on a genome project, please list the genomic resources that currently exist for your organism(s) and what resources are needed or planned. Please encourage your colleagues to participate.

To organize such a large area, we have set up four main working groups based on research interest: 1) plant pests and beneficials, 2) vectors of disease, 3) Evo/Devo, and 4) EcoGen/PopGen. Your interests or research organism may fall into multiple categories. If so, you can list more than one.

Sue Brown, Director
K-State Arthropod Genomics Center
for the Arthropod Genomics Consortium

by Doris R. Merrill, dmerrill@k-state.edu
Program Coordinator
Arthropod Genomics Center, www.k-state.edu/agc
Kansas State University, Division of Biology, 318 Ackert
Manhattan, KS 66506-4901
Phone: (785) 532-3482, Fax: (785) 532-6653

Plan to attend the 4th Annual Arthropod Genomics Symposium June 10 to 13, 2010, in Kansas City! Details are available at www.ksu.edu/agc

From yannick.wurm from unil.ch Mon Dec 14 05:08:14 2009
From: yannick.wurm from unil.ch (Yannick Wurm)
Date: Mon Dec 14 09:14:25 2009
Subject: [Arthropod] Re: EST assembly from Illumina
In-Reply-To: <200912101421.nBAELfM13903@net.bio.net>
References: <200912101421.nBAELfM13903@net.bio.net>
Message-ID: <4FE595AF-3F4F-40C0-BC5F-9A1AA761215B@unil.ch>

Hi Don & Mark,

thanks very much for your extensive replies. DeepSage on Illumina may indeed be a viable approach.

There are indeed a few ant sequencing projects underway.
However I think we are involved or at least know about most of them (the ant community is small). Yet only one group is currently working on the species of interest and they haven't done much molecular work at all yet.

Kind regards,
yannick

--------------------------------------------
yannick . wurm @ unil . ch
Ant Genomics, Ecology & Evolution @ Lausanne
http://www.unil.ch/dee/page28685_fr.html

Don wrote:
> Yannick,
>
> Assembling short read RNA-Seq to full mRNA without a reference
> genome is harder. If you have longer (72+ bp) mate-paired reads
> it becomes easier than with shorter, single reads. Here is some
> discussion of this
> http://seqanswers.com/forums/forumdisplay.php?f=27
>
>> We're thinking of doing some gene expression analyses for a species on which we have no sequence data and the
>> closest sequenced relative is 100+ million years away.
>>
>> I'm thinking it may be possible to do everything in one shot:
>> 1. RNAseq (using Illumina) on our 2 conditions of interest
>> 2. Assembly of the RNAseq data to get good gene models -> annotate
>> 3. "classic" RNAseq analysis to identify differential expression
>>
>> The alternative would be to first perform some 454 of a normalized library to get a good overview of the transcriptome.
>
> Software assemblers for this (which I've not used)
> include Velvet, newbler, SOAP, probably others.
>
> Your expression analysis will depend on having large enough
> transcript assemblies to distinguish genes. And you can expect
> a large fraction of differential expression to be in species/clade-specific
> genes.
>
> Another approach, you might ask folks on this list if anyone
> is sequencing your ant genome, and would like to
> collaborate w/ your EST/RNA-seq data.
> > - Don Gilbert > -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405 > -- gilbertd@indiana.edu--http://marmot.bio.indiana.edu/ Mark wrote: > Hi > > I would recommend (currently) the > (a) 454 transcriptome to build reference and > (b) SOLEXA RNASeq to refine reference and count transcripts > route > > The technology for transcriptome assembly with Illumina SOLEXA is > still less-than-robust, but this will change as software such as ABySS > gets better at it, Velvet starts to cope with different levels of > depth across sequences and (most importantly) we start to be able to > get good 100-base reads from paired end RNASeq. > > An alternative to RNASeq for counting is deepSAGE (aka NlaIII tags), > and is often good enough for non-model organisms where the initial > questions are more coarse-grained than those asked of the human genome > or model nonvertebrates. deepSAGE also requires fewer reads per > sample/replicate (~1 million compared to ~5 million) and so one gets more > 'bang for your buck' in sequencing. Mapping NlaIII tags to 454 > transcriptomes works well as the cDNA template for 454 is usually > prepared using polyA and thus includes a good representation of 3'ends. > > Mark From gilbertd from net.bio.net Wed Dec 16 10:21:22 2009 From: gilbertd from net.bio.net (Don Gilbert) Date: Wed Dec 16 10:23:09 2009 Subject: [Arthropod] New genome informatics for arthropod genomicists, talk slides Message-ID: <200912161521.nBGFLMt26090@net.bio.net> Dear folks, Find here for those who may be interested an outline in slide format of my current genome informatics methods as applied to various new arthropod genomes. New genome informatics for insect [arthropod] genomicists by Don Gilbert, Biology Dept., Indiana University talk at Entomological Society of America, 14 Dec 2009 http://wfleabase.org/docs/all-docs/esangi0912-new-genome-informatics.pdf Overview 1. Current Genome Annotation Recipe 2. ESTs give essential genome gene-set .. 
but not all genes found by ESTs .. EST and gene structure statistics of selected arthropod genomes 3. Next-gen base-level expression measures gene expression better .. Tiling/RNA-seq Expression Recipe Tiling and RNA-seq find many new, weakly expressed genes 4. RNA-Seq finds gene structures well .. RNA-Seq Genes Recipe .. the new EST? the new gene finder? Maybe You can find here the answers (if not explicitly) to these questions I posed a few weeks back: > The species include arachnid tick Ixodes, the crustacean waterflea Daphnia, > and 4 insects: Bombyx silk moth, Drosophila fruitfly, Nasonia wasp > and Acyrthosiphon aphid. You can find some extreme results among these: > > - Which of these have official gene models that missed over 50% > of EST assemblies? > > - Which of these have 3 times more ESTs split between scaffolds, and > a perhaps corresponding missed coding exon per gene model, on average? > - Don Gilbert -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405 -- gilbertd@indiana.edu--http://marmot.bio.indiana.edu/ From semrich from nd.edu Sat Dec 19 18:10:00 2009 From: semrich from nd.edu (Scott Emrich) Date: Sat Dec 19 19:05:46 2009 Subject: [Arthropod] Arthropod genomes in progress? In-Reply-To: <4B21079C.6070100@ed.ac.uk> References: <200912091908.nB9J8kw13251@net.bio.net> <4B21079C.6070100@ed.ac.uk> Message-ID: <9AC8F7D2-9D27-4F2A-8FE3-94A07E9521EE@nd.edu> Hi all, There is a growing consortium (including VectorBase, of which I am a co-PI and scientific manager) to keep track of this using a wiki. Details here: http://arthropodgenomes.org. This is also tied to the effort at Kansas State that hosts the Arthropod Genomics symposium, which is another good place to learn this info, and a lot of the folks involved in this list so far are either members of the steering committee or advisors to it. On the side of vectors, we expect the body louse (Pediculus humanus) to go live early next year. 
There will be 14 Anopheles sequenced and a few more in the works. The Europeans will sequence Aedes albopictus. Two sandflies have been started. Black fly is often mentioned but I don't have details. I also know a variety of Lepidoptera are in various stages of completion, but I doubt if those groups will be entirely willing to share with the larger community. I don't even know for sure which ones are in the pipeline besides those mentioned at ESA. - S