Dear genomicists and genome informaticians,
I've been working on practical use of gene prediction/annotation methods
for arthropods for over 6 years now, and have gotten to the point where this
recipe works well and can be generally useful for eukaryote genome
Evidence Directed Gene predictions for Eukaryotes
This is similar to others' genome annotation protocols,
but differs in some of the gory details. In particular, it is designed
for parallel genome analyses on computer grids and clusters; it handles
the large volumes of RNA-seq data now being churned out (partly; I am
still working to improve some aspects); and it includes a specially
designed "Best Evidential Gene" picking algorithm as the last step in
combining and filtering gene prediction sets.
This is not yet a pipeline package, not something that others can run as
a black box, where you feed in data and it spits out shiny new
annotated gene sets. But it is a recipe with detailed steps that
work for the eukaryotes I've tested it on. A few colleagues
have made a first attempt at this recipe; with some effort, it
appears that others can use it.
So far we have applied it to Daphnia pulex and Daphnia magna, pea
aphid, Nasonia jewel wasp, and Theobroma cacao (the chocolate tree,
where this worked well).
At present there is one fully worked public example at the above URL,
for pea aphid. Final gene models for aphid are still in
progress, drawing on lots of new RNA-seq evidence.
The documentation is in a rough state. This document has gory
details for pea_aphid annotations,
This will probably be of most use now to those of you who already
do genome annotation informatics and know some of these
methods. There may be parts of this you can incorporate or
learn from. If you are interested, feel free to contact
Don Gilbert, gilbertd at indiana edu
-- gilbertd from indiana.edu--http://marmot.bio.indiana.edu/