
[Arthropod] EvidentialGene update, improves Anopheles mosquitoes and other animal/plant gene sets over recipe genomics

Don Gilbert via arthropod@net.bio.net (by gilbertd from net.bio.net)
Fri Feb 26 09:28:27 EST 2016

Dear genomics folks,

Re: EvidentialGene project at  http://eugenes.org/EvidentialGene/

EvidentialGene has a high accuracy rate for gene set construction, compared
with other gene informatics methods, for fish, plants, various arthropods.

Recently I generated gene sets for two Anopheles mosquito species
with Evigene mRNA assembly. They surpass the recently published* gene
sets from the VectorBase project in orthology completeness, using the
same RNA-seq data that project reports.

The pipeline pair of MAKER and Trinity is now a common recipe for genome
biologists, who, I suspect, do not realize that greater accuracy is possible
and not much harder to obtain.  In every case I have tested, with fishes,
plants, and insects, Evigene produces notably more accurate and
ortho-complete gene sets.   See below for mosquitos, fishes at 

The EvidentialGene gene reconstruction methods have been used for
several animal and plant genome projects, where they produce gene sets
more accurate than those of peer annotation methods. There are basic
reasons these methods have high accuracy: careful, complete assembly of
the now highly accurate RNA sequences, and extensive use of protein
orthology testing to validate (reject or accept) alternate gene
constructions.   Assembly of RNA sequences is similar to, but simpler
than, assembly of genomic DNA: RNA-seq read sizes are close to gene
transcript sizes, and there are no repetitive transposons or problematic
intron breaks.  Accurate RNA assembly solves problems that exist for
traditional genome gene-modelling: artifacts from draft genome
assemblies, from prediction algorithms that are not gene-level accurate,
and from gene models contributed by related species.

Improvements to the Evigene locus classifier, including a
chromosome-assembly map classifier, are producing better discrimination
of alternate transcripts versus paralog genes. I hope to offer an update
in the coming months that (a) improves gene locus classification
(removing some duplication, improving alternate transcript
classification), and (b) offers an initial classifier of mRNA assemblies
by chromosome assembly (i.e. genome mapping of transcript assemblies).
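
As a rough illustration of the genome-mapping idea in (b), and not
Evigene's actual classifier, the sketch below groups mapped transcripts
into loci by overlap. Transcripts whose mapped spans overlap on the same
chromosome and strand share a locus (candidate alternates); a similar
transcript that maps elsewhere gets its own locus (candidate paralog).
The function name, the tuple layout, and the simple overlap rule are all
assumptions made for this example.

```python
from collections import defaultdict


def classify_loci(mappings):
    """Assign a locus label to each mapped transcript (hypothetical helper).

    mappings: iterable of (transcript_id, chrom, strand, start, end),
    with start <= end in genome coordinates.
    Returns a dict of transcript_id -> locus label.
    """
    by_region = defaultdict(list)
    for tid, chrom, strand, start, end in mappings:
        by_region[(chrom, strand)].append((start, end, tid))

    locus_of = {}
    locus_num = 0
    for region in sorted(by_region):
        spans = sorted(by_region[region])
        current_end = None
        for start, end, tid in spans:
            if current_end is None or start > current_end:
                # No overlap with the running span: open a new locus.
                locus_num += 1
                current_end = end
            else:
                # Overlap: same locus, extend the running span if needed.
                current_end = max(current_end, end)
            locus_of[tid] = f"locus{locus_num}"
    return locus_of


example = [
    ("t1", "chr1", "+", 100, 500),
    ("t2", "chr1", "+", 300, 800),   # overlaps t1: same locus
    ("t3", "chr2", "+", 100, 500),   # maps elsewhere: separate locus
]
loci = classify_loci(example)
```

A real classifier would also need to weigh exon structure, mapping
identity, and partial or split mappings, none of which this sketch
attempts.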

If you are working on accurate animal and plant gene-ome construction from
RNA sequences, with or without a chromosome assembly, this project may be of
interest. I would like to work with a few collaborators who have genome +
transcriptome data sets plus genome-modelled gene sets (e.g. from pipelines
such as MAKER, NCBI, Augustus, EvidenceModeller, etc.) to compare with
EvidentialGene results.

Don Gilbert, 2016.feb

* Evigene vs MAKER gene set of doi: 10.1126/science.1258522
  Highly evolvable malaria vectors: the genomes of 16 Anopheles mosquitoes

Protein homology to reference genes, 2 gene sets for 2 species of
Anopheles mosquito. For both species the published RNA-seq was assembled
with 4 gene assemblers, then reduced to locus/alternate gene sets with
Evigene (roughly 3 days' work).  The RNA data sets here were about half
the recommended size, so some genes did not assemble properly.  With
100+ M read pairs instead of the 50 M provided, the completeness of the
Evigene sets should improve.

    Highly conserved REFERENCE (BUSCO drosmel, nr=3038)
           Anopheles-funestus    Anopheles-albimanus
           Evigene   MAKER       Evigene   MAKER
    found  99.4%     97.7%       98.3%     97.3%
    align  87.3%     83.2%       87.3%     83.2%
    best   30%       11.8%       26.5%     12.6%
    equal       58%                   61%

    Drosophila mel. model REFERENCE (nr=10902)
           Anopheles-funestus    Anopheles-albimanus
           Evigene   MAKER       Evigene   MAKER
    found  98.4%     96.1%       95.8%     95.8%
    align  87.3%     83.2%       77.5%     76.8%
    best   31.6%     15.1%       28.6%     18.6%
    equal       58%                   53%

    Anopheles gambiae REFERENCE (tr total=14870, locus total=12994)
           Anopheles-funestus    Anopheles-albimanus
           Evigene   MAKER       Evigene   MAKER
    found  97.9%     96.6%       94.7%     96.3%
    align  93.1%     89.3%       86.4%     87.5%
    best   33.9%     16%         30.7%     21.2%
    equal       50%                   48%
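
The post does not define the found/best/equal rows precisely. As a
minimal sketch under assumed definitions, the tally below takes, for
each reference gene, the top alignment score obtained by each of two
gene sets. It counts a reference as "found" when a set has any hit,
"best" for the set with the higher score, and "equal" when the scores
tie within a margin. The function name, the score dictionaries, and the
tie margin are hypothetical, and the "align" row is not computed here.

```python
def tally_homology(ref_ids, scores_a, scores_b, tie_margin=0.0):
    """Percent found/best/equal for two gene sets (assumed definitions).

    ref_ids: list of reference gene ids.
    scores_a, scores_b: dict of reference id -> top alignment score;
    a reference with no hit is simply absent from the dict.
    """
    n = len(ref_ids)
    if n == 0:
        raise ValueError("no reference genes given")

    found_a = found_b = best_a = best_b = equal = 0
    for ref in ref_ids:
        a = scores_a.get(ref)
        b = scores_b.get(ref)
        found_a += a is not None
        found_b += b is not None
        if a is None and b is None:
            continue  # neither set found this reference gene
        # A missing hit is scored as 0 for the comparison.
        diff = (a or 0.0) - (b or 0.0)
        if abs(diff) <= tie_margin:
            equal += 1
        elif diff > 0:
            best_a += 1
        else:
            best_b += 1

    def pct(count):
        return 100.0 * count / n

    return {
        "found_a": pct(found_a), "found_b": pct(found_b),
        "best_a": pct(best_a), "best_b": pct(best_b),
        "equal": pct(equal),
    }


refs = ["r1", "r2", "r3", "r4"]
set_a = {"r1": 100, "r2": 80, "r3": 50}   # toy scores, not real data
set_b = {"r1": 90, "r2": 80, "r4": 40}
result = tally_homology(refs, set_a, set_b)
```

With these toy inputs each set finds 3 of 4 references, set A is best
for two of them, set B for one, and one is a tie.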
