Genes/Operons in pathogenic organisms: Mycobacterium tuberculosis, Yersinia
pestis and others
Using the Softberry fgenesB-annotator script, which predicts genes and finds
similar proteins in public databases, we present annotations for several
pathogenic organisms at:
http://www.softberry.com/berry.phtml?topic=fgenesb_ann
Mycobacterium tuberculosis H37Rv, complete genome
Mycobacterium tuberculosis CDC1551, complete genome
Yersinia pestis strain CO92, complete genome
Yersinia pestis KIM, complete genome
Bacillus anthracis A2012 main chromosome
Example of annotation of Yersinia pestis KIM
Prediction of potential operons and genes in microbial genomes
Time: Mon Nov 18 11:07:36 2002
Seq name: gi|22123922|ref|NC_004088.1| Yersinia pestis KIM, complete genome
Length of sequence - 4600755 bp
Number of predicted genes - 4011, with homology - 3927
Number of transcription units - 2364, operons 799
N   Tu/Op    Conserved     S        Start      End   Score
             pairs(N/Pv)
1   1 Op 1   2/0.311       -  CDS      21  -   461     375  ## COG0716 Flavodoxins
2   1 Op 2   .             -  CDS     554  -  1015     362  ## COG1522 Transcriptional regulators
3   2 Tu 1   .             +  CDS    1185  -  2177    1148  ## COG2502 Asparagine synthetase A
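
For readers who want to process such output programmatically, here is a minimal
Python sketch that parses one gene row, assuming each row is written on a single
line in the layout of the three sample rows above. The names ROW, GeneRow and
parse_row are hypothetical and are not part of any Softberry tool; the real
output may contain row types this pattern does not cover.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the three sample rows only (an assumption, not a spec).
ROW = re.compile(
    r"^\s*(?P<n>\d+)\s+(?P<unit>\d+)\s+(?P<kind>Op|Tu)\s+(?P<pos>\d+)\s+"
    r"(?P<conserved>\S+)\s+(?P<strand>[+-])\s+(?P<feature>\S+)\s+"
    r"(?P<start>\d+)\s+-\s+(?P<end>\d+)\s+(?P<score>\d+)"
    r"(?:\s+##\s*(?P<note>.*))?\s*$"
)


class GeneRow(NamedTuple):
    n: int            # running gene number
    unit: int         # transcription unit / operon number
    kind: str         # "Op" (operon) or "Tu" (single transcription unit)
    pos_in_unit: int  # position of the gene inside its unit
    conserved: str    # conserved pairs (N/Pv), or "." when absent
    strand: str       # "+" or "-"
    feature: str      # e.g. "CDS"
    start: int
    end: int
    score: int
    note: str         # text after "##", e.g. COG id and description


def parse_row(line: str) -> Optional[GeneRow]:
    """Return a GeneRow, or None if the line does not look like a gene row."""
    m = ROW.match(line)
    if m is None:
        return None
    return GeneRow(
        n=int(m.group("n")),
        unit=int(m.group("unit")),
        kind=m.group("kind"),
        pos_in_unit=int(m.group("pos")),
        conserved=m.group("conserved"),
        strand=m.group("strand"),
        feature=m.group("feature"),
        start=int(m.group("start")),
        end=int(m.group("end")),
        score=int(m.group("score")),
        note=(m.group("note") or "").strip(),
    )


row = parse_row("1 1 Op 1 2/0.311 - CDS 21 - 461 375 ## COG0716 Flavodoxins")
print(row.kind, row.strand, row.start, row.end, row.note)
```

Header lines and other non-gene lines return None, so a caller can simply skip
them while reading the file.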
The new FgenesB is the fastest (the E. coli genome is analyzed in ~14 sec) and
most accurate ab initio bacterial gene prediction program available.
http://www.softberry.com/berry.phtml?topic=fgenesb
It uses parameters learned for different bacteria by the FgenesB-train script,
whose only input is the new bacterial sequence. The script automatically
creates a file with gene prediction parameters for the analyzed organism.
It takes only ~10 minutes to create such a file for a genome such as
E. coli from its sequence. If you need parameters for a new bacterium,
please contact Softberry Inc.; we can include them in the web list.
The algorithm is based on pattern recognition of different types of signals
and Markov chain models of coding regions. The optimal combination of these
features is then found by dynamic programming, and a set of gene models
is constructed along the given sequences.
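
To make these two ideas concrete, here is a simplified Python sketch. It is not
the FgenesB implementation, whose details are not given here. It assumes a
first-order Markov chain (practical gene finders typically use richer models)
to score a sequence as coding versus background, and a textbook dynamic
program (weighted interval scheduling) to pick the best-scoring set of
non-overlapping candidate genes. All function names, training sequences and
candidate scores are hypothetical.

```python
import math
from bisect import bisect_left
from collections import Counter


def train_first_order(seqs, alphabet="ACGT", pseudo=1.0):
    """Estimate P(next base | previous base) with pseudocount smoothing."""
    counts = Counter()
    for s in seqs:
        for a, b in zip(s, s[1:]):
            counts[(a, b)] += 1
    probs = {}
    for a in alphabet:
        total = sum(counts[(a, b)] for b in alphabet) + pseudo * len(alphabet)
        for b in alphabet:
            probs[(a, b)] = (counts[(a, b)] + pseudo) / total
    return probs


def log_odds(seq, coding, background):
    """Log-likelihood ratio of seq under the coding vs. background chain."""
    return sum(
        math.log(coding[(a, b)] / background[(a, b)])
        for a, b in zip(seq, seq[1:])
        if (a, b) in coding  # skip pairs with symbols outside the alphabet
    )


def best_gene_set(candidates):
    """Choose non-overlapping (start, end, score) candidates with the
    maximal total score. Coordinates are inclusive."""
    cands = sorted(candidates, key=lambda c: c[1])  # sort by end position
    ends = [c[1] for c in cands]
    best = [0.0] * (len(cands) + 1)    # best[i]: optimum over first i candidates
    prev = [None] * (len(cands) + 1)   # set only when candidate i is taken
    for i, (start, end, score) in enumerate(cands, 1):
        p = bisect_left(ends, start, 0, i - 1)  # candidates ending before start
        if best[p] + score > best[i - 1]:
            best[i], prev[i] = best[p] + score, p
        else:
            best[i] = best[i - 1]
    chosen, i = [], len(cands)
    while i > 0:                       # trace back the chosen candidates
        if prev[i] is not None:
            chosen.append(cands[i - 1])
            i = prev[i]
        else:
            i -= 1
    return best[-1], chosen[::-1]


# Toy training data (hypothetical, far too small for real use).
coding = train_first_order(["ATGGCGGCGGCG"])
background = train_first_order(["ATATATATAT"])
print(log_odds("GCGGCG", coding, background) > 0)

# Toy candidates with made-up scores standing in for combined evidence.
print(best_gene_set([(1, 100, 5.0), (50, 200, 8.0), (150, 300, 5.0), (210, 400, 4.0)]))
```

In this sketch overlapping candidates are never chosen together. Real bacterial
genes can overlap slightly, so a practical program would need a looser
compatibility rule.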
In the current FgenesB version, the operon prediction model is based on
distances between genes. It accurately recognizes 70% of single
transcription units and exactly defines about 43% of operons (~92%
partially).
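
As an illustration of distance-based grouping, the following Python sketch
places adjacent genes in the same unit when they are on the same strand and
the gap between them does not exceed a threshold. This is only a guess at the
general idea: the function name group_into_units and the 100 bp threshold are
assumptions, not the actual FgenesB model or its parameters.

```python
def group_into_units(genes, max_gap=100):
    """Group (start, end, strand) genes into candidate transcription units.

    Adjacent genes join the same unit when they share a strand and are
    separated by at most max_gap bases (a hypothetical threshold).
    """
    units = []
    for gene in sorted(genes):  # sort by start coordinate
        start, end, strand = gene
        if units:
            prev_start, prev_end, prev_strand = units[-1][-1]
            gap = start - prev_end - 1  # bases between the two genes
            if strand == prev_strand and gap <= max_gap:
                units[-1].append(gene)
                continue
        units.append([gene])
    return units


# The three genes from the Yersinia pestis KIM example above.
genes = [(21, 461, "-"), (554, 1015, "-"), (1185, 2177, "+")]
for number, unit in enumerate(group_into_units(genes), 1):
    kind = "Op" if len(unit) > 1 else "Tu"
    print(number, kind, unit)
```

With this threshold the first two genes (92 bp apart, same strand) form one
operon and the third gene stands alone, which agrees with the Op/Tu labels in
the sample output. A different threshold would change the grouping.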
---