FGENESV, a new program for finding genes in the genomes of RNA and DNA
viruses, is available for online use at:
http://www.softberry.com/berry.phtml?topic=gfindv
Method description:
The FGENESV algorithm is based on pattern recognition of different types of
signals and on Markov chain models of coding regions. Dynamic programming
then finds the optimal combination of these features and constructs a set of
gene models along the given sequence.
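To illustrate what a Markov chain model of coding regions means in practice, here is a minimal sketch in Python. It is not the FGENESV implementation: it assumes a simple first-order nucleotide chain, toy training sequences, and hypothetical function names (train_markov, log_odds), and it covers only the scoring step, not signal recognition or the dynamic programming search.

```python
import math

ALPHABET = "ACGT"

def train_markov(seqs, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next | prev).

    Pseudocounts keep every probability non-zero (an assumption of this
    sketch, not a documented FGENESV setting).
    """
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for s in seqs:
        s = s.upper()
        for prev, nxt in zip(s, s[1:]):
            if prev in counts and nxt in counts[prev]:
                counts[prev][nxt] += 1
    probs = {}
    for a in ALPHABET:
        total = sum(counts[a].values())
        probs[a] = {b: counts[a][b] / total for b in ALPHABET}
    return probs

def log_odds(seq, coding, background):
    """Sum of log(P_coding / P_background) over adjacent base pairs.

    A positive score means the sequence looks more like the coding
    training set than like the background one.
    """
    seq = seq.upper()
    score = 0.0
    for prev, nxt in zip(seq, seq[1:]):
        if prev in coding and nxt in coding[prev]:
            score += math.log(coding[prev][nxt] / background[prev][nxt])
    return score

# Toy training data, invented for this example only.
coding_model = train_markov(["GCGGCGGCGGCG"])
background_model = train_markov(["ATATATATATAT"])

print(log_odds("GCGGCG", coding_model, background_model) > 0)
print(log_odds("ATATAT", coding_model, background_model) < 0)
```

Practical gene finders typically use higher-order, reading-frame-dependent chains trained on many known genes; the first-order chain here is chosen only to keep the example short.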
FGENESV is the fastest ab initio viral gene prediction program available.
We have developed two variants of gene prediction. FGENESV0, suited to small
genomes (< 10000 bp), uses generic parameters of coding regions. FGENESV
learns genome-specific parameters from the input viral genome sequence
itself.
FGENESV predicts the intron-less genes of viruses. However, a few percent of
viral genes contain introns, and such genes are often alternatives to an
intron-less variant. To find them, use a standard eukaryotic gene-finding
program (such as FGENESH) in addition to FGENESV.
As additional parameters, you can choose the Linear or Circular form of your
virus and select an alternative genetic code (the Standard code is the
default): the Bacterial and Plant Plastid Code (transl_table=11), or the
Mold, Protozoan, and Coelenterate Mitochondrial Code and the
Mycoplasma/Spiroplasma Code (transl_table=4).
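The choice of genetic code changes how a predicted coding region is translated. The sketch below, in Python, shows the effect for the three codes listed above. The function names (codon_table, translate) are hypothetical and this is not FGENESV code. It relies on two facts about the NCBI tables: transl_table=4 reads TGA as tryptophan (W) instead of stop, and transl_table=11 assigns the same amino acids as the Standard code (it differs only in permitted start codons, which this sketch does not model).

```python
from itertools import product

BASES = "TCAG"
# Standard code (transl_table=1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AA = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

def codon_table(transl_table=1):
    """Return a codon -> amino acid dict for transl_table 1, 4 or 11."""
    codons = ["".join(c) for c in product(BASES, repeat=3)]
    table = dict(zip(codons, STANDARD_AA))
    if transl_table == 4:
        table["TGA"] = "W"  # TGA codes for Trp instead of stop
    elif transl_table not in (1, 11):
        raise ValueError("only tables 1, 4 and 11 are covered in this sketch")
    return table

def translate(dna, transl_table=1):
    """Translate complete codons; unknown codons become 'X', stops '*'."""
    table = codon_table(transl_table)
    dna = dna.upper()
    return "".join(table.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

print(translate("ATGTGATAA", 1))   # M**
print(translate("ATGTGATAA", 4))   # MW*
```

With the Standard code the internal TGA ends the reading frame, while under transl_table=4 the same frame continues to the TAA stop, so the predicted gene boundaries can differ between codes.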
FGENESV output:
FGENESV: Prediction of potential genes in viral genomes
Time: Tue Oct 22 16:17:25 2002
Seq name: NC_001838 Common chimpanzee papillomavirus 1, complete genome.
Length of sequence - 7889 bp
Number of predicted genes - 8
N  S        Start     End  Score
1  +  CDS     101 -   559    693
2  -  CDS     551 -   907    232
3  +  CDS     840 -  2786   3253
4  +  CDS    2728 -  3858    938
5  +  CDS    3901 -  4185    298
6  +  CDS    4195 -  4335    131
7  +  CDS    4371 -  5759   2263
8  +  CDS    5746 -  7251   1943
Predicted protein(s):
>GENE 1 101 - 559 152 aa, chain +
MESVNASTPAKTIDQLCKDCNLCMHSLQILCVFCKKTLSTAAAEVYSFEYKDLYIVWRGN
FPFAACAYCLELQGKVNQYRHFDYAAYAVTVEEETNKSIFDIRIRCYLCHKPLCAVEKVR
HILEKARFIKLNCEWKGRCFHCWTSCMENILP
>GENE 2 551 - 907 118 aa, chain -
MASTKNHPEHPVPSLSVPVSSAILLKFVEHTTGTLYSMSPAASCVGVECQVLYTPQPDAR
CYYSDHNWSLFGKLVGFVAWLAWLAWLVRPPHLLSCLIAHCNVDLQGQDSGVTQCPLR
>GENE 3 840 - 2786 648 aa, chain +
MADDTGTDNEGTGCSGWFLVEAIVDKTTGEQVSDDEDETVEDSGLDMVDFIDDRPITHNS
LEAQALLNEQEADAHYAAVQDLKRKYLGSPYVSPLGHIEQSVDCDISPRLDAIQLSRKPK
KVKRRLFQSREITDSGYGYSEVETATQVERYGEPENGCGGGGDGREKEGEGQVHTEVHTE
SEIEQHTGTTRVLELLKCKDVRATLHGKFKECYGLSFKDLTREFKSDKTTCGDWVVAGFG
VHHSVSEAFQKLIQPLSTYSHIQWLTNYKCMGMVLLVLLRFKVNKNRCTVARTLATLLNI
PEDHMLIEPPKIQSSVAALYWFRTSISNASIVTGDTPEWIARQTIVEHGLADNQFKLTEM
VQWAYDNDYCDESDIAFEYAQRADFDSNAKAFLNSNCQAKYVKDCATMCKHYKNAEMKKM
SIKQWIKYRSNKIDETGNWKPIVQFLRHQGIEFISFLSKLKLWLHGTPKKNCIAIVGPPD
TGKSAFCMSLIKFLGGTVISYVNSSSHFWLQPLCNAKVALLDDATQSCWGYMDTYMRNLL
DGNPMSIDRKHKSLALIKCPPLLVTSNIDITTEERYKYLYSRVTLFKFPNPFPFDSNGNA
VYELCDANWKCFFARLSASLDIQDSEDEDDGDTSQAFRCVPGTVVRTV
>GENE 4 2728 - 3858 376 aa, chain +
METLAKHLDACQEQLLELYEENSNELKKHIQHWKCVRYENVLLHKARQMGISHIGPQVVP
PLQVSQTKGHEAIEMQMRIETLLKSQFGMEPWTLQDTSFEMWLTPPKHCFKKQGKTVEVK
YDCNAENTMHYVLWKYIYVYNTEKEIWLKVKGMVDYKGLYYMMEQCKTYYVDFEKEAKQY
GKTLQWEVCFDSTVICSPASVSSTVQEVSNAGPTSYSTTLAQATYTVPSSVSEECVQAPP
SKRQRGPSQSAGKTQHTCNIVCDTDCATLDSANNNINNNSYSSNNGRNNSYCTGTPIVQL
QGDSNNLKCFRYRLHSNYKHLFFACISTWHWTASSNSPKTAIVTLTYVNEQQRQEFLNTV
KIPGTITHKLGFVAIM
>GENE 5 3901 - 4185 94 aa, chain +
MELQVVPVDVTTTTTNASLLPLLIALTVCLISIILLVFVSEFVIYSSVLVLTLLIYLLLW
LLLTTHLQFYLLTLSLCFIPAFSVHQYILQTQQL
>GENE 6 4195 - 4335 46 aa, chain +
MLTCSFDDGDTWLLLWLLASLIVAILGLLLLYLKAVHIHSHSCCSK
>GENE 7 4371 - 5759 462 aa, chain +
MAHSRPRRRKRASATQLYQTCKASGTCPDIIPKVEQNTLADKILKWGSLGVFFGGLGIGT
GSGTGGRTGYVPLESAPRPAIPFGPTARPPIVVDTVGPTDSSIVSLVEDSAIINSGASDL
VPSIHGGFEISTSESTTPAILDVSITTHNTTSTSIFRNPAFAEPSIVQSQPSVEAGGHLL
TSTFTSTISPHSVEEIPLDTFIVSSSNSNPASSTPVPTTVARPRLGLYSKALHQVQVTDP
AFLSSPQRLITFDNPVYEGEDISLHFEHNSIHEPPNEAFMDIIRLHRPAITSRRGVVRFS
RIGQRGSMYTRSGKHIGGRVHFFTDISPISADAQDIELQPLVAAAQDDSDLFDIYVDPDT
TPVAVDNIPSANSTLFIKSSIFDTSWGNTTIPLSLPNNIFVQPGPDILFPTTPAVPPYGP
VISPLPVGPVFISGSEFYLHPSLYFARKRRKRVSLFFSDVAA
>GENE 8 5746 - 7251 501 aa, chain +
MWRPSDNKLYVPPPAPVSKVLTTDAYVTRTKIFYHASSSRLLAVGNPYFPIRKANKTIVP
KVSGFQFRVFKIVLPDPNKFALPDTSIFDSTSQRLVWACIGLEVGRGQPLGVGYCGHPCL
NKFDDVENSASYAVNPGQDNRVNVAMDYKQTQLCLVGCAPPLGEHWGKGKQCSGVSVQDG
DCPPLELVTSVIQDGDMVDTGFGAMDFAELQSNKSDVPLDICTSTCKYPDYLQMAADPYG
DRLFFYLRKEQMFARHFFNRAGTVGEQIPDELFVKGTTSRATVSSNIYFNTPSGSLVSSE
AQLFNKPYWLHKAQGHNNGICWGNTLFVTVVDTTRSTNMTVCASTTSSPSATYTASEYKQ
YMRHVEEFDLQFIFQLCTIKLTAELMAYIHTMNPTVLEEWNFGLSPPPNGTLEDTYRYVQ
SQAITCQKPTPDKEKQDPYAGLSFWEVNLKEKFSSELEQYPLGRKFLLQTGVQSTSLARA
GTKRAASTSTATPTRKKVKRK
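
The gene table in the output above can be read programmatically. The following Python sketch parses the table rows and checks them against the protein lengths given in the ">GENE" headers. The regular expression and the function names (parse_gene_table, expected_protein_length) are assumptions based only on the sample listing shown here, not on a published format specification. The length check assumes that each CDS range includes the stop codon, which is consistent with the sample (for example, 101 - 559 is 459 bp, or 153 codons, for a 152 aa protein).

```python
import re

# One table row, e.g. "1 + CDS 101 - 559 693" (spacing may vary).
ROW = re.compile(
    r"^\s*(\d+)\s+([+-])\s+(\S+)\s+(\d+)\s+-\s+(\d+)\s+(\d+)\s*$"
)

def parse_gene_table(text):
    """Return one dict per gene row found in the output text."""
    genes = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            n, strand, kind, start, end, score = m.groups()
            genes.append({
                "n": int(n), "strand": strand, "type": kind,
                "start": int(start), "end": int(end), "score": int(score),
            })
    return genes

def expected_protein_length(gene):
    """Codons in the CDS minus one, assuming the range includes the stop."""
    return (gene["end"] - gene["start"] + 1) // 3 - 1

# First two rows copied from the sample listing.
SAMPLE = """N S Start End Score
1 + CDS 101 - 559 693
2 - CDS 551 - 907 232
"""

for g in parse_gene_table(SAMPLE):
    print(g["n"], g["strand"], expected_protein_length(g))
```

For the two sample rows this yields lengths of 152 and 118, matching the "152 aa" and "118 aa" values in the corresponding headers.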