FGENES-M: New Prediction of Variants of Gene Structure
======================================================
The Fgenes-M variant for mammalian sequences is available at the CGG web site:
http://genomic.sanger.ac.uk/gf/gf.html
There are two reasons to predict several suboptimal variants of
gene structure (instead of only one):
1) Gene prediction algorithms for long genomic sequences are
only 70-80% accurate on average, so the real structure
may score slightly lower than the optimal variant the program
reports (and with a single prediction you would never
see it);
2) Mammalian genes are often alternatively spliced, so your
sequenced mRNA might not correspond to the predicted variant
(in this case several gene structures are in fact real).
Thousands of alternative gene structures can be generated,
and there is currently no established way to generate only
the variants that correspond exactly to the real ones.
The Fgenes-M variant (fgenem) has proved useful in providing a set of
possible gene structures for further experimental testing in
commercial gene hunting, so I decided to put it on the WWW.
FGENES-M 1.5 - Pattern-based prediction of multiple variants of
human gene structure
The algorithm outputs several suboptimal variants of the predicted gene structure.
The current WWW server version provides up to 10 variant structures
for a single gene or for multiple genes.
It is similar to FGENES: it is based on pattern recognition of different
types of exons, promoters and polyA signals, and it uses dynamic
programming to find their optimal combination, constructing a set of
gene models along a given sequence.
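To give a feeling for the dynamic programming step, here is a heavily simplified sketch. It is not the FGENES-M code: it assumes that each candidate exon is just a (start, end, weight) triple and that the best model is the non-overlapping chain with the highest total weight. Reading-frame compatibility, exon types, promoters and polyA signals are all ignored, and the function name best_exon_chain is hypothetical.

```python
from bisect import bisect_left


def best_exon_chain(exons):
    """Return (total_weight, chain) for the best non-overlapping chain.

    exons: list of (start, end, weight) tuples with inclusive coordinates.
    Simplifying assumption: the only constraint between exons is that
    they must not overlap.
    """
    exons = sorted(exons, key=lambda e: e[1])  # sort by end position
    ends = [e[1] for e in exons]
    # best[i] holds the best (score, chain) using only the first i exons
    best = [(0.0, [])]
    for i, (start, end, weight) in enumerate(exons):
        # j = number of earlier exons that end strictly before this start
        j = bisect_left(ends, start, 0, i)
        take_score = best[j][0] + weight
        if take_score > best[i][0]:
            best.append((take_score, best[j][1] + [(start, end, weight)]))
        else:
            best.append(best[i])  # skipping this exon is at least as good
    return best[-1]


# Four overlapping candidates taken from the example output in this message.
candidates = [
    (2637, 2802, 2.74),
    (2675, 2802, 1.00),
    (3558, 3797, 4.35),
    (3558, 3668, 0.99),
]
score, chain = best_exon_chain(candidates)
print(f"{score:.2f}", chain)
```

Suboptimal variants could then be obtained, for example, by excluding one exon of the optimal chain and running the search again. That is one possible strategy for illustration, not a description of how Fgenes-M itself generates them.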
You can judge the validity of a predicted variant by its GENE WEIGHT:
if it is close to that of the first (optimal) variant, then the variant is worth considering.
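The comparison can be sketched in a few lines. The cutoff used here (min_fraction, set to 0.6 of the optimal weight) is purely an assumption for illustration, since no numerical threshold for "close" is defined; the function name is hypothetical as well. The weights are those of the five variants in the example output in this message.

```python
def variants_near_optimum(gene_weights, min_fraction=0.6):
    """Return 1-based numbers of variants whose GENE WEIGHT is at least
    min_fraction of the first (optimal) variant's weight.

    min_fraction is an arbitrary illustrative cutoff, not a value
    recommended by the program.
    """
    if not gene_weights:
        return []
    optimal = gene_weights[0]
    return [
        number
        for number, weight in enumerate(gene_weights, start=1)
        if weight >= min_fraction * optimal
    ]


# GENE WEIGHT values of variants 1 to 5 in the example output.
weights = [24.1, 15.1, 15.1, 15.1, 13.9]
print(variants_near_optimum(weights))  # prints [1, 2, 3, 4]
```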
A simple example of Fgenes-M output:
FGENES-M 1.5.0 Prediction of several variants of multiple genes
Time: 214127.7 Date: 19981003
Seq name: ACU08131
Length of sequence: 5392 GC content: 0.46 Zone: 2
Number of predicted genes: 1 In +chain: 1 In -chain: 0
Number of predicted exons: 6 In +chain: 6 In -chain: 0
Predicted genes and exons in var: 1 Max var= 5 GENE WEIGHT: 24.1
G Str Feature Start End Weight ORF-start ORF-end
1 + TSS 355 7.43 TATA 327 wTATA 21.08 LDF 0.56
1 + 1 CDSf 521 - 641 1.23 521 - 640
1 + 2 CDSi 1066 - 1362 2.08 1068 - 1361
1 + 3 CDSi 1860 - 2028 1.69 1862 - 2026
1 + 4 CDSi 2637 - 2802 2.74 2638 - 2802
1 + 5 CDSi 3558 - 3797 4.35 3558 - 3797
1 + 6 CDSl 4131 - 4247 2.09 4131 - 4244
1 + PolA 4650 3.17
Predicted proteins:
>FGENES 1.5 ACU08131 1 Multiexon gene 521 - 4247 369 a Ch+
MAGTVTEAWDVAVFAARRRNDEDDTTRDSLFTYTNSNNTRGPFEGPNYHIAPRWVYNITS
VWMIFVVIASIFTNGLVLVATAKFKKLRHPLNWILVNLAIADLGETVIASTISVINQISG
YFILGHPMCVLEGYTVSTCGISALWSLAVISWERWVVVCKPFGNVKFDAKLAVAGIVFSW
VWSAVWTAPPVFGWSRYWPHGLKTSCGPDVFSGSDDPGVLSYMIVLMITCCFIPLAVILL
CYLQVWLAIRAVAAQQKESESTQKAEKEVSRMVVVMIIAYCFCWGPYTVFACFAAANPGY
AFHPLAAALPAYFAKSATIYNPIIYVFMNRQFRNCIMQLFGKKVDDGSELSSTSRTEVSS
VSNSSVSPA
FGENES-M 1.5.0 Prediction of several variants of multiple genes
Time: 214127.7 Date: 19981003
Seq name: ACU08131
Length of sequence: 5392 GC content: 0.46 Zone: 2
Number of predicted genes: 1 In +chain: 1 In -chain: 0
Number of predicted exons: 6 In +chain: 6 In -chain: 0
Predicted genes and exons in var: 2 Max var= 5 GENE WEIGHT: 15.1
G Str Feature Start End Weight ORF-start ORF-end
1 + 1 CDSf 218 - 321 1.01 218 - 319
1 + 2 CDSi 984 - 1023 1.94 985 - 1023
1 + 3 CDSi 1860 - 2028 1.49 1860 - 2027
1 + 4 CDSi 2675 - 2802 1.00 2677 - 2802
1 + 5 CDSi 3558 - 3797 4.35 3558 - 3797
1 + 6 CDSl 4131 - 4247 2.09 4131 - 4244
1 + PolA 4650 3.17
Predicted proteins:
>FGENES 1.5 ACU08131 1 Multiexon gene 218 - 4247 265 a Ch+
MRQGGGQITAQLRDKTFKGFEDLVLQVRGLIRLGGNLLVDVCVVIAILVSQLSGPWPLYL
GNAGSLSASPLEMSSSMPNWPWLALSSPGCGLLYGQHHPSLAGVDVFSGSDDPGVLSYMI
VLMITCCFIPLAVILLCYLQVWLAIRAVAAQQKESESTQKAEKEVSRMVVVMIIAYCFCW
GPYTVFACFAAANPGYAFHPLAAALPAYFAKSATIYNPIIYVFMNRQFRNCIMQLFGKKV
DDGSELSSTSRTEVSSVSNSSVSPA
FGENES-M 1.5.0 Prediction of several variants of multiple genes
Time: 214127.7 Date: 19981003
Seq name: ACU08131
Length of sequence: 5392 GC content: 0.46 Zone: 2
Number of predicted genes: 1 In +chain: 1 In -chain: 0
Number of predicted exons: 6 In +chain: 6 In -chain: 0
Predicted genes and exons in var: 3 Max var= 5 GENE WEIGHT: 15.1
G Str Feature Start End Weight ORF-start ORF-end
1 + 1 CDSf 218 - 321 1.01 218 - 319
1 + 2 CDSi 984 - 1023 1.94 985 - 1023
1 + 3 CDSi 1860 - 2028 1.49 1860 - 2027
1 + 4 CDSi 2675 - 2802 1.00 2677 - 2802
1 + 5 CDSi 3558 - 3797 4.35 3558 - 3797
1 + 6 CDSl 4131 - 4247 2.09 4131 - 4244
1 + PolA 4650 3.17
Predicted proteins:
>FGENES 1.5 ACU08131 1 Multiexon gene 218 - 4247 265 a Ch+
MRQGGGQITAQLRDKTFKGFEDLVLQVRGLIRLGGNLLVDVCVVIAILVSQLSGPWPLYL
GNAGSLSASPLEMSSSMPNWPWLALSSPGCGLLYGQHHPSLAGVDVFSGSDDPGVLSYMI
VLMITCCFIPLAVILLCYLQVWLAIRAVAAQQKESESTQKAEKEVSRMVVVMIIAYCFCW
GPYTVFACFAAANPGYAFHPLAAALPAYFAKSATIYNPIIYVFMNRQFRNCIMQLFGKKV
DDGSELSSTSRTEVSSVSNSSVSPA
FGENES-M 1.5.0 Prediction of several variants of multiple genes
Time: 214127.7 Date: 19981003
Seq name: ACU08131
Length of sequence: 5392 GC content: 0.46 Zone: 2
Number of predicted genes: 1 In +chain: 1 In -chain: 0
Number of predicted exons: 6 In +chain: 6 In -chain: 0
Predicted genes and exons in var: 4 Max var= 5 GENE WEIGHT: 15.1
G Str Feature Start End Weight ORF-start ORF-end
1 + 1 CDSf 218 - 321 1.01 218 - 319
1 + 2 CDSi 984 - 1023 1.94 985 - 1023
1 + 3 CDSi 1860 - 2028 1.49 1860 - 2027
1 + 4 CDSi 2675 - 2802 1.00 2677 - 2802
1 + 5 CDSi 3558 - 3797 4.35 3558 - 3797
1 + 6 CDSl 4131 - 4247 2.09 4131 - 4244
1 + PolA 4650 3.17
Predicted proteins:
>FGENES 1.5 ACU08131 1 Multiexon gene 218 - 4247 265 a Ch+
MRQGGGQITAQLRDKTFKGFEDLVLQVRGLIRLGGNLLVDVCVVIAILVSQLSGPWPLYL
GNAGSLSASPLEMSSSMPNWPWLALSSPGCGLLYGQHHPSLAGVDVFSGSDDPGVLSYMI
VLMITCCFIPLAVILLCYLQVWLAIRAVAAQQKESESTQKAEKEVSRMVVVMIIAYCFCW
GPYTVFACFAAANPGYAFHPLAAALPAYFAKSATIYNPIIYVFMNRQFRNCIMQLFGKKV
DDGSELSSTSRTEVSSVSNSSVSPA
FGENES-M 1.5.0 Prediction of several variants of multiple genes
Time: 214127.7 Date: 19981003
Seq name: ACU08131
Length of sequence: 5392 GC content: 0.46 Zone: 2
Number of predicted genes: 1 In +chain: 1 In -chain: 0
Number of predicted exons: 6 In +chain: 6 In -chain: 0
Predicted genes and exons in var: 5 Max var= 5 GENE WEIGHT: 13.9
G Str Feature Start End Weight ORF-start ORF-end
1 + TSS 355 7.43 TATA 327 wTATA 21.08 LDF 0.56
1 + 1 CDSf 521 - 641 1.23 521 - 640
1 + 2 CDSi 1066 - 1362 2.08 1068 - 1361
1 + 3 CDSi 1860 - 2028 1.69 1862 - 2026
1 + 4 CDSi 2637 - 2802 2.74 2638 - 2802
1 + 5 CDSi 3558 - 3668 0.99 3558 - 3668
1 + 6 CDSl 4131 - 4247 2.09 4131 - 4244
1 + PolA 4650 3.17
Predicted proteins:
>FGENES 1.5 ACU08131 1 Multiexon gene 521 - 4247 326 a Ch+
MAGTVTEAWDVAVFAARRRNDEDDTTRDSLFTYTNSNNTRGPFEGPNYHIAPRWVYNITS
VWMIFVVIASIFTNGLVLVATAKFKKLRHPLNWILVNLAIADLGETVIASTISVINQISG
YFILGHPMCVLEGYTVSTCGISALWSLAVISWERWVVVCKPFGNVKFDAKLAVAGIVFSW
VWSAVWTAPPVFGWSRYWPHGLKTSCGPDVFSGSDDPGVLSYMIVLMITCCFIPLAVILL
CYLQVWLAIRAVAAQQKESESTQKAEKEVSRMVVVMIIAYCFCWGPYTFRNCIMQLFGKK
VDDGSELSSTSRTEVSSVSNSSVSPA
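If you want to process such output automatically, the variant headers and exon lines can be picked out with regular expressions. The sketch below assumes the layout shown in the example above (whitespace-separated columns, exon features named CDSf, CDSi, CDSl); the function and variable names are hypothetical, and other feature names or layouts the program may produce are not handled.

```python
import re

# Exon line: gene, strand, exon number, feature, start - end, weight, ...
EXON_LINE = re.compile(
    r"^\s*(\d+)\s+([+-])\s+(\d+)\s+(CDS\w)\s+(\d+)\s+-\s+(\d+)\s+([\d.]+)"
)
# Variant header: "... in var: 1 Max var= 5 GENE WEIGHT: 24.1"
VARIANT_LINE = re.compile(r"in var:\s*(\d+).*GENE WEIGHT:\s*([\d.]+)")


def parse_variants(text):
    """Collect the variant number, GENE WEIGHT and exons of each variant."""
    variants = []
    for line in text.splitlines():
        header = VARIANT_LINE.search(line)
        if header:
            variants.append({
                "var": int(header.group(1)),
                "weight": float(header.group(2)),
                "exons": [],
            })
            continue
        exon = EXON_LINE.match(line)
        if exon and variants:
            _gene, strand, _number, kind, start, end, weight = exon.groups()
            variants[-1]["exons"].append(
                (kind, strand, int(start), int(end), float(weight))
            )
    return variants


# Variant 1 from the example output (TSS and PolA lines are skipped).
SAMPLE = """\
Predicted genes and exons in var: 1 Max var= 5 GENE WEIGHT: 24.1
1 + TSS 355 7.43 TATA 327 wTATA 21.08 LDF 0.56
1 + 1 CDSf 521 - 641 1.23 521 - 640
1 + 2 CDSi 1066 - 1362 2.08 1068 - 1361
1 + 3 CDSi 1860 - 2028 1.69 1862 - 2026
1 + 4 CDSi 2637 - 2802 2.74 2638 - 2802
1 + 5 CDSi 3558 - 3797 4.35 3558 - 3797
1 + 6 CDSl 4131 - 4247 2.09 4131 - 4244
1 + PolA 4650 3.17
"""

variant = parse_variants(SAMPLE)[0]
coding_length = sum(end - start + 1 for _, _, start, end, _ in variant["exons"])
print(variant["var"], variant["weight"], coding_length)  # prints 1 24.1 1110
```

The total coding length of 1110 bp corresponds to 370 codons, which agrees with the 369 amino acids reported for this variant plus the stop codon.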
--
Victor Solovyev
The Sanger Centre, Hinxton, Cambridge CB10 1SA, UK
Email: solovyev at sanger.ac.uk
http://genomic.sanger.ac.uk
Phone: 44-1223-494799 FAX: 44-1223-494919