GENES in Human genome draft:
(49171 Genes and 282378 exons)
We present, for free public use, human genes predicted by FGENESH, one of the most accurate
gene-finding programs, in the draft human genome assembled by the UCSC Human Genome Project Team.
Thanks to Domenick Venezia, who pointed out a bug in the file hgd.exo that
resulted in missing predicted exons for chromosome 21.
The missing exons from chromosome 21 were added on Dec. 13, 2000.
If you downloaded the hgd.exo file before that date, please download the fixed version
at http://www.softberry.com/inf/humd_an.html
(This did not affect the exon amino acid sequences in the file hgd.exp below.)
The complete results of this analysis are summarized in Table 1 and can be viewed
in the InfoGene database at:
http://www.softberry.com/inf/infodb.html
where the InfoGene Java viewer can be used to visualize the predictions along
the chromosomes. Use Obtain Locus in the Action menu to get the prediction data.
The exon sequences and gene annotation data can be downloaded
for local use or for designing microarray oligos:
>Human genome predicted genes/exons
>Predicted amino acid sequences of exons with PfamA annotation
Table 1. Summary of predicted genes and proteins in Human genome sequences
        GENES  EXONS   BASES       MASKED+N    %N  %N+M  GENE_PER  EXON_PER
Total:  49171  282378  3374262130  1755813225  19  52    68623     11949
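The derived columns of Table 1 can be checked against the raw totals. The short sketch below assumes that GENE_PER and EXON_PER are the average number of bases per predicted gene and per predicted exon, and that %N+M is MASKED+N expressed as a percentage of BASES. These readings are not stated in the table itself, but they are consistent with its numbers. The %N column cannot be recomputed because the count of N bases alone is not listed.

```python
# Totals copied from Table 1.
genes = 49171
exons = 282378
bases = 3374262130
masked_plus_n = 1755813225

# Assumed meaning of the derived columns (not stated in the table itself).
gene_per = bases // genes                           # average bases per predicted gene
exon_per = bases // exons                           # average bases per predicted exon
pct_masked_n = round(100 * masked_plus_n / bases)   # %N+M

print(gene_per, exon_per, pct_masked_n)  # 68623 11949 52
```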
Predicted genes were annotated using a Pfam similarity search.
Later we also plan to annotate the CELL LOCATION of the predicted proteins.
Total number of different PfamA domain types: 1154
(the same domain found in neighboring exons is counted once)
467 pkinase Eukaryotic protein kinase domain
372 7tm_1 7 transmembrane receptor (rhodopsin family)
308 Myc_N_term Myc amino-terminal region
256 Topoisomerase_I Eukaryotic DNA topoisomerase I
224 ig Immunoglobulin domain
183 rrm RNA recognition motif. (a.k.a. RRM, RBD, or RNP domain)
182 PH PH domain
180 Myosin_tail Myosin tail
166 EGF EGF-like domain
159 filament Intermediate filament proteins
154 Syndecan Syndecan domain
143 ras Ras family
138 RNA_pol_A2 RNA polymerase A/beta'/A" subunit
123 BTB BTB/POZ domain
etc.
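The counting rule used for the list above (the same domain in neighboring exons is counted once) can be illustrated with a minimal sketch. This is one possible reading of the rule, not the procedure actually used to build the list. The input layout, a list of (gene_id, exon_index, domain) tuples, and the function name count_domains are hypothetical and are not the format of the downloadable files.

```python
from collections import Counter, defaultdict

def count_domains(hits):
    """Count domain occurrences, merging hits in adjacent exons of one gene.

    hits: iterable of (gene_id, exon_index, domain) tuples (hypothetical layout).
    """
    # Collect, for every (gene, domain) pair, the exons where the domain was found.
    exons_by_key = defaultdict(set)
    for gene_id, exon_index, domain in hits:
        exons_by_key[(gene_id, domain)].add(exon_index)

    counts = Counter()
    for (gene_id, domain), exon_set in exons_by_key.items():
        previous = None
        for exon_index in sorted(exon_set):
            # A new occurrence starts unless the preceding exon had the same domain.
            if previous is None or exon_index != previous + 1:
                counts[domain] += 1
            previous = exon_index
    return counts

example_hits = [
    ("gene1", 1, "pkinase"),
    ("gene1", 2, "pkinase"),   # neighbor of exon 1: same occurrence
    ("gene1", 5, "pkinase"),   # not adjacent: new occurrence
    ("gene2", 1, "ig"),
    ("gene2", 2, "PH"),
    ("gene2", 3, "ig"),        # exon 2 has no "ig" hit: new occurrence
]
counts = count_domains(example_hits)
for domain, n in counts.most_common():
    print(n, domain)
```

In this toy input, pkinase is counted twice rather than three times because the hits in exons 1 and 2 of gene1 are merged into one occurrence.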
---