Annotated 20000 Promoters in Human Genome in Genome Explorer

webmaster webmaster at softberry.com
Tue Jul 17 13:28:43 EST 2001

Promoters predicted in the December draft of the human genome are presented in 
Genome Explorer along with known and predicted genes.


A right mouse click on a promoter in Genome Explorer reveals the promoter 
sequence, presented in two blocks for TATA+ promoters and in one block for 
TATA-less promoters. The first block of a TATA+ promoter is the TATA box, and 
the second is the stretch from the predicted transcription start site (TSS) to 
the known 5'-end of the mRNA or the translation start site.

Promoters were predicted by the Softberry promoter prediction program TSSW in 
regions up to 3000 bp from known starts of coding regions (ATG codon) or known 
mapped 5'-mRNA ends. We found that limiting the promoter search to such regions 
drastically reduces false positive predictions. Also, we use very strict 
thresholds for the prediction of TATA-less promoters to minimize false positives.
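
The restriction of the search to regions near known gene starts can be illustrated with a minimal sketch. This is not TSSW code. The function name, the 1-based inclusive coordinates, the strand handling, and the assumption that the window lies upstream of the anchor (and that "3000" means base pairs) are all illustrative choices, not details given in this announcement.

```python
from typing import Tuple

# From the post: promoters are searched "up to 3000" from a known start.
# Assumption: the unit is base pairs.
MAX_DISTANCE = 3000


def search_window(anchor: int, strand: str, chrom_length: int,
                  max_distance: int = MAX_DISTANCE) -> Tuple[int, int]:
    """Return the 1-based inclusive (start, end) region to scan.

    `anchor` is the position of a known ATG codon or a mapped 5'-mRNA end.
    Assumption: the window extends upstream of the anchor on the gene's
    own strand and is clipped to the chromosome boundaries.
    """
    if strand == "+":
        # Upstream on the forward strand means smaller coordinates.
        return max(1, anchor - max_distance), anchor
    if strand == "-":
        # Upstream on the reverse strand means larger coordinates.
        return anchor, min(chrom_length, anchor + max_distance)
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    print(search_window(10000, "+", 50000))
    print(search_window(10000, "-", 50000))
```

Only predictions falling inside such a window would be kept under this scheme, which is one plausible way the reduction in false positives described above could be achieved.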

 Our promoter prediction software predicts about 50% of promoters 
with a small average deviation from the true start site. Such accuracy 
makes experimental work with the predicted promoter candidates possible.

For 20 experimentally verified promoters on Chromosome 22, TSSW predicted 
15, placed 12 of them within the (-150,+150) region around the true TSS, and 
placed 6 (30% of all promoters) within the (-8,+2) region around the true TSS.
These results are significantly better than those obtained with the 
PromoterInspector program (Scherf M., Klingenhoff A., Frech K. et al. (2001) 
First pass annotation of promoters on human chromosome 22. Genome Res., 11, 
333-340), where only 50% of the promoters from the same sample were found, with 
deviations from the true TSS ranging from 200 to 1000 bp.
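
The Chromosome 22 figures quoted above can be restated as fractions of the 20 verified promoters. The short sketch below only redoes that arithmetic from the counts in this post. The variable and function names are illustrative, and no per-promoter deviation data is assumed.

```python
# Counts taken from the post for the Chromosome 22 test set.
TOTAL_VERIFIED = 20
COUNTS = {
    "predicted by TSSW": 15,
    "within (-150,+150) of true TSS": 12,
    "within (-8,+2) of true TSS": 6,
}


def percent(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


# Percentage of all verified promoters in each category.
RATES = {label: percent(n, TOTAL_VERIFIED) for label, n in COUNTS.items()}

if __name__ == "__main__":
    for label, rate in RATES.items():
        print(f"{label}: {rate:.0f}%")
```

The 30% figure for the (-8,+2) window matches the value stated in the text. The comparison program is reported to have found 50% of the same sample.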

We predicted 17632 TATA+ promoters and 2383 TATA-less promoters overall 
in the human genome draft. For Chromosome 22, we predicted 350 TATA+ 
promoters and 85 TATA-less promoters.
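
As a quick consistency check on these counts, the sketch below sums the two promoter classes and computes the TATA-less share. It uses only the numbers quoted in this post, and the names are illustrative.

```python
# Promoter counts quoted in the post, as (TATA+, TATA-less) pairs.
GENOME = (17632, 2383)
CHR22 = (350, 85)


def total(counts):
    """Total number of predicted promoters."""
    return counts[0] + counts[1]


def tata_less_share(counts):
    """Fraction of predicted promoters that are TATA-less."""
    return counts[1] / total(counts)


if __name__ == "__main__":
    print(total(GENOME), f"{tata_less_share(GENOME):.1%}")
    print(total(CHR22), f"{tata_less_share(CHR22):.1%}")
```

The genome-wide total of 20015 is consistent with the "20000 promoters" in the subject line.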

New Fgenesh++ gene predictions for the December draft of the human genome are 
presented by Softberry Inc. (www.softberry.com) at 
http://genome.cse.ucsc.edu/goldenPath/decTracks.html and will soon be presented 
in Softberry Genome Explorer with some expression data.

 The 44409 genes include 5883 genes corresponding to RefSeq mRNAs, 3592 genes 
corresponding to GenBank mRNAs, 2047 known genes and 302 pseudogenes.

Methods of prediction are described in:
Solovyev V.V. (2001) Statistical approaches in eukaryotic gene prediction. In: 
Handbook of Statistical Genetics (eds. Balding D. et al.), John Wiley & Sons, 
Ltd., p. 83-127.


More information about the Bio-www mailing list
