What Genomes have been Sequenced?

Zharkikh Andrey gsbs1022 at UTSPH.SPH.UTH.TMC.EDU
Tue Oct 13 08:27:01 EST 1992

robison1 at husc10.harvard.edu (Keith Robison)

>>As a Monte Carlo simulation shows, about 24 of these ORFs are expected
>>to be found by chance (in a random sequence of length 300,000 with
>>the same base frequencies as in yeast III chromosome: A=0.31, T=0.30,
>>G=0.19, and C=0.20).
>>Andrey.
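
A minimal sketch of such a simulation, assuming independent bases drawn with the frequencies above, the standard stop codons TAA, TAG and TGA, and "long" meaning at least 100 sense codons between two consecutive stops in the same reading frame. The original simulation's exact conventions are not stated, and the helper names (random_sequence, reverse_complement, count_long_orfs) are hypothetical.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code stops


def random_sequence(length, freqs, seed=None):
    """Random DNA with independent bases drawn from `freqs`."""
    rng = random.Random(seed)
    bases = list(freqs)
    weights = [freqs[b] for b in bases]
    return "".join(rng.choices(bases, weights=weights, k=length))


def reverse_complement(seq):
    """Sequence of the complementary strand, read 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_long_orfs(seq, min_codons=100):
    """Count stop-to-stop runs of at least `min_codons` sense codons in the
    three forward reading frames of `seq`. Runs before the first stop or
    after the last stop of a frame are ignored (one possible boundary
    convention, assumed here)."""
    count = 0
    for frame in range(3):
        run = None  # sense codons since the last stop; None before the first stop
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if run is not None and run >= min_codons:
                    count += 1
                run = 0
            elif run is not None:
                run += 1
    return count


if __name__ == "__main__":
    yeast_iii = {"A": 0.31, "T": 0.30, "G": 0.19, "C": 0.20}
    seq = random_sequence(300_000, yeast_iii, seed=1)
    plus = count_long_orfs(seq)
    minus = count_long_orfs(reverse_complement(seq))
    print(f"long ORFs: {plus} (+ strand), {minus} (- strand)")
```

Counts vary from seed to seed, so averaging several runs gives a steadier number to compare with the figure of about 24 quoted above.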

>	Curiosity: in the ChrIII paper, the claim was made that
>ORFs of >100 amino acids "have 0.2% probability of occurring by
>chance in S. cerevisiae DNA." Is this consistent with the above
>estimate?

>Reference given (I haven't looked it up yet):
>	Sharp & Crowe.  Yeast (1991) 7:657-678.

>Keith Robison

I haven't read the paper, but the Monte Carlo estimate
can be supported by a simple back-of-the-envelope calculation:

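With equal base contents, 3 of the 64 codons (TAA, TAG, TGA) are stop
codons, so the average ORF length, counting the terminating stop codon,
is 64/3 = 21.3 codons.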

The expected number of ORFs (of any size, including zero length) in
100,000 codons, i.e. one reading frame of the 300,000-bp sequence, is
100,000/(64/3) = 4687.5

The probability of an ORF of size L (L sense codons followed by a stop)
is   p(1-p)^L,   where p = 3/64

The probability of an ORF of size L >= 100 is the geometric tail
(1-p)^100 = 0.008222163
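
Under the assumption used here, that successive codons are independent and each is a stop with probability p, the tail formula follows from summing a geometric series, and the same model gives the mean ORF length including the stop codon:

```latex
\begin{aligned}
P(L = \ell) &= p\,(1-p)^{\ell}, \qquad \ell = 0, 1, 2, \dots \\
P(L \ge K) &= \sum_{\ell=K}^{\infty} p\,(1-p)^{\ell}
            = p\,(1-p)^{K} \sum_{j=0}^{\infty} (1-p)^{j}
            = (1-p)^{K} \\
E[L + 1] &= \frac{1}{p} = \frac{64}{3} \approx 21.3 \\
P(L \ge 100) &= \left(\tfrac{61}{64}\right)^{100} \approx 0.008222
\end{aligned}
```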

The expected number of long ORFs (L >= 100) is 4687.5*0.008222163 = 38.54
Counting the complementary strand doubles this, to about 77.
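
The arithmetic above can be reproduced in a few lines. This is a sketch of the same estimate, assuming independent codons that are each a stop with probability p; the function name and its `frames` parameter are hypothetical. The default `frames=2` follows the text: 100,000 codons on each of the two strands.

```python
def long_orf_estimate(n_codons, p, min_codons=100, frames=2):
    """Back-of-the-envelope count of ORFs with at least `min_codons` sense
    codons, assuming independent codons, each a stop with probability p.

    n_codons: codons scanned per reading frame.
    frames:   number of reading frames counted (default: one per strand).
    """
    mean_length = 1 / p               # average ORF length incl. the stop codon
    n_orfs = n_codons / mean_length   # expected ORFs of any size, per frame
    tail = (1 - p) ** min_codons      # P(L >= min_codons)
    return n_orfs, tail, n_orfs * tail * frames


n_orfs, tail, total = long_orf_estimate(100_000, 3 / 64)
print(f"{n_orfs:.1f} ORFs per frame, tail {tail:.6f}, {total:.2f} long ORFs")
```

With the defaults this gives the 4687.5, 0.008222 and roughly 77 of the text. A 300,000-bp sequence has three reading frames per strand, each with its own 100,000 codons, so `frames=6` counts all of them and triples the estimate.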

This is of the same order as the Monte Carlo estimate, 24*2 = 48.
The difference might be due to unequal base usage, boundary effects,
whether the stop codon is counted as part of the ORF, etc.
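
Unequal base usage alone has a large effect. With the AT-rich yeast III frequencies, stop codons are more common than 3/64, so long ORFs become rarer. A sketch under the same independent-base assumption (the helper name `stop_probability` is hypothetical):

```python
def stop_probability(freqs):
    """P(a random codon is TAA, TAG or TGA), assuming independent bases."""
    a, g, t = freqs["A"], freqs["G"], freqs["T"]
    return t * a * a + t * a * g + t * g * a   # TAA + TAG + TGA


yeast_iii = {"A": 0.31, "T": 0.30, "G": 0.19, "C": 0.20}
equal = {b: 0.25 for b in "ACGT"}

for name, freqs in [("equal", equal), ("yeast III", yeast_iii)]:
    p = stop_probability(freqs)
    tail = (1 - p) ** 100              # P(>= 100 sense codons before a stop)
    per_frame = 100_000 * p * tail     # expected long ORFs in one reading frame
    print(f"{name}: p = {p:.4f}, tail = {tail:.5f}, "
          f"six frames = {6 * per_frame:.1f}")
```

Under this model the yeast composition gives p of about 0.064 and a 100-codon tail probability of about 0.13%, roughly comparable to the 0.2% quoted from the ChrIII paper. Over all six reading frames (three per strand) of a 300,000-bp sequence it predicts about 51 long ORFs, close to the Monte Carlo figure of 48, whereas equal base frequencies give about 231 over six frames.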

So the probability of finding such long ORFs by chance is not as small
as it might seem at first glance.

Andrey