David Huen asked about C. elegans codon preference tables.
Here are the tables used by Phil Green's program GENEFINDER for
codon preference, as used by the genomic sequencing project. They
were assembled by LaDeana Hillier and Phil Green at St Louis. The
original tables used by Genefinder are available in the subdirectory
wgf/ of the ACEDB release.
Table of codon frequencies in a number of confirmed coding genes
(first base at left, second base across the top, third base at right):
      A     C     G     T
A   594   292   478    81   A
    827   581   190   810   C
   1495   169    57   720   G
    618   489   163   658   T
C  1098  1008   169    73   A
    332    77   328   721   C
    490   171    71   198   G
    317   143   631   880   T
G  1200   371  1465   144   A
    705   822   107   600   C
   1321   134    55   254   G
   1067   988   291   687   T
T     0   352     0    86   A
    465   433   310   726   C
      0   266   217   504   G
    296   499   273   277   T
Followed by a table of triplet frequencies as they would occur in
DNA with standard C. elegans dinucleotide frequencies:
      A     C     G     T
A  4635  1446  1601  1730   A
   1439   617   753  1425   C
   1429   719   626  1432   G
   3129  1097  1097  3129   T
C  1938  1050   995   733   A
    887   375   456   993   C
    922   458   458   922   G
   1432   626   719  1429   T
G  2307   879  1059   817   A
    736   458   458   736   C
    993   456   375   887   G
   1425   753   617  1439   T
T  1752  1804  1804  1752   A
    817  1059   879  2307   C
    733   995  1050  1938   G
   1730  1601  1446  4635   T
Genefinder uses the log of the ratio of the values in these two tables
for its per-triplet scores. Our experience is that C. elegans genes
vary a lot in degree of bias, with many of them showing little or no
bias. In general it appears that the more biased genes are likely to
be more highly expressed. There was a Gazette article a couple of
years ago from Chris Fields about this.
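
For anyone who wants to experiment with the log-ratio scoring described
above, here is a rough Python sketch. It is not Genefinder's code. The
function names (parse, log_ratio_scores) are made up for illustration,
and three choices are my assumptions rather than anything stated above:
each table is normalised to sum to 1 before the ratio is taken (this
only shifts every score by the same constant), natural logs are used,
and codons with a zero count in the coding table (the three stops) are
given a score of minus infinity.

```python
import math

BASES = "ACGT"

# Values copied from the two tables above. Each group of four numbers is one
# table row: rows run over the first base, then the third base; the four
# numbers within a row run over the second base (A, C, G, T).
CODING = """
594 292 478 81    827 581 190 810   1495 169 57 720   618 489 163 658
1098 1008 169 73  332 77 328 721    490 171 71 198    317 143 631 880
1200 371 1465 144 705 822 107 600   1321 134 55 254   1067 988 291 687
0 352 0 86        465 433 310 726   0 266 217 504     296 499 273 277
"""
BACKGROUND = """
4635 1446 1601 1730  1439 617 753 1425   1429 719 626 1432   3129 1097 1097 3129
1938 1050 995 733    887 375 456 993     922 458 458 922     1432 626 719 1429
2307 879 1059 817    736 458 458 736     993 456 375 887     1425 753 617 1439
1752 1804 1804 1752  817 1059 879 2307   733 995 1050 1938   1730 1601 1446 4635
"""


def parse(text):
    """Turn a flattened table into a dict keyed by triplet, e.g. 'ATG'."""
    values = [int(v) for v in text.split()]
    triplets = [
        first + second + third
        for first in BASES
        for third in BASES
        for second in BASES
    ]
    return dict(zip(triplets, values))


def log_ratio_scores(coding, background):
    """Per-triplet log(coding frequency / background frequency).

    Assumptions (not from the original post): both tables are normalised
    to frequencies, the log is natural, and zero coding counts map to -inf.
    """
    coding_total = sum(coding.values())
    background_total = sum(background.values())
    scores = {}
    for triplet, count in coding.items():
        if count == 0:
            scores[triplet] = float("-inf")
        else:
            coding_freq = count / coding_total
            background_freq = background[triplet] / background_total
            scores[triplet] = math.log(coding_freq / background_freq)
    return scores


coding = parse(CODING)
background = parse(BACKGROUND)
scores = log_ratio_scores(coding, background)

# Print the five triplets most enriched in coding sequence.
for triplet in sorted(scores, key=scores.get, reverse=True)[:5]:
    print(triplet, round(scores[triplet], 3))
```

A positive score means the triplet is commoner in the confirmed coding
genes than the dinucleotide background predicts, and a negative score
means it is rarer.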
Richard Durbin