In article <473sgh$ipo at itssrv1.ucsf.edu>, bgold at itsa.ucsf.edu (Bert Gold)
wrote:
> Leroy Hood gave a talk here today and stated that the largest
> human contig, by a factor of 10, is the human T-cell receptor beta
> locus he has been studying: it is 685 kb, with 48 of 65 apparently
> functional "genes", the remainder being pseudogenes. Since Alu
> sequences occur on average once every 5,000 bp, you should be able to
> find around 135 Alu repeats in his sequence. Hood is studying the
> ways in which this contig provides analytical information about
> gene function; he is apparently trying to develop theories of how genes
> might be demarcated using bioinformatic algorithms rather than
> exon-trapping methods.
>
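The Alu estimate in the quoted post is a single division. A minimal Python sketch, assuming Alu elements are spread uniformly at one per 5,000 bp (real Alu density varies a good deal between genomic regions), gives 137, consistent with "around 135". The function name expected_alu_count is hypothetical.

```python
def expected_alu_count(contig_bp: int, bp_per_alu: int = 5000) -> float:
    """Expected number of Alu repeats in a contig of contig_bp bases.

    Assumes a uniform average density of one Alu per bp_per_alu bases,
    as stated in the quoted post; actual density is region-dependent.
    """
    return contig_bp / bp_per_alu


tcrb_bp = 685_000  # the 685 kb T-cell receptor beta contig
print(round(expected_alu_count(tcrb_bp)))  # 137
```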
Lee is exaggerating a bit; it's closer to a factor of 4 (685 kb vs. 180 kb
for the next largest entry). There are six entries in GenBank with human
sequences longer than 100 kb. They are listed in a table (p. 122) of an
excellent article by Richard Gibbs on the prospects for large-scale human
DNA sequencing, in the October 1995 issue of Nature Genetics (Gibbs R.
1995. Nat Genet 11:121-125).
Size (kb)  Locus/gene                        GenBank acc. no.
685        T-cell receptor beta locus        L36092
180        Retinoblastoma locus              L11910
152        fmr1 locus                        L29074
152        Breakpoint-cluster region (BCR)   U07000
130        IDS gene                          L43581
101        Neurofibromatosis type-1 locus    L05367
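The table makes it easy to check the "factor of 10" claim directly. A short Python sketch, using only the sizes listed above (the variable names are illustrative), computes the ratio of the largest entry to the next largest, plus the combined length of the six entries.

```python
# Sizes in kb, taken from the table above (Gibbs 1995, p. 122).
contigs = {
    "T-cell receptor beta locus": 685,
    "Retinoblastoma locus": 180,
    "fmr1 locus": 152,
    "Breakpoint-cluster region (BCR)": 152,
    "IDS gene": 130,
    "Neurofibromatosis type-1 locus": 101,
}

sizes = sorted(contigs.values(), reverse=True)
ratio = sizes[0] / sizes[1]  # largest entry vs. the runner-up
total_kb = sum(sizes)        # combined length of the six entries

print(f"largest / next largest = {ratio:.1f}")     # 3.8
print(f"total of listed entries = {total_kb} kb")  # 1400 kb
```

The ratio of about 3.8 is what separates "a factor of 3 to 4" from Hood's "factor of 10"; the six entries together come to about 1.4 Mbp.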
Some chromosome 4p sequence from the Sanger Centre may also be long, but
it has been deposited as individual cosmids rather than as a single entry.
The six sequences above can also be found among those listed by Keith Robison
(robison at nucleus.harvard.edu) on his web site "The 100 kb Club":
http://golgi.harvard.edu/100kb/
Hope to see more soon; long-range human sequence is still only ~0.1% of the
total (3.5 Mbp / 3 Gbp).
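The ~0.1% figure is one ratio. A quick Python check, taking the post's 3.5 Mbp total and a ~3 Gbp haploid genome as given, is:

```python
long_range_bp = 3.5e6  # ~3.5 Mbp of long-range human sequence (figure from the post)
genome_bp = 3.0e9      # ~3 Gbp haploid human genome

fraction = long_range_bp / genome_bp
print(f"{fraction:.2%}")  # 0.12%
```

The result, about 0.12%, matches the rounded "~0.1%" above.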