We have installed the NSITE program for analysing genome regulatory regions
===================================================================
It is available at http://genomic.sanger.ac.uk/ on the Web server of our
Computational Genomic Group
(http://genomic.sanger.ac.uk/gf/gf.html)
NSITE Program Description
NSITE - Search for consensus patterns with statistical estimation
by Ilham Shahmuradov and Victor Solovyev
Analysis of nucleotide sequences is available through WWW:
http://genomic.sanger.ac.uk/gf/gf.shtml
NSITE analyses regulatory regions and the composition of their
functional motifs. The program was developed on UNIX and
adapted to work with Transfac-type sites.
Method description:
The method is based on a statistical estimate of the expected number
of occurrences of a nucleotide consensus pattern in a given sequence
[1-2]. It uses an NSITE-formatted data file, which can include any set
of consensus sequences of functional motifs. In the current version this
file consists of the public release of Transfac sequences (3.4, 1998),
composite elements [3] and a set of additional functional motifs.
If a pattern is found whose expected number is significantly less
than 1, the analysed sequence can be supposed to possess the
function associated with that pattern.
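As an illustration of the kind of estimate described above, the following
Python sketch computes an expected number under a simple assumption that is
ours and is not taken from [1-2]: bases occur independently, with the
frequencies observed in the analysed sequence. The names IUPAC,
window_match_probability and expected_number are hypothetical, only one chain
is counted, and the formula actually used by NSITE may differ.

```python
# IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "s": "cg", "w": "at",
    "k": "gt", "m": "ac", "b": "cgt", "d": "agt",
    "h": "act", "v": "acg", "n": "acgt",
}


def window_match_probability(pattern, freqs, max_mismatch):
    """P(a random window matches `pattern` with <= max_mismatch mismatches).

    Assumption (not from the NSITE papers): positions are independent.
    """
    # dist[j] = probability of exactly j mismatches in the positions seen so far
    dist = [1.0]
    for code in pattern.lower():
        p_match = sum(freqs[base] for base in IUPAC[code])
        new = [0.0] * (len(dist) + 1)
        for j, p in enumerate(dist):
            new[j] += p * p_match              # this position matches
            new[j + 1] += p * (1.0 - p_match)  # this position mismatches
        dist = new
    return sum(dist[: max_mismatch + 1])


def expected_number(pattern, seq_length, freqs, max_mismatch=0):
    """Expected count of the pattern on one chain of a random sequence."""
    windows = max(seq_length - len(pattern) + 1, 0)
    return windows * window_match_probability(pattern, freqs, max_mismatch)


# Base frequencies and length as printed for ace-1 in the output example.
freqs = {"a": 0.31, "g": 0.16, "t": 0.35, "c": 0.18}
print(expected_number("cttgcgtca", 2140, freqs, max_mismatch=0))
```

With the ace-1 frequencies and length from the output example and the
consensus cttgcgtca with no mismatches, this sketch gives about 0.004 for
one chain.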
The NSITE output shows each pattern found, its position in the
sequence, and its accession number, ID, motif description and binding
factor name from the original database, where available.
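To make the reported fields concrete, here is a minimal Python sketch of a
consensus search on both chains with a mismatch limit. It is not NSITE's
code: the function names are hypothetical, and the conventions used (1-based
positions, begin greater than end for the 2nd chain, mismatched bases shown
in lower case in the site string) are inferred from the output example.

```python
# IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "s": "cg", "w": "at",
    "k": "gt", "m": "ac", "b": "cgt", "d": "agt",
    "h": "act", "v": "acg", "n": "acgt",
}
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    """Reverse complement of a lower-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def align(pattern, window):
    """Return (mismatch count, site string) for a window of equal length."""
    mismatches = 0
    site = []
    for code, base in zip(pattern, window):
        if base in IUPAC[code]:
            site.append(base.upper())   # matching base in upper case
        else:
            mismatches += 1
            site.append(base.lower())   # mismatching base in lower case
    return mismatches, "".join(site)


def scan(pattern, seq, max_mismatch=0):
    """Search both chains of `seq` for `pattern`; positions are 1-based."""
    pattern, seq = pattern.lower(), seq.lower()
    m = len(pattern)
    hits = []
    for i in range(len(seq) - m + 1):
        window = seq[i:i + m]
        mism, site = align(pattern, window)
        if mism <= max_mismatch:
            hits.append({"chain": 1, "begin": i + 1, "end": i + m,
                         "mismatch": mism, "site": site})
        mism, site = align(pattern, reverse_complement(window))
        if mism <= max_mismatch:
            # On the 2nd chain the site runs from the higher position down.
            hits.append({"chain": 2, "begin": i + m, "end": i + 1,
                         "mismatch": mism, "site": site})
    return hits


for hit in scan("ccttwyatgg", "aaccataaaaggt"):
    print(hit)
```

In this toy sequence the consensus is found only on the 2nd chain, so the
reported begin is larger than the reported end.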
Acknowledgments: We thank Igor Rogozin, who took part in
developing some applications of this method for
nucleotide consensus searching on the IBM PC [4].
Output example:
Program *** N S I T E *** Shahmuradov, Solovyev
(http://genomic.sanger.ac.uk)
File with SITEs: nsite.dat
File with SEQUENCEs: ace1.seq
Search PARAMETRS: Expected. Number - 0.0100000
Siginicance Level - 0.9500000 Print Status - Yes
Note: AC - Accession no. in TRANSFAC or NSITE DB
DE - Description (gene or gene product)
RE - Gene region (e.g. promoter,enhancer or unknown)
BF - Binding factor(s)
OS - Organism species
***************************************************************************
> ace-1 /acetylcholinesterase 1 (ACHE)/* Chr. 10*/C.elegans/-2200:-1/
Frequencies: A - 0.31 G - 0.16 T - 0.35 C - 0.18 ... Length = 2140
10 20 30 40 50 60
aaaaaaaactacgtgactagacatatcacgtttcggccgctactactttttgcgttgata
ttttttttgatgcactgatctgtatagtgcaaagccggcgatgatgaaaaacgcaactat
.......................................................
2110 2120 2130 2140
tctcccggcggtccaaacgattatgatttgttgaagaagc
agagggccgccaggtttgctaatactaaacaacttcttcg
===========================================================================
25. [ 3] T: AC: R00037 / DE: beta-actin
RE: unknown / OS: human, Homo sapiens
BF: SRF ..
10
ccttwyatgg
---------- Sites in 2nd chain ----------
Max mismatch : 2
Exp.Number: 0.006 Conf.Interval: 0 Found: 1
begin: 1704 end: 1695 mismatch: 0 exp.num.: 0.006, site:CCTTTTATGG
===========================================================================
74. [ 1] T: AC: R00103 / DE: AMV (avian myeloblastosis virus)
RE: unknown / OS: AMV, avian myeloblastosis virus
BF: C/EBPalpha ..
10
cttgcgtca
---------- Sites in 1st chain ----------
Max mismatch : 0
Exp.Number: 0.004 Conf.Interval: 0 Found: 1
begin: 1920 end: 1928 mismatch: 0 exp.num.: 0.004, site:CTTGCGTCA
===========================================================================
103. [ 1] T: AC: R00140 / DE: apoAII (apolipoprotein AII)
RE: unknown / OS: human, Homo sapiens
BF: Tf-LF1 .. NF-BA1 ..
10 20
cttcaacctttaccctggt
---------- Sites in 2nd chain ----------
Max mismatch : 4
Exp.Number: 0.001 Conf.Interval: 0 Found: 1
begin: 897 end: 879 mismatch: 4 exp.num.: 0.001, site:CTTCAACgTTgtCCCTGaT
===========================================================================
287. [ 1] T: AC: R00381 / DE: EGF receptor
RE: unknown / OS: human, Homo sapiens
BF: Sp1 ..
10 20
tccgccccccgcacgg
---------- Sites in 1st chain ----------
Max mismatch : 4
Exp.Number: 0.004 Conf.Interval: 0 Found: 1
begin: 79 end: 94 mismatch: 4 exp.num.: 0.004, site:TCCGtCCCCgcCACtG
===========================================================================
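For readers who want to post-process results, the hit lines in the example
above can be pulled apart with a regular expression. This is a minimal
Python sketch that assumes the line layout is exactly as shown here.
HIT_RE and parse_hits are hypothetical names, and the rule that begin
greater than end indicates the 2nd chain is inferred from the example
rather than documented.

```python
import re

# One hit line, e.g.
# "begin: 1704 end: 1695 mismatch: 0 exp.num.: 0.006, site:CCTTTTATGG"
# \s also matches newlines, so a hit wrapped before "site:" is still parsed.
HIT_RE = re.compile(
    r"begin:\s*(?P<begin>\d+)\s+end:\s*(?P<end>\d+)\s+"
    r"mismatch:\s*(?P<mismatch>\d+)\s+"
    r"exp\.num\.:\s*(?P<exp_num>[\d.]+),\s*"
    r"site:(?P<site>[A-Za-z]+)"
)


def parse_hits(text):
    """Extract all hit records from a block of NSITE-style output text."""
    hits = []
    for m in HIT_RE.finditer(text):
        begin, end = int(m["begin"]), int(m["end"])
        hits.append({
            "begin": begin,
            "end": end,
            "mismatch": int(m["mismatch"]),
            "exp_num": float(m["exp_num"]),
            "site": m["site"],
            # Assumption: begin > end means the site lies on the 2nd chain.
            "chain": 1 if begin <= end else 2,
        })
    return hits


sample = """begin: 1704 end: 1695 mismatch: 0 exp.num.: 0.006, site:CCTTTTATGG
begin: 897 end: 879 mismatch: 4 exp.num.: 0.001,
site:CTTCAACgTTgtCCCTGaT
begin: 79 end: 94 mismatch: 4 exp.num.: 0.004, site:TCCGtCCCCgcCACtG"""

for hit in parse_hits(sample):
    print(hit)
```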
References:
[1] Shahmuridov K.A., Kolchanov N.A., Solovyev V.V., Ratner V.A.
Enhancer-like structures in middle repetitive sequences of the
eukaryotic genomes. Genetics (Russ), 22, 357-368, (1986).
[2] Solovyev V.V., Kolchanov N.A. Search for functional sites using
consensus. In: Computer analysis of Genetic macromolecules (eds.
Kolchanov N.A., Lim H.A.), World Scientific, p. 16-21, (1994).
[3] Heinemeyer T., Chen X., Karas H., Kel A.E., Kel O.V., Liebich I.,
Meinhardt T., Reuter I., Schacherer F., Wingender E. (1999).
Expanding the TRANSFAC database towards an expert system of regulatory
molecular
[4] Solovyev V.V., Rogozin I.B. The program package of the context analysis
of DNA, RNA and protein sequences. 1. Search for homology
and functional sites. Institute of Cytology and Genetics of the
USSR Academy of Sciences, Novosibirsk, (Russ), 1-70, (1986).
--
Victor Solovyev
The Sanger Centre, Hinxton, Cambridge CB10 1SA, UK
Email: solovyev at sanger.ac.uk
http://genomic.sanger.ac.uk
Phone: 44-1223-494799 FAX: 44-1223-494919