NAME
       RepeatFinder - search for statistically significant, high-complexity
       amino acid repeats.

SYNOPSIS
       RepeatFinder -f filename [-b] [-d] [-e] [-s] [-g] [-j] [-r] [-l] [-t]

DESCRIPTION
       RepeatFinder is an ancillary program that automates the use of the
       Gibbs sampler program (Neuwald et al., 1995) for the detection of
       statistically significant, ungapped amino acid repeats. Input
       sequences must be in FASTA format. Detected repeats are stored in the
       file 'result'.

       RepeatFinder runs the Gibbs sampler program -g times, searching for
       repeated modules ranging in size from -b to -e amino acids in
       increments of -s amino acids. Only modules whose p-value is below the
       -t threshold are considered significant; the p-value of the alignment
       is calculated according to the Karlin-Altschul statistic (Karlin and
       Altschul, 1993). Sequences must be at least -l amino acids long to be
       analyzed.

       The -j option only counts the input sequences. The -r option restarts
       the program after a computer crash without losing the results already
       calculated. Parameters passed to the Gibbs sampler program are stored
       in the file 'gibbspar'.

OPTIONS
       -f     input file of protein sequences in FASTA format
       -b     smallest size of the repeated modules (default: 5)
       -e     largest size of the repeated modules (default: 30)
       -s     increment step between the -b and -e sizes (default: 5)
       -g     number of runs of the Gibbs sampler (default: 50)
       -j     only count the input sequences
       -r     restart after a previous crash (default: off)
       -l     minimum sequence length (default: 100)
       -t     significance threshold (p-value) (default: 0.001)
       -d     maximum number of diagonals in the block (default: 50)

EXAMPLE
       The command

           RepeatFinder -f prion.tfa -b 10 -e 20 -t 0.0005

       runs the Gibbs sampler to detect repeats (if any) in the sequence file
       'prion.tfa'. It searches for repeats between 10 and 20 amino acids long
       and reports only those with a p-value below 0.0005.

REFERENCES
       Karlin S, Altschul SF.
       Applications and statistics for multiple high-scoring segments in
       molecular sequences. Proc Natl Acad Sci U S A 1993 Jun
       15;90(12):5873-7.

       Neuwald AF, Liu JS, Lawrence CE. Gibbs motif sampling: detection of
       bacterial outer membrane protein repeats. Protein Sci 1995
       Aug;4(8):1618-32.

       Lavorgna G, Patthy L, Boncinelli E. Were protein internal repeats
       formed by 'bricolage'? Trends Genet 2001;17:120-3.

README

Uncompress RepeatFinder2_02.tar.Z by typing "uncompress RepeatFinder2_02.tar.Z".
Then extract the files by typing "tar -xvf RepeatFinder2_02.tar". This creates
a directory "RepeatFinder2_02" containing three subdirectories: "code", "doc"
and "data".

To compile, switch to the 'code' directory and type "./compile" on the command
line. (You may need to reset the CC macro in code/makefile to match your
compiler; the default is CC = cc.) This creates six executable files in the
current directory: xnu, gibbs, RepeatFinder, prosperolauncher, panalyzer and
rfanalyzer.

The gibbs program is a slightly modified (i.e., less verbose) version of the
Gibbs sampler program by Andy Neuwald et al. The xnu program by Claverie and
States was used to filter low-complexity repeats out of the protein sequence
sets when running the Prospero program. RepeatFinder and prosperolauncher are
ancillary programs that use, respectively, the Gibbs sampler and the Prospero
programs to detect statistically significant repeats, if any, in a set of
protein sequences in FASTA format. The rfanalyzer and panalyzer programs
analyze the output generated by RepeatFinder and Prospero, respectively.

Please refer to the doc directory for details on program usage.

References

Neuwald A.F., Liu J.S., Lawrence C.E. (1995) Gibbs motif sampling: detection
of bacterial outer membrane protein repeats. Protein Sci 4, 1618-32.
(gibbs sampler)

Lavorgna G., Patthy L., Boncinelli E. (2001) Were protein internal repeats
formed by 'bricolage'?
Trends Genet 17, 120-3. (RepeatFinder)

Claverie J.M., States D.J. (1993) Information enhancement methods for large
scale sequence analysis. Computers and Chemistry 17, 191-201. (xnu program)

Mott R. (2000) Accurate formula for P-values of gapped local sequence and
profile alignments. J Mol Biol 300, 649-59. (Prospero program)