*************************************************************************
HBR - Recognition of Human and E.coli sequences to test
a library for E. coli contamination
Department of Cell Biology, Baylor College of Medicine
=========================================================================
Sequences can be analysed by e-mail through the
Weizmann Institute of Science server;
put the name of the program (hbr) in the subject line.
Example:
mail -s hbr services at bioinformatics.weizmann.ac.il < test.set
HBR will soon be installed on the University of Houston server:
mail -s hbr service at theory.bchs.uh.edu < test.set
where test.set is a file containing one or more sequences.
You can also run the program from the WWW home page of the
BCM Human Genome Center and Search Launcher
URL:http://kiwi.imgen.bcm.tmc.edu:8088/search-launcher/launcher.html
which gives access to the Gene-finder prediction help files and programs.
-> BCM Gene Finder
Description:
**********************
HBR recognizes human and bacterial sequences in order to test
a library for E. coli contamination by sequencing sample
clones. For each sequence in your set, the program calculates
the probability that it is a human sequence (P) or an E. coli
sequence (1-P). It also reports the total percentage of human
and bacterial sequences in the set.
The method is based on linear discriminant functions; see:
Solovyev V.V., Salamov A.A., Lawrence C.B.
Predicting internal exons by oligonucleotide composition and
discriminant analysis of spliceable open reading frames.
(Nucl. Acids Res., 1994, 22(24), 5156-5163).
Accuracy:
********************************
The accuracy of recognition is about 99%. For best results, submit
long sequences and a sufficiently representative set of them.
We recommend analysing sequences of 400 bp or longer and
disregarding sequences with 0.4 < P < 0.6, which cannot be
reliably assigned to either the human or the E. coli group.
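The recommendations above can be applied to HBR results with a short
script. The following is only a sketch in Python and is not part of HBR.
The function name assign_group and the labels it returns are hypothetical.
Treating sequences shorter than 400 bp as unassigned is an assumption drawn
from the length recommendation, not a rule enforced by the program.

```python
MIN_LENGTH = 400        # recommended minimum sequence length, in bp
LOW, HIGH = 0.4, 0.6    # P strictly between these values is not reliable


def assign_group(length, p):
    """Return 'human', 'E.coli' or 'unassigned' for one HBR result.

    length -- sequence length in bp
    p      -- probability reported by HBR that the sequence is human
    """
    if length < MIN_LENGTH:
        return "unassigned"      # assumption: skip sequences under 400 bp
    if LOW < p < HIGH:
        return "unassigned"      # cannot be reliably assigned to a group
    return "human" if p >= HIGH else "E.coli"


if __name__ == "__main__":
    # (length, P) pairs chosen for illustration
    for length, p in [(900, 1.00), (1020, 0.00), (720, 0.57), (360, 1.00)]:
        print(length, p, assign_group(length, p))
```

Sequences labelled unassigned would simply be left out when counting
human and E. coli clones.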
Submitting sequences via email:
***********************************
For email submission the sequences must have the following format:
 Name of 1st sequence
ccatctctgtcttgcaggacaatgccgtcttctgtctcgtggggcatcctcctgctggca
ggcctgtgctgcctggtccctgtctccctggctgaggatccccagggagatgctgcccag
aagacagatacatcccaccatgatcaggatcacccaaccttcaacaagatcacccccaac
ctggctgagttcgccttcagcctataccgccagctggcacaccagtccaacagcaccaat
 Name of 2nd sequence
ccatctctgtcttgcaggacaatgccgtcttctgtctcgtggggcatcctcctgctggca
ggcctgtgctgcctggtccctgtctccctggctgaggatccccagggagatgctgcccag
atcttcttctccccagtgagcatcg...............
......
(Keep every line shorter than 80 characters.
The line with the sequence name must begin with at least one
space character.)
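A file in this format can be produced with a few lines of Python. This is
a minimal sketch, not a tool supplied with HBR. The function name
format_submission and the 60-character line width are assumptions: any
width below 80 satisfies the rule above.

```python
def format_submission(records, width=60):
    """Format (name, sequence) pairs as text for HBR e-mail submission.

    records -- iterable of (name, sequence) tuples
    width   -- characters per sequence line (must stay below 80)
    """
    if not 0 < width < 80:
        raise ValueError("width must be between 1 and 79")
    lines = []
    for name, seq in records:
        name_line = " " + name.strip()     # name line begins with a space
        if len(name_line) >= 80:
            raise ValueError("sequence name is too long: " + name)
        lines.append(name_line)
        seq = "".join(seq.split())         # drop blanks and line breaks
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical record, for illustration only
    text = format_submission([("clone_1", "acgt" * 40)])
    with open("test.set", "w") as handle:
        handle.write(text)
```

The resulting test.set can then be sent with the mail command shown at
the top of this file.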
HBR output:
******************
1st line - total number of sequences and % of human and E. coli sequences
The remaining lines come in groups of 3:
1st line - the numbers of your sequences
2nd line - their lengths
3rd line - the probability that each sequence is human (P);
           the probability that it is bacterial is 1-P
For example:
Number of sequences- 12 % human= 50 % bacterial= 50
     1     2     3     4     5     6     7     8     9    10
   900   960  1501   360   360  1020   330   480   240   541
  1.00  1.00  0.83  1.00  1.00  0.00  0.01  0.01  0.00  0.01
    11    12
   720   540
  0.57  0.01
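Output of this form can be read back into a script. The sketch below, in
Python, is an illustration and not part of HBR. The name parse_hbr_output
is hypothetical, and the parser assumes the layout shown in the example:
one summary line followed by complete groups of three whitespace-separated
lines.

```python
import re

EXAMPLE = """\
Number of sequences- 12 % human= 50 % bacterial= 50
1 2 3 4 5 6 7 8 9 10
900 960 1501 360 360 1020 330 480 240 541
1.00 1.00 0.83 1.00 1.00 0.00 0.01 0.01 0.00 0.01
11 12
720 540
0.57 0.01
"""


def parse_hbr_output(text):
    """Return (summary, results) parsed from HBR output text.

    summary -- dict with the total count and the two percentages
    results -- list of (sequence number, length, P) tuples
    """
    rows = [ln.split() for ln in text.splitlines() if ln.strip()]
    # The summary line holds three numbers: total, % human, % bacterial
    nums = re.findall(r"\d+(?:\.\d+)?", " ".join(rows[0]))
    summary = {
        "total": int(nums[0]),
        "human_pct": float(nums[1]),
        "bacterial_pct": float(nums[2]),
    }
    body = rows[1:]
    results = []
    # Walk the body three lines at a time: numbers, lengths, probabilities
    for i in range(0, len(body) - 2, 3):
        for n, length, p in zip(body[i], body[i + 1], body[i + 2]):
            results.append((int(n), int(length), float(p)))
    return summary, results


if __name__ == "__main__":
    summary, results = parse_hbr_output(EXAMPLE)
    print(summary)
    for number, length, p in results:
        print(number, length, p)
```

Each tuple in results pairs a sequence number with its length and P, so
the values can be filtered or counted without reading the columns by eye.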
Problems, comments, and suggestions:
can be mailed to solovyev at cmb.bcm.tmc.edu.