Hello. A while back someone asked about how to interpret results of sequence
database searches and was pointed to a recent article by the staff at NCBI
(Nature Genetics v6 p119). I read this article and it has helped a little.
Would someone care to explain to a biologist how to interpret these numbers?
I have found some interesting similarities, but would like to know just how
interesting they are, in terms of statistical significance. I am not a
mathematician and do not have the inclination to review the entire literature
on this subject. As an example, how do I read this:
Score = 59 Length = 36 Expect = 1.4e+01 Sum-Stat P(2) = 2.0e-27
I have read about how scores are summed, and Length is intuitive. But what
about "Expect" and the P value?
Other examples include alignments with P values much closer to 1. Does this
mean they should be considered irrelevant? Help! As an aside, I am a graduate
student and am amazed at how often articles show some alignment and no
statistical argument. My fellow students see this alignment and take it as
proof that the sequences must be related. In other words, we need to be
educated as to how these results can be interpreted! Like I said, if this
can be explained without delving into the minutiae of the algorithms, I think
it will help a lot of people to utilize these methods to their fullest,
without overinterpreting the results. Thanks in advance...
brett at borcim.wustl.edu