DNA Slider, version 1.11
January 5, 1998

DNA Slider is a Macintosh program for performing a significance test of heterogeneity in the ratio of polymorphic sites to fixed differences in DNA sequence data. Help screens make the program fairly self-explanatory. It's basically the same idea as the test described in:

McDonald, J.H. 1996. Detecting non-neutral heterogeneity across a region of DNA sequence in the ratio of polymorphism to divergence. Molecular Biology and Evolution 13:253-260,

except that several new statistical tests have been added which are often more powerful than the runs test. These new tests are described in

McDonald, J.H. 1998. Improved tests for heterogeneity across a region of DNA sequence in the ratio of polymorphism to divergence. Molecular Biology and Evolution, in press.

DNA Slider should run on any Macintosh. It runs fastest on a Mac with a PowerPC chip, and on older Macs it takes advantage of an FPU if one is present. I have no plans to port the program to any other platform. If you have data you want to analyze and don't have access to a Macintosh, let me know and I'll analyze your data (in return for effusive thanks in the acknowledgements, of course). I wrote the program using the CodeWarrior Pascal compiler, and you can look at the source code (http://udel.edu/~mcdonald/slidersource.html) if you want.

To use the program, you must first align multiple sequences from one species and one sequence from a second species. Ignore any areas aligned with gaps, where part of the sequence is not present in all individuals. Also ignore any amino acid replacement (non-synonymous) variation. Next, classify those sites which differ from the consensus sequence as polymorphisms or fixed differences, as illustrated below.

Where a single nucleotide site has three or four bases, they are treated as if they were at adjacent sites. For example, a site which has C, T and G in species A would be counted as two polymorphisms.
Where a single nucleotide site has both a polymorphism and a fixed difference, they are treated as adjacent sites, and to be conservative they are put in the order that yields less extreme values of the test statistics.

Here are some example sequences, six from species A and one from species B, of a region 21 nucleotides long. They will look best in a fixed-width font such as Courier or Monaco. Dashes indicate identity to the first sequence, dots indicate the insertion of a gap in the alignment.

    site  sp. A   sp. B
      1   T-----    -
      2   G-----    C    fixed difference #1
      3   A-----    -
      4   A--CC-    -    polymorphism #1
      5   G-----    -
      6   C-----    .
      7   A-T---    .    gap, so ignore this polymorphism
      8   T-----    .
      9   G-----    -
     10   C-G---    -    polymorphism #2
     11   C-----    -
     12   C----T    G    fixed difference #2 and polymorphism #3
     13   A-----    -
     14   G-----    A    fixed difference #3
     15   T-----    -
     16   C--.--    -
     17   A--.--    T    gap, so ignore this fixed difference
     18   G--.--    -
     19   C-----    -
     20   CT---G    -    polymorphisms #4 and #5
     21   T-----    -

The input data set for the program would have the name of the gene (or other identifying information) on the first line, the number of variable sites on the second line, then one line for each variable site, giving the site number followed by a "p" or "f" for polymorphism or fixed difference. The file based on the above data would look like this:

    nothing dehydrogenase
    8
    2 f
    4 p
    10 p
    12 f
    12 p
    14 f
    20 p
    20 p

Because site 12 has both a polymorphism and a fixed difference, there are two possible orders. The order shown maximizes the number of runs, and thus is conservative for the runs test. It may also be conservative for the other tests, but both orders should be tried and the one yielding the less extreme values of the test statistics should be used.

When running the program, different values of the recombination parameter (R=4Nr) should be tried, and to be conservative, the value giving the greatest probability of obtaining the observed number of runs or fewer by chance should be used.
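
To make the site 12 ordering issue concrete, here is a small Python sketch that reads a file in the format described above and counts runs, that is, stretches of consecutive "p" or consecutive "f" entries. It is only an illustration and is not part of DNA Slider (which is written in Pascal); it does not perform any of the significance tests or simulations, and the names parse_slider_input and count_runs are made up for this example.

```python
def parse_slider_input(text):
    """Return (gene_name, [(site_number, kind), ...]) from input-file text."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    name = lines[0]                 # line 1: gene name or other identifier
    n_sites = int(lines[1])         # line 2: number of variable sites
    sites = []
    for ln in lines[2:]:
        site, kind = ln.split()
        if kind not in ("p", "f"):
            raise ValueError(f"expected 'p' or 'f', got {kind!r}")
        sites.append((int(site), kind))
    if len(sites) != n_sites:
        raise ValueError(f"header says {n_sites} sites, found {len(sites)}")
    return name, sites


def count_runs(sites):
    """Count stretches of identical consecutive kinds ('p' or 'f')."""
    kinds = [kind for _, kind in sites]
    return sum(1 for i, k in enumerate(kinds) if i == 0 or k != kinds[i - 1])


EXAMPLE = """nothing dehydrogenase
8
2 f
4 p
10 p
12 f
12 p
14 f
20 p
20 p
"""

name, sites = parse_slider_input(EXAMPLE)

# Swap the two entries at site 12 (list positions 3 and 4) to get the other order.
swapped = sites[:3] + [sites[4], sites[3]] + sites[5:]

print(name)
print("runs, order shown:  ", count_runs(sites))
print("runs, site 12 swapped:", count_runs(swapped))
```

For the example file this gives 6 runs for the order shown and 4 runs with the two site 12 entries swapped, which agrees with the statement that the order shown maximizes the number of runs.
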
Because the probability of obtaining the observed number of runs or fewer varies little over a fairly broad range of R, it is unnecessary to try a large number of closely spaced R values. Setting R=2, 4, 8, 16, 32... seems to work well. Note that with higher R values, the time required for the simulations increases dramatically.

=========================================================
John H. McDonald
Department of Biological Sciences
University of Delaware
Newark, Delaware 19716
mcdonald@udel.edu