Announcement:
The software package for performing "Bootscanning" is now available from
our website at
http://hivgenome.hjf.org/
"Bootscanning" is a method for detecting recombination in viral sequences.
We have only applied it to HIV-1, but it should work for any genes that
contain sufficient phylogenetic signal.
You will need GDE and Phylip to run the package, and at the current time we
provide only SUN executables. However, the source code is also included, so
anyone who would care to compile on another system capable of running GDE
and Phylip is free to do so.
Below is an excerpt from some of the documentation:
Bootscanning Package v. 1.0beta1
Principal Idea and Approach: Mika Salminen, msalminen at hiv.hjf.org
Design and programming: Wayne Cobb, wcobb at reed.hjf.org
(c) 1994-1996: Mika Salminen, Wayne Cobb, Henry M. Jackson Foundation
Description:
Bootscanning is a method for analysis of viral recombination. It can be used
to compare an unknown, suspected recombinant sequence to a set of predefined
potential parental sequences. It should be independent of organism, but we
have only used it for HIV-1 and suspect that it only works for sufficiently
variable genes.
In the case of HIV-1 there are predefined genetic subtypes which have been
called A-H in analyses based on the envelope and gag genes, with the notable
exception that the E-subtype does not exist in gag. Viruses with E-subtype
envelopes group with the A-type viruses in gag.
Bootscanning relies on the alignment of a suspected recombinant sequence with
a set of potential parental reference sequences (groups of sequences from the
different subtypes, or consensus sequences created from sets of reference
sequences). After optimal alignment, the alignment is broken into sequential,
overlapping segments (or windows), which are fed to a program for phylogenetic
analysis (any of the sequence programs of Phylip could be used; we have
included menus for three methods). Bootstrapped phylogenetic trees are built
for each segment, and finally the bootstrap value for placing the unknown
with each of the reference sequences/sequence groups is tabulated and plotted
along the genome.
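The windowing step described above can be illustrated with a minimal Python
sketch. It is not the code of the package's "chop" program; the function
name, the dictionary input, and the window and step sizes are all assumptions
made for illustration, since no particular sizes are given here.

```python
def sliding_windows(alignment, window=200, step=20):
    """Yield (midpoint, sub_alignment) for sequential, overlapping windows.

    alignment: dict mapping sequence name -> aligned sequence string.
    window and step are hypothetical parameter names and default values,
    not options of the Bootscanning package.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    total = lengths.pop()
    for start in range(0, total - window + 1, step):
        end = start + window
        # Label each window by the midpoint of the segment in the alignment.
        midpoint = start + window // 2
        yield midpoint, {name: seq[start:end] for name, seq in alignment.items()}


# Toy alignment (made-up sequences, 40 columns each).
toy = {"unknown": "ACGT" * 10, "refA": "ACGA" * 10, "refD": "TCGT" * 10}
windows = list(sliding_windows(toy, window=16, step=8))
for midpoint, segment in windows:
    print(midpoint, segment["unknown"])
```

Each yielded segment would then be handed to the phylogenetic programs for
bootstrapping; that step is not shown.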
The assumption is that the unknown will always reach high bootstrap values
with its parental subtype in windows covering regions of the unknown that
derive from that subtype. When a recombination breakpoint is reached, the
bootstrap value for one parent + the unknown will go down, and the bootstrap
value for the second parent + the unknown should go up. Therefore we should
find the breakpoint at the intersection of the plotted bootscanning lines.
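As a rough illustration of reading a breakpoint off two bootscanning lines,
the following Python sketch locates where two bootstrap-support curves cross,
using linear interpolation between neighboring windows. The function name and
the example numbers are invented for illustration; the package itself only
tabulates the values for plotting, and interpolation is an assumption of this
sketch, not part of the published method.

```python
def crossover_positions(positions, support_a, support_b):
    """Return alignment positions where two support curves cross.

    positions: window midpoints; support_a / support_b: bootstrap values
    for the unknown grouping with parent A or parent B in each window.
    Windows where the two values are equal are skipped, and the crossing
    is interpolated between the nearest windows where they differ.
    """
    crossings = []
    prev = None  # index of the last window where the curves differed
    for i, (a, b) in enumerate(zip(support_a, support_b)):
        diff = a - b
        if diff == 0:
            continue
        if prev is not None:
            prev_diff = support_a[prev] - support_b[prev]
            if prev_diff * diff < 0:  # sign change: the curves crossed
                frac = prev_diff / (prev_diff - diff)
                span = positions[i] - positions[prev]
                crossings.append(positions[prev] + frac * span)
        prev = i
    return crossings


# Made-up values: support shifts from subtype A to subtype D.
mids = [100, 200, 300, 400, 500]
with_a = [95, 90, 70, 20, 5]
with_d = [3, 8, 30, 80, 95]
print(crossover_positions(mids, with_a, with_d))  # one crossing, near 340
```

The estimate can be no more precise than the window spacing and the noise in
the bootstrap values allow.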
Reference alignments of non-recombinant gag and env genes and
subtype-reference sequences are included in the package. However, be aware
that the B and D-subtypes are sometimes difficult to separate in gag and that
the A-subtype in some regions separates into at least 2 subclusters.
Therefore we have included consensus sequences A1 and A2 in the consensus
alignments.
The package contains the following components:
Menufile for GDE:
"GDEmenus.Bootscan"
Insert the menus in this file into your ".GDEmenus"-file.
Shell-scripts (place all in /GDE/bin/):
"mlbootscan.sh"
Performs Maximum Likelihood Bootscanning analysis using SEQBOOT,
DNADIST, FITCH and CONSENSE
"njbootscan.sh"
Performs Neighbor Joining Bootscanning analysis using SEQBOOT, DNADIST,
NEIGHBOR and CONSENSE
"parsboot.sh"
Performs Parsimony Bootscanning analysis using SEQBOOT, DNAPARS
and CONSENSE
"analyze.sh"
Shell script to run the program analyze.
"genchop.sh"
Shell script to run the program chop
Executables (place all in /GDE/bin/):
readseq Fixed sequence format converter.
"PhyloMask" Creates masks to exclude gaps (menu available in
GDEmenus.Bootscan).
"chop" Breaks up masked alignment into individual segments which are
created as sequentially numbered input files to the Phylip
programs in a subdirectory of your home directory.
Each window is numbered at the midpoint of the segment in the
alignment. For each segment a corresponding outfile and treefile
are created which will contain the bootstrap values and the
consensus trees. The programs 'analyze' and 'report' are
used to extract and tabulate the bootstrap values of specified
groups of sequences (taxa or clades). Plotting the tables
using the alignment position as the x-axis and the bootstrap value
as the y-axis can be used to identify recombination points.
"analyze" Extracts bootstrap values from outfiles. Use analyze.sh to run it.
"report" Collects bootstrap values in tab-delimited table.
Attached to this file is also an example of a bootscan-plot of an
A/D-recombinant virus.
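To prepare a tab-delimited table of bootstrap values for such a plot, a
Python sketch along the following lines may help. The exact column layout
written by 'report' is not described here, so the layout used below (a header
row, window midpoint in the first column, one column per reference group) is
purely an assumption, as are the function name and the example values.

```python
import csv
import io


def read_bootscan_table(handle):
    """Parse a tab-delimited table into x positions and per-group series.

    ASSUMED layout (not confirmed by the package documentation):
    header row, then one row per window with the window midpoint first
    and one bootstrap value per reference group after it.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    groups = header[1:]
    positions = []
    series = {group: [] for group in groups}
    for row in reader:
        if not row:
            continue  # skip blank lines
        positions.append(int(row[0]))
        for group, value in zip(groups, row[1:]):
            series[group].append(float(value))
    return positions, series


# Made-up example table with two reference groups, A and D.
example = "position\tA\tD\n100\t95\t3\n200\t70\t30\n300\t20\t80\n"
positions, series = read_bootscan_table(io.StringIO(example))
for group, values in series.items():
    print(group, list(zip(positions, values)))
```

The resulting positions and series can be passed to any plotting tool, with
the alignment position on the x-axis and the bootstrap value on the y-axis.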
We hope that you will be able to use the method to produce some useful data,
and we would be happy to hear comments and critique about the package, as
well as bug reports. We would especially be delighted to hear from anyone
who has managed to successfully install the package.
We also acknowledge that the package is crude and simple. We have not put a
lot of effort into making it elegant, but it works for us, and we hope that
it will for other people, too. We have tried to remove any bugs that crept
in during development, but there will certainly be more. Anyone using the
package does so at their own risk; we take no responsibility for any loss
of data or hardware that may result from the use of the package (hopefully
there will be none!).
Finally, some acknowledgements and disclaimers:
This work was supported in part by cooperative agreement No.
DAMD17-93-V-3004 between the United States Army Medical Research and
Materiel Command and the Henry M. Jackson Foundation for the Advancement of
Military Medicine, and by grant No. 19191 from the Finnish Academy of Science
(Mika Salminen). The views and opinions expressed herein are those of the
authors and do not purport to reflect the official policy or position of the
US Army or of the Department of Defense.
Please refer to the following publications if using the package:
Mika O. Salminen, Jean K. Carr, Donald S. Burke and Francine E. McCutchan.
(1995). Identification of Breakpoints in Intergenotypic Recombinants of
HIV-1 by Bootscanning. AIDS Research and Human Retroviruses, 11, 1423-25.
Smith S, Overbeek R, Woese CR, Gilbert W and Gillevet P. (1994) The
Genetic Data Environment: An expandable GUI for Multiple Sequence
Analysis. Comp Appl Biol Sci, 10, 671-675.
Felsenstein, J. (1991). "Phylip Manual", v. 3.4, University Herbarium,
University of California, Berkeley, California.