Thu Mar 17 17:47:56 EST 1994

Hello again, I wrote this letter earlier in the day, but never saw it posted,
so here goes again. Thanks to all the people who responded to my request on
how to interpret the results of sequence database searches. It seems there
needs to be more education of us "wet" biologists as to how these things work.
Now, my next question. I would like to generate a consensus sequence from my
favorite virus to use as a query sequence that might be more useful than any
single sequence. I use Macs and VMS. I would like to use GCG to align a large
set of protein sequences (>100) in order to create a consensus sequence. The
problem is that a lot of the sequences are only partial. I tried to use PILEUP,
but it did not handle the sequences with internal overlap well. ie:


The result was a blank outfile. However, when I used only the full-length
sequences as input, I got a nice alignment back. So, I have been using this
alignment as a backbone to align the partial sequences using LINEUP. The Zip
routine seems to be able to correctly place these internal sequences. However,
LINEUP can only handle 30 sequences. I have been considering making several
LINEUP alignments and then aligning the consensuses I get from them. Is this a
reasonable way to go? I am afraid of misrepresenting some columns with this
approach.
  So, does anyone have any suggestions as to how best to line up all the
sequences I have? Are there other programs you would recommend?
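
To make the worry about misrepresented columns concrete, here is a minimal
Python sketch (not GCG). The function name consensus, the "-" gap character,
and the toy sequences are all made up for illustration, and it assumes a plain
majority-rule consensus in which gaps do not vote, which may well differ from
what LINEUP or PILEUP actually compute. It compares a consensus built from all
sequences at once with a consensus built from per-group consensuses.

```python
from collections import Counter

GAP = "-"  # assumed gap symbol; partial sequences are padded with it


def consensus(aligned):
    """Majority-rule consensus of equal-length aligned sequences.

    Gap characters are skipped, so a partial sequence does not vote in
    columns it does not cover. A column with only gaps stays a gap.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for column in zip(*aligned):
        counts = Counter(c for c in column if c != GAP)
        out.append(counts.most_common(1)[0][0] if counts else GAP)
    return "".join(out)


# Hypothetical toy data: three batches, as if each were one LINEUP run.
group1 = ["MKV", "MKV", "MKV"]
group2 = ["MRV", "MRV", "MKV"]
group3 = ["MRV", "-RV", "MKV"]  # second sequence is partial

# Consensus over all nine sequences at once.
pooled = consensus(group1 + group2 + group3)

# Consensus of the three per-group consensuses.
staged = consensus([consensus(g) for g in (group1, group2, group3)])

print("pooled:", pooled)
print("staged:", staged)
```

In this toy case the middle column is K in 5 of 9 sequences, so the pooled
consensus keeps K, but two of the three group consensuses say R, so the staged
consensus reports R. Each group consensus counts as one vote no matter how many
sequences are behind it, which is one way columns can get misrepresented;
weighting each group by its sequence count in that column would be one possible
remedy.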

brett at borcim.wustl.edu

