
Protein structure from scratch?

I. Badcoe mbbad at s-crim1.dl.ac.uk
Wed Aug 17 05:10:01 EST 1994


Douglas C Pearson (dopearso at magnus.acs.ohio-state.edu) wrote:

: to steer this thread in a new direction, what would you guys say the most
: important details in building a protein structure based on homology are?
: mr (dr?) van de loo mentioned mutagenesis and photolabeling as well.  how much
: more reliable is that kind of data compared to homology data?  are there any
: other things that somebody wanting to do this kind of thing should look out
: for?

If you have a sufficiently closely related protein with a *KNOWN*STRUCTURE*
then the problem gets quite easy.  You decide which parts of the two sequences
are equivalent, using sequence alignment, knowledge of the active site and
knowledge of which parts are more likely to vary (the surface in general and
turns in particular).  This gives you three classes of sequence variation:

i) mutation of a residue in the known structure into a different one,

ii) residues present in the known structure but absent in the unknown 
    (deletions), and,

iii) residues present in the unknown structure but absent in the known
     (insertions).
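
As a rough illustration of the three classes, here is a minimal sketch that
labels each column of a pairwise alignment.  The function name, the '-' gap
character and the two toy sequences are made up for the example; a real
alignment would come from whatever alignment program you trust.

```python
def classify_columns(known, unknown, gap="-"):
    """Label each column of a pairwise alignment.

    `known` is the aligned sequence of the protein with a structure,
    `unknown` is the aligned sequence being modelled (both hypothetical).
    """
    if len(known) != len(unknown):
        raise ValueError("aligned sequences must have equal length")
    labels = []
    for k, u in zip(known, unknown):
        if k == gap and u == gap:
            labels.append("gap")           # empty column, nothing to model
        elif u == gap:
            labels.append("deletion")      # (ii) in known, absent in unknown
        elif k == gap:
            labels.append("insertion")     # (iii) in unknown, absent in known
        elif k == u:
            labels.append("identical")     # residue can be kept as it is
        else:
            labels.append("substitution")  # (i) replace one residue by another
    return labels


# Toy alignment, invented for illustration only.
known = "MKT-AYIAKQR"
unknown = "MRTGAY--KQR"
for k, u, label in zip(known, unknown, classify_columns(known, unknown)):
    print(k, u, label)
```

Runs of consecutive 'deletion' or 'insertion' labels are the segments
discussed in the next paragraphs.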

The first you handle by simply substituting the new residue for the old and
then 'jiggling' it around to give a convincing structure.  Obviously this is far
more reliable if (1) few of its neighbours are also rearranged, (2) few of its
neighbours have to be moved to accommodate it and (3) it's very similar to the
original residue.

The second is no problem *PROVIDED* the residues at the ends of the deleted
section are close enough together to permit their connection.  If this is not
the case then it indicates either (1) a larger change in the structure, or, (2)
a failure in your attempt to decide which parts of the sequences are equivalent.
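
A quick way to test the "close enough together" condition is to measure the
distance between the alpha carbons of the two residues flanking the deletion.
The sketch below assumes you have already pulled their coordinates (in
angstroms) out of the known structure.  Consecutive residues normally sit
about 3.8 A apart (C-alpha to C-alpha); the 1.0 A tolerance and the function
name are my own assumptions, so adjust to taste.

```python
import math

# Approximate C-alpha to C-alpha distance between consecutive residues.
CA_CA_BONDED = 3.8  # angstroms


def deletion_can_close(ca_before, ca_after, tolerance=1.0):
    """Check whether the residues flanking a deletion could be joined directly.

    ca_before, ca_after: (x, y, z) C-alpha coordinates, in angstroms, of the
    residues just before and just after the deleted segment.
    tolerance: extra slack in angstroms (an assumed value, not a standard).
    Returns (ok, distance).
    """
    distance = math.dist(ca_before, ca_after)
    return distance <= CA_CA_BONDED + tolerance, distance


# Invented coordinates, for illustration only.
ok, d = deletion_can_close((0.0, 0.0, 0.0), (3.0, 2.0, 1.0))
print(ok, round(d, 2))
ok, d = deletion_can_close((0.0, 0.0, 0.0), (10.0, 0.0, 0.0))
print(ok, round(d, 2))
```

A failed check does not say which of the two explanations above applies, only
that the ends cannot simply be joined without moving something.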

The third is a similar case to the second except that you don't have any
information about the inserted segment.  This may well be only a minor problem
since the insertion is usually on the surface of the protein and you may safely
accept that you know little about its conformation without invalidating the
rest of the model.

OK, so homology to a known structure is the most useful kind of information.

Other kinds of information are less informative.  Mutagenesis and photolabelling
can be classed together with a whole load of other forms of 'indirect'
structural information.  Their value depends entirely on the particular case
involved.  For example, identification of active-site residues (say, by
mutagenesis) tells you that those residues are all (fairly) closely situated
(depending on the exact substrate), and if you have a *REALLY*GOOD* secondary
structure prediction, that localisation may be all you need to tie the
sequence down to a particular arrangement.  On the other hand, without the
secondary structure, the active site information would be of little use.

All the kinds of 'indirect' information have this nature: they're insufficient
to tell you the structure but they can easily provide that vital extra clue just
when you need it.

The only other approach (which nobody has mentioned yet) is that of the analysis
of multiple sequence alignments.  This has been attempted by several people but
the only one I can speak about is the one used by Prof Benner ('cos I worked on
it for a while; mail me and I'll tell you the references), which requires (1) as
many different, related sequences as possible, preferably spread over quite a
wide range of PAM distances (PAM distance measures how far two sequences have
diverged, in accepted point mutations per 100 residues) and with a fairly even
spread between (i.e. it would be best to have PAM distances like
{1,2,5,10,20,30,40,50,60,70 ... 150} from your sequence of interest), and
(2) a *REALLY*GOOD* multiple sequence alignment (and it's important to note
that multiple sequence alignment is an incompletely solved problem; Benner and
Gonnet had to develop their own).

What you then do is to use the patterns of mutation and conservation exhibited
by each position in the alignment to assign likely properties to each residue,
thus:

WHOLLY CONSERVED -> (implies) active site

HYDROPHILIC VARIATION -> surface

HYDROPHOBIC CONSERVATION -> interior

HYDROPHOBIC VARIATION -> (strongly implies) interior (because this is a point
			 which, even though it has changed, has still been
			 required to remain hydrophobic)

and so on, with quite a lot of (quite) complicated stuff that can only really
be useful if you've got a good grip on the nature of evolution (a good
understanding of the family of proteins involved is also a real help).  For
example, with alcohol dehydrogenases, Prof. Benner knew that the eukaryotic
versions were wide-specificity detoxification enzymes, whereas the fungal
versions were all ethanol dehydrogenases.  Thus the sequence positions that are
conserved in the latter but which vary in the former *MUST* be the substrate
recognition site.
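
To make the four rules listed earlier concrete, here is a very crude sketch
that applies them to the columns of a multiple alignment.  This is NOT Prof.
Benner's actual method, just the bare rules as stated above.  The hydrophobic
residue set, the function name, the decision to ignore gaps and the toy
alignment are all my own assumptions.

```python
# Assumption: a crude hydrophobic set chosen for illustration, not from the post.
HYDROPHOBIC = set("AVLIMFWC")


def position_hints(column, gap="-"):
    """Return a list of structural hints for one alignment column.

    Gaps are ignored, which is a simplification.
    """
    residues = [r for r in column if r != gap]
    if not residues:
        return ["unassigned"]
    kinds = set(residues)
    all_hydrophobic = kinds <= HYDROPHOBIC
    all_hydrophilic = kinds.isdisjoint(HYDROPHOBIC)
    if len(kinds) == 1:
        hints = ["active site"]            # wholly conserved
        if all_hydrophobic:
            hints.append("interior")       # hydrophobic conservation
        return hints
    if all_hydrophobic:
        return ["interior (strong)"]       # hydrophobic variation
    if all_hydrophilic:
        return ["surface"]                 # hydrophilic variation
    return ["unassigned"]                  # mixed column, rules say nothing


# Toy alignment of four made-up sequences, one per row.
alignment = ["HLVKD", "HLIRD", "HLLEA", "HLMSG"]
for i, column in enumerate(zip(*alignment), start=1):
    print(i, "".join(column), position_hints(column))
```

Note that a wholly conserved hydrophobic position gets both hints here, since
the bare rules don't say which one should win.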

Anyway, enough blithering on . . .

To summarise:

Homology to a known structure is best.

Knowledge of a related family of proteins is useful.

All other kinds of knowledge are potentially useful.

Badders




