Dimitri Harmegnies wrote:
> Does anybody could explain me the principle of homology modelling (for
> example, wich percent of sequence homology is considered as sufficient
> to suppose a structural homology ?)
Comparative (or homology) protein structure modeling uses experimentally
determined structures (templates) to predict the conformation of another
protein (target). This is possible because proteins with similar
sequences have similar structures.
The methods consist basically of four steps:
1. Identify the proteins with known structure that are related to the
target sequence. Generally you use sequence alignment for this, but you
could use threading or profile methods too.
2. Align the templates with the target sequence. This step is critical:
if you don't have a good alignment, the model will contain many errors.
3. Build the model for the target sequence based on the alignment with
the templates. This is the step where the different modeling methods
differ. The particular method you choose is NOT the most important
factor in the final quality of your model (template selection and
alignment are much more important).
4. Evaluate the model and if necessary correct the alignment and repeat
the procedure until no further improvement is possible. Many evaluation
criteria can be used. It's usually good to make stereochemical checks
(for example with the program PROCHECK) and also to use
sequence-structure matching programs (like ProsaII), which can be very
useful for detecting errors in the model.
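
The loop in steps 2-4 can be sketched roughly as below. This is only an
illustration of the control flow: build, score and revise are
hypothetical callables that you would supply yourself (they are not
functions of MODELLER, PROCHECK or ProsaII), and a higher score is
assumed to mean a better model.

```python
from typing import Callable, Tuple, TypeVar

A = TypeVar("A")  # whatever object represents an alignment
M = TypeVar("M")  # whatever object represents a model


def refine(
    alignment: A,
    build: Callable[[A], M],      # hypothetical: step 3, alignment -> model
    score: Callable[[M], float],  # hypothetical: step 4, higher is better
    revise: Callable[[A, M], A],  # hypothetical: proposes a corrected alignment
    max_rounds: int = 10,
) -> Tuple[A, M, float]:
    """Repeat build/evaluate/correct until the score stops improving."""
    best_aln = alignment
    best_model = build(best_aln)
    best_score = score(best_model)
    for _ in range(max_rounds):
        cand_aln = revise(best_aln, best_model)
        cand_model = build(cand_aln)
        cand_score = score(cand_model)
        if cand_score <= best_score:
            break  # no further improvement: keep the previous best
        best_aln, best_model, best_score = cand_aln, cand_model, cand_score
    return best_aln, best_model, best_score
```

In real work the "revise" step is usually manual editing of the
alignment guided by the evaluation output, so treat the loop as a
description of the procedure rather than something to automate blindly.
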
With respect to your second question: in general one looks at the
sequence identity (percent IDENTICAL residues) between the target and
the template to estimate how reliable the final model will be. There are
two components to this. First, proteins with lower sequence identity are
less structurally similar even if they have the same fold. That means
that even if you have a perfect template-target alignment (step 2), a
model built on a 70% sequence identity template will be better than a
model built on a 40% sequence identity template. See the paper by
Chothia & Lesk, "The relation between the divergence of sequence and
structure in proteins", EMBO Journal 5, 823-826 (1986).
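
To put rough numbers on this, here is a small sketch of the empirical
fit usually quoted from that paper, delta = 0.40 * exp(1.87 * H), where
H is the fraction of mutated residues and delta is the r.m.s. deviation
(in Angstroms) of main-chain atoms in the common core. The function
name is mine, and the fit describes pairs of homologous experimental
structures, so take it as a lower bound on model error and not as a
prediction for any particular model.

```python
import math


def expected_core_rmsd(percent_identity: float) -> float:
    """Core main-chain RMSD (Angstroms) from the Chothia & Lesk fit.

    Assumes the commonly quoted form 0.40 * exp(1.87 * H),
    with H = fraction of residues that differ.
    """
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent_identity must be between 0 and 100")
    h = 1.0 - percent_identity / 100.0
    return 0.40 * math.exp(1.87 * h)


for pid in (70.0, 40.0):
    print(f"{pid:.0f}% identity -> about {expected_core_rmsd(pid):.1f} A in the core")
```

With this fit a 70% template gives about 0.7 A and a 40% template about
1.2 A, before any alignment errors are taken into account.
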
The second factor is that the lower the sequence identity, the more
likely you are to have errors in the alignment.
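
For reference, percent identity can be computed from a pairwise
alignment along these lines. This is a minimal sketch that assumes both
sequences are given as equal-length aligned strings with "-" for gaps,
and that identity is taken over the positions where both have a
residue. Other programs use other denominators (for example the length
of the shorter sequence), so numbers from different tools may not match
exactly.

```python
def percent_identity(target_aln: str, template_aln: str, gap: str = "-") -> float:
    """Percent identical residues over aligned (non-gap) positions."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for a, b in zip(target_aln.upper(), template_aln.upper()):
        if a == gap or b == gap:
            continue  # skip positions where either sequence has a gap
        aligned += 1
        if a == b:
            identical += 1
    if aligned == 0:
        return 0.0
    return 100.0 * identical / aligned


# Toy alignment: 6 aligned positions, 5 of them identical.
print(round(percent_identity("ACDE-FG", "ACDQWFG"), 1))
```
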
In practice, if you have 40% sequence identity or higher, you'll get a
pretty good model, and you can use automated methods to do that. You can
use our program MODELLER,
which can take the alignment and the templates and automatically produce
the homology model in a short time without manual intervention.
If you find between 30% and 40% sequence identity, the proteins have the
same overall fold, and you could build a decent model if you are very
careful with the alignment.
Furthermore, you could detect structural similarity between proteins
that have virtually no sequence identity (less than 20%) using fold
recognition methods (threading) and then use that match to generate a
homology model. The BIG problem in this case is to get a good alignment.
Just keep in mind that the model will contain errors, especially in
regions with insertions and in other loops. The higher the sequence
identity, the better the resulting model. But depending on what the
final usage of the model will be, you could try even low sequence
identity templates.
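
The rules of thumb above can be written down as a small helper. The
40% and 30% cutoffs are the ones given in this message; lumping
everything below 30% into one class is my simplification for the
sketch (the 20-30% range is not discussed separately above), and the
function name is just for illustration.

```python
def modeling_regime(percent_identity: float) -> str:
    """Map target-template sequence identity to a rough modeling strategy."""
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent_identity must be between 0 and 100")
    if percent_identity >= 40.0:
        return "automated comparative modeling"
    if percent_identity >= 30.0:
        return "comparative modeling with a carefully checked alignment"
    # Assumption: everything below 30% is treated as the hard case.
    return "fold recognition (threading); alignment is the main problem"


for pid in (65.0, 34.0, 18.0):
    print(f"{pid:.0f}%: {modeling_regime(pid)}")
```

These are guidelines, not hard limits: what is acceptable depends on
what the model will be used for.
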
A final word about what another person pointed out: there are cases
where even at 100% sequence identity the structures are not the same.
This generally happens when you have ligands that change the
conformation. Therefore you also have to consider bound ligands in
the selection of your template. If the template contains calcium atoms,
then you are building a model of the target in its possible conformation
when it is bound to calcium (not the free form). But in most modeling
cases you will not encounter this problem.
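
One way to take the ligand state into account when choosing a template
is sketched below. The Template record and its fields are hypothetical
(you would fill them in from your own template search and the PDB
entries), and falling back to the best template overall when no ligand
state matches is a design choice of this sketch, not a general rule.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Template:
    code: str                 # hypothetical identifier, e.g. a PDB code
    percent_identity: float   # identity to the target sequence
    ligands: FrozenSet[str]   # bound ligands, e.g. frozenset({"CA"})


def pick_template(
    templates: Iterable[Template], wanted_ligands: FrozenSet[str]
) -> Optional[Template]:
    """Prefer templates in the wanted ligand state, then highest identity."""
    candidates = list(templates)
    matching = [t for t in candidates if t.ligands == wanted_ligands]
    pool = matching or candidates  # assumption: fall back if nothing matches
    return max(pool, key=lambda t: t.percent_identity, default=None)


templates = [
    Template("holo", 95.0, frozenset({"CA"})),  # calcium-bound form
    Template("apo", 60.0, frozenset()),         # free form
]
# Modeling the free form: the apo template wins despite lower identity.
print(pick_template(templates, frozenset()).code)
```
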
Roberto Sanchez | phone : (212) 327 7206
The Rockefeller University | fax : (212) 327 7540
1230 York Avenue, Box 38 | e-mail: sancher at rockvax.rockefeller.edu
New York, NY 10021-6399 | http://guitar.rockefeller.edu