jpaige1989 at netscape.net <jpaige1989 at netscape.net> wrote:
> I'm a research student studying protein-protein interaction mapping,
> seeing how the orthologous proteins in the already mapped datasets of
> the worm, fruit fly, and yeast are being used to produce a putative
> human protein interaction map.
Mapping of interactions is a very common task in bioinformatics. Walhout
et al. have coined the term "interolog" for interaction pairs that have
been transferred to a different proteome based on orthology prediction.
Ortholog assignment is the key step in the mapping process. Depending on
what one intends to do with the interolog network, paralogs may or may
not be a problem. In general, it is important to carefully choose the
mapping parameters in order to avoid spurious mappings - e.g. caused by
highly conserved domains contained in a protein. A simple but fairly
well accepted technique is the bidirectional best BLAST hit: run BLAST
searches in BOTH directions and accept two proteins as orthologs only if
each is the other's highest-scoring hit in the search (and the hit
reaches a reasonably stringent e-value or score). On top of that you
should apply a coverage threshold (e.g. the alignment must cover at
least 70% of both protein sequences) in order to prevent false positives
from conserved domains.
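To make the bidirectional best hit procedure concrete, here is a minimal
Python sketch. It assumes the BLAST results have already been parsed
into simple records; the Hit class, the function names and the default
thresholds (e-value 1e-10, 70% coverage) are illustrative choices, not
part of any BLAST tool. Coverage is computed from a single alignment per
protein pair, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One parsed BLAST alignment (hypothetical record, not a BLAST API)."""
    query: str
    subject: str
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


def top_hits(hits):
    """Return {query: Hit} keeping the highest-bitscore hit per query."""
    best = {}
    for h in hits:
        current = best.get(h.query)
        if current is None or h.bitscore > current.bitscore:
            best[h.query] = h
    return best


def passes(hit, len_query, len_subject, max_evalue, min_cov):
    """Check the e-value and the coverage of BOTH sequences for one hit."""
    qcov = (abs(hit.qend - hit.qstart) + 1) / len_query[hit.query]
    scov = (abs(hit.send - hit.sstart) + 1) / len_subject[hit.subject]
    return hit.evalue <= max_evalue and qcov >= min_cov and scov >= min_cov


def reciprocal_best_hits(a_vs_b, b_vs_a, len_a, len_b,
                         max_evalue=1e-10, min_cov=0.7):
    """Return sorted (a, b) pairs that are each other's top hit and pass filters.

    a_vs_b / b_vs_a: iterables of Hit from the two search directions.
    len_a / len_b:   dicts mapping protein id -> sequence length.
    """
    best_ab = top_hits(a_vs_b)
    best_ba = top_hits(b_vs_a)
    pairs = []
    for a, h_ab in best_ab.items():
        h_ba = best_ba.get(h_ab.subject)
        if h_ba is None or h_ba.subject != a:
            continue  # not reciprocal
        if (passes(h_ab, len_a, len_b, max_evalue, min_cov)
                and passes(h_ba, len_b, len_a, max_evalue, min_cov)):
            pairs.append((a, h_ab.subject))
    return sorted(pairs)
```

Note that the top hit is chosen first and the thresholds are applied
afterwards, so a protein whose best hit is only a shared domain is
dropped rather than paired with its second-best hit.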
Some people have suggested better/more sensitive/more robust/...
techniques. A well-known example is the Inparanoid method
developed by the group of Erik Sonnhammer.
There are several publications out there in which such predicted
interaction maps are presented and analyzed - have a look in PubMed to
locate them.
Depending on the phylogenetic distance between the source organism and
the target organism, mapping can be anywhere from quite easy to almost
impossible.
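Once an ortholog table exists, the interolog transfer itself is simple.
The following Python sketch assumes a one-to-one ortholog mapping given
as a dict and interactions given as pairs of protein ids; the function
name and data layout are made up for illustration. Paralogs
(one-to-many mappings) are deliberately ignored here.

```python
def map_interologs(interactions, orthologs):
    """Transfer interactions to a target proteome via an ortholog table.

    interactions: iterable of (protein, protein) pairs in the source organism.
    orthologs:    dict mapping source protein id -> target protein id.
    Returns a sorted list of unique, unordered target pairs. An interaction
    is transferred only if BOTH partners have an ortholog.
    """
    mapped = set()
    for p, q in interactions:
        if p in orthologs and q in orthologs:
            # Sort within the pair so (x, y) and (y, x) count as one interaction.
            mapped.add(tuple(sorted((orthologs[p], orthologs[q]))))
    return sorted(mapped)
```

Interactions where either partner lacks an ortholog are silently
dropped, which is one reason coverage of mapped networks shrinks with
increasing phylogenetic distance.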
> I wanted to know if anyone has information on how protein maps are
> made in general and whether the yeast two hybrid method or others are
> more accurate/beneficial.
The best networks rely on experimental evidence rather than mapping or
other predictions. Nevertheless, prediction is considered important in
the field because experiments are slow and expensive, so experimental
data are not available for most organisms.
Experimental data with additional annotation is available in several PPI
databases, some of which specialize in certain organisms or data types.
Search for BIND, DIP, MIPS, GRID, MINT, HPRD, MPPI (in no particular
order) and others to get an overview. An important point is the
distinction between individual experiments found in the scientific
literature and large-scale experiments aiming for comprehensive
coverage. While the high-throughput work is important, it does not live
up to the same quality as individual experiments. One reason is the lack
of independent control experiments in all high-throughput approaches I
have seen so far. In the small-scale department, on the other hand, you
will usually find two, three or more different pieces of evidence for
each interaction, based on different experimental techniques. For
example, someone is interested in protein X, does a two-hybrid screen
and pulls out proteins Y, Z, M, N, etc. No journal in the world will
accept this observation without confirmation, so the researcher will do
a co-IP, GST pull-down or other experiments to get independent
confirmation and ideally also investigate the functional aspects of the
interactions. The result is high-quality data.
The main problems with manually curated data from individual experiments
are 1) low coverage and 2) bias.
In general, yeast two-hybrid is great for screening (quite sensitive)
but is prone to produce lots of false positives. It is also easy to miss
interactions if e.g. the protein of interest refuses to go to the nucleus
or is anchored in the plasma membrane... (been there, done that!)
cu
Philipp
--
Dr. Philipp Pagel Tel. +49-89-3187-3675
Institute for Bioinformatics / MIPS Fax. +49-89-3187-3585
GSF - German National Research Center for Environment and Health
http://mips.gsf.de/staff/pagel