
Protcomp 6.0 Finding sub-cellular localization of Eukaryotic proteins: Animal- Plants

softberry at softberry.com
Thu Oct 21 13:31:37 EST 2004


A new version of ProtComp, version 6, is available to run at

   http://www.softberry.com/berry.phtml?topic=protcompan&group=programs&subgroup=proloc

Softberry has released ProtComp ver. 6. The new version of ProtComp, a
popular program for prediction of protein subcellular localization, has
an overall prediction accuracy of >90%. The prediction accuracy of the
prokaryotic version, ProtCompB ver. 2, is 95%.

ProtComp combines several methods of protein localization prediction:
neural network-based prediction; direct comparison with an updated
database of homologous proteins of known localization; prediction of
certain functional peptide sequences, such as signal peptides,
signal-anchors, GPI-anchors, transit peptides of mitochondria and
chloroplasts, and transmembrane segments; and a search for certain
localization-specific motifs. The program includes separately trained
recognizers for animal/fungal and plant proteins, which dramatically
improves recognition accuracy. The following table provides approximate
prediction accuracy for each compartment for animal and fungal proteins.
Testing was performed on a sample of 1128 proteins of known localization
that were NOT included in the program's training sample.

Percent predicted correctly (example for Animal)

Compartment             Sample size   ver. 5   ver. 6
Nucleus                         200       88       91
Plasma Membrane                 200       87      100
Extracellular                   200       83       86
Cytoplasm                       199       63       88
Mitochondria                    129       82       89
Endoplasmic Reticulum           107       83       82
Peroxisome                       34       97       91
Lysosome                         12       91      100
Golgi                            47       77       91
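
As a rough cross-check of the overall figure, the per-compartment
percentages can be combined into a sample-size-weighted average. The
sketch below uses only the numbers in the table; the function and
variable names are illustrative, and because the table values are
rounded and described as approximate, the result is approximate too.

```python
# (compartment, sample size, % correct ver. 5, % correct ver. 6),
# copied from the table in this announcement.
ROWS = [
    ("Nucleus", 200, 88, 91),
    ("Plasma Membrane", 200, 87, 100),
    ("Extracellular", 200, 83, 86),
    ("Cytoplasm", 199, 63, 88),
    ("Mitochondria", 129, 82, 89),
    ("Endoplasmic Reticulum", 107, 83, 82),
    ("Peroxisome", 34, 97, 91),
    ("Lysosome", 12, 91, 100),
    ("Golgi", 47, 77, 91),
]


def weighted_accuracy(rows, column):
    """Average the percentages in `column`, weighted by sample size."""
    total = sum(row[1] for row in rows)
    return sum(row[1] * row[column] for row in rows) / total


total_proteins = sum(row[1] for row in ROWS)
accuracy_v5 = weighted_accuracy(ROWS, 2)
accuracy_v6 = weighted_accuracy(ROWS, 3)

print(f"proteins tested: {total_proteins}")
print(f"ver. 5 weighted accuracy: {accuracy_v5:.1f}%")
print(f"ver. 6 weighted accuracy: {accuracy_v6:.1f}%")
```

The sample sizes add up to the 1128 test proteins, and the weighted
average for ver. 6 comes out at about 90.2%, compared with about 81.2%
for ver. 5.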

Output sample:

  ProtComp Version 6. Identifying sub-cellular location (Animals&Fungi)
  Seq name: QUERY, Length=376
  Significant similarity in Location DB -  Location:Cytoplasmic
  Database sequence: AC=P08319 Location:Cytoplasmic  DE  Alcohol dehydrogenase class II pi chain precurs
  Score=14845, Sequence length=391, Alignment length=365
  Predicted by Neural Nets - Extracellular (Secreted) with score    2.4
  Integral Prediction of protein location: Cytoplasmic with score   14.7
  Location weights:     LocDB / PotLocDB / Neural Nets / Tetramers / Integral
  Nuclear                0.0 /      0.0 /        0.71 /      0.00 /     0.71
  Plasma membrane        0.0 /      0.0 /        0.73 /      0.00 /     0.73
  Extracellular          0.0 /      0.0 /        2.42 /      0.00 /     2.42
  Cytoplasmic        14845.0 /  18465.0 /        0.83 /      8.50 /    14.68
  Mitochondrial          0.0 /      0.0 /        0.70 /      0.00 /     0.70
  Endoplasm. retic.      0.0 /      0.0 /        0.70 /      0.50 /     1.21
  Peroxisomal            0.0 /      0.0 /        0.49 /      0.00 /     0.49
  Lysosomal              0.0 /      0.0 /        0.33 /      0.00 /     0.33
  Golgi                  0.0 /      0.0 /        0.40 /      0.00 /     0.40

 

LocDB scores are based on the query protein's homology with proteins of
known localization.
PotLocDB scores are based on homology with proteins whose locations are
not experimentally known but are assumed based on strong theoretical
evidence.
Neural Nets scores are assigned by neural networks.
Tetramers scores are based on comparisons of tetramer distributions
calculated for the QUERY and DB sequences.
Integral scores are the final scores, obtained by combining the previous
four scores.
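
For anyone post-processing saved results, the "Location weights" rows
can be read into named columns with a few lines of Python. This is a
minimal sketch that assumes the text layout shown in the output sample
(five values separated by "/"); the names LocationWeights and
parse_location_weights are hypothetical and not part of ProtComp.

```python
from typing import Dict, NamedTuple


class LocationWeights(NamedTuple):
    loc_db: float
    pot_loc_db: float
    neural_nets: float
    tetramers: float
    integral: float


def parse_location_weights(report: str) -> Dict[str, LocationWeights]:
    """Collect the per-compartment rows of a ProtComp text report."""
    table = {}
    for line in report.splitlines():
        fields = line.split("/")
        if len(fields) != 5:
            continue  # not a weights row
        # The first field holds the compartment name and the LocDB value.
        head = fields[0].rsplit(None, 1)
        if len(head) != 2:
            continue
        name, first_value = head
        try:
            values = [float(first_value)] + [float(f) for f in fields[1:]]
        except ValueError:
            continue  # e.g. the "Location weights:" header row
        table[name.strip()] = LocationWeights(*values)
    return table


# A few rows copied from the output sample in this announcement.
SAMPLE = """\
  Location weights:     LocDB / PotLocDB / Neural Nets / Tetramers / Integral
  Extracellular          0.0 /      0.0 /        2.42 /      0.00 /     2.42
  Cytoplasmic        14845.0 /  18465.0 /        0.83 /      8.50 /    14.68
  Endoplasm. retic.      0.0 /      0.0 /        0.70 /      0.50 /     1.21
"""

weights = parse_location_weights(SAMPLE)
for compartment, row in weights.items():
    print(compartment, row.neural_nets, row.integral)
```

Splitting on "/" rather than on whitespace keeps compartment names that
contain spaces, such as "Endoplasm. retic.", in one piece.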


When interpreting the output, keep in mind that:

1. ProtComp's scores per se, being weights of complex neural networks,
do not represent probabilities of the protein's location in a particular
compartment.
2. Significant homology with a protein of known location is a very
strong indicator of the query protein's location.
3. For neural network scores, relative values across compartments are
more important than absolute values: if the second-best score is much
lower than the best one, the prediction is more reliable, regardless of
absolute values.
4. If both the neural network and homology predictions point to the same
compartment, the prediction is very reliable.
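
Points 3 and 4 can be checked mechanically once the scores are in hand.
The sketch below is only an illustration: summarize_prediction is a
hypothetical helper, and it reports the gap between the best and
second-best neural network scores without applying any cutoff, since
none is given here. The scores are those from the output sample.

```python
def summarize_prediction(nn_scores, homology_location=None):
    """Report the top neural-net compartment, its lead over the
    runner-up, and whether it matches the homology-based location."""
    ranked = sorted(nn_scores.items(), key=lambda item: item[1], reverse=True)
    (best, best_score), (_, second_score) = ranked[0], ranked[1]
    return {
        "nn_best": best,
        "nn_margin": best_score - second_score,
        "agrees_with_homology": (
            homology_location is not None and best == homology_location
        ),
    }


# Neural Nets column from the output sample in this announcement.
NN_SCORES = {
    "Nuclear": 0.71,
    "Plasma membrane": 0.73,
    "Extracellular": 2.42,
    "Cytoplasmic": 0.83,
    "Mitochondrial": 0.70,
    "Endoplasm. retic.": 0.70,
    "Peroxisomal": 0.49,
    "Lysosomal": 0.33,
    "Golgi": 0.40,
}

# The sample reports significant similarity to a Cytoplasmic protein.
summary = summarize_prediction(NN_SCORES, homology_location="Cytoplasmic")
print(summary)
```

In the sample, the neural networks favour Extracellular while the
homology hit is Cytoplasmic, so the two methods disagree; the integral
prediction there follows the homology result, in line with point 2.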

 