Annotation transfer between genomes: protein-protein interologs and protein-DNA regulogs
Haiyuan Yu, Nicholas M Luscombe, Hao Xin Lu, Xiaowei Zhu, Jing-Dong J. Han, Nicolas Bertin,
Sambath Chung, Marc Vidal, Mark Gerstein (2004) Genome Res 14: 1107-18.
Proteins function mainly through interactions, especially with DNA and other proteins. While large-scale interaction networks are now available for a number of model organisms, generating them experimentally remains difficult. Consequently, interolog mapping (the transfer of interaction annotation from one organism to another using comparative genomics) is of significant value. Here we quantitatively assess the degree to which interologs can be reliably transferred between species as a function of the sequence similarity of the corresponding interacting proteins. Using interaction information from S. cerevisiae, C. elegans, D. melanogaster, and H. pylori, we find that protein-protein interactions can be transferred when a pair of proteins has a joint sequence identity >80% or a joint E-value <10^-70. (These "joint" quantities are the geometric means of the identities or E-values for the two pairs of homologous proteins, i.e. each interacting protein paired with its counterpart in the other species.) We generalize our interolog analysis to protein-DNA binding and find that such interactions are conserved above specific thresholds between 30% and 60% sequence identity, depending on the protein family. Furthermore, we introduce the concept of the "regulog", a conserved regulatory relationship between proteins across different species. We map interologs and regulogs from yeast to a number of genomes with limited experimental annotation (e.g. A. thaliana) and make these available through an online database at http://interolog.gersteinlab.org. Specifically, we are able to transfer ~90,000 potential protein-protein interactions to the worm. We test a number of these in two-hybrid experiments and verify 45 overlaps, which we show to be statistically significant.
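The joint-similarity criterion above can be illustrated with a short sketch. It computes the joint identity and joint E-value as geometric means and keeps a candidate interolog when either cutoff from the abstract is met (joint identity >80% or joint E-value <10^-70). The input layout (a list of source interactions plus a dictionary of precomputed homolog hits, e.g. from BLAST), the function names, and the protein names in the usage example are all hypothetical. This is an illustration of the stated thresholds, not the authors' actual pipeline.

```python
import math
from typing import Dict, List, Tuple

# Cutoffs quoted in the abstract: transfer if joint identity > 80%
# or joint E-value < 1e-70.
IDENTITY_CUTOFF = 80.0
EVALUE_CUTOFF = 1e-70

# Hypothetical hit format: (target-species protein, % identity, E-value).
Hit = Tuple[str, float, float]


def joint_identity(id_a: float, id_b: float) -> float:
    """Geometric mean of the percent identities of the two homolog pairs."""
    return math.sqrt(id_a * id_b)


def joint_evalue(e_a: float, e_b: float) -> float:
    """Geometric mean of the two E-values.

    Taking square roots before multiplying avoids float underflow
    when both E-values are extremely small.
    """
    return math.sqrt(e_a) * math.sqrt(e_b)


def transfer_interologs(
    interactions: List[Tuple[str, str]],
    homologs: Dict[str, List[Hit]],
) -> List[Tuple[str, str, float, float]]:
    """Map each source interaction A-B onto every homolog pair A'-B'
    whose joint identity or joint E-value passes the cutoffs."""
    predicted = []
    for a, b in interactions:
        for a2, id_a, e_a in homologs.get(a, []):
            for b2, id_b, e_b in homologs.get(b, []):
                ji = joint_identity(id_a, id_b)
                je = joint_evalue(e_a, e_b)
                if ji > IDENTITY_CUTOFF or je < EVALUE_CUTOFF:
                    predicted.append((a2, b2, ji, je))
    return predicted


if __name__ == "__main__":
    # Made-up example: one source interaction and three homolog hits.
    hits = {
        "A": [("A1", 90.0, 1e-100), ("A2", 40.0, 1e-20)],
        "B": [("B1", 85.0, 1e-90)],
    }
    for row in transfer_interologs([("A", "B")], hits):
        print(row)
```

In this example only the A1-B1 pair is kept (joint identity about 87.5%). A2-B1 fails both tests: its joint identity is about 58.3% and its joint E-value is 10^-55. Accepting a pair when either criterion passes follows the "or" in the abstract. A stricter variant could require both.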