OmnipathR: an R client for the OmniPath web service

Alberto Valdeolivas

alvaldeolivas@gmail.com

Attila Gabor

attila.gabor@bioquant.uni-heidelberg.de

Denes Turei

turei.denes@gmail.com

Julio Saez-Rodriguez

Institute for Computational Biomedicine, Heidelberg University

Abstract

This vignette describes how to use the OmnipathR package to retrieve information from the OmniPath database: https://omnipathdb.org/. In addition, the package includes utility functions to filter, analyse and visualize the data.

Introduction

OmnipathR is an R package built to provide easy access to the data stored in the OmniPath webservice (Türei, Korcsmáros, and Saez-Rodriguez 2016):

http://omnipathdb.org/

The webservice implements a very simple REST-style API. This package makes requests over the HTTP protocol to retrieve the data. Hence, a fast Internet connection is required to use OmnipathR properly.

Query types

OmnipathR can retrieve five different types of data:

  • Interactions: protein-protein interactions organized in different datasets:
    • omnipath: the OmniPath data as defined in the original publication (Türei, Korcsmáros, and Saez-Rodriguez 2016) and collected from different databases.
    • pathwayextra: activity flow interactions without literature reference.
    • kinaseextra: enzyme-substrate interactions without literature reference.
    • ligrecextra: ligand-receptor interactions without literature reference.
    • tfregulons: transcription factor (TF)-target interactions from DoRothEA (Garcia-Alonso et al. 2019).
    • tf-miRNA: transcription factor-miRNA interactions.
    • miRNA-target: miRNA-mRNA interactions.
    • lncRNA-mRNA: lncRNA-mRNA interactions.
    • small molecule-protein: interactions between small molecules (metabolites, intrinsic ligands, drug compounds) and proteins.
  • Post-translational modifications (PTMs): it provides enzyme-substrate reactions in a very similar way to the aforementioned interactions. Some of the biological databases related to PTMs integrated in OmniPath are Phospho.ELM (Dinkel et al. 2010) and PhosphoSitePlus (Hornbeck et al. 2014).

  • Complexes: it provides access to a comprehensive database of more than 22000 protein complexes. These data come from different resources such as CORUM (Giurgiu et al. 2018) or hu.MAP (Drew et al. 2017).

  • Annotations: it provides a large variety of data regarding different annotations about proteins and complexes. These data come from dozens of databases covering different topics such as: The Topology Data Bank of Transmembrane Proteins (TOPDB) (Dobson et al. 2014) or ExoCarta (Keerthikumar et al. 2016), a database collecting the proteins that were identified in exosomes in multiple organisms.

  • Intercell: it provides information on the roles of proteins in inter-cellular signaling, for instance whether a protein is a ligand, a receptor, an extracellular matrix (ECM) component, etc. The data do not come directly from original sources; we combined them from several databases. The source databases, such as CellPhoneDB (Vento-Tormo et al. 2018) or Receptome (Ben-Shlomo et al. 2003), are also referenced for each record.

Figure 1 shows an overview of the resources featured in OmniPath. For more detailed information about the original data sources integrated in Omnipath, please visit:


Figure 1: Overview of the resources featured in OmniPath. Causal resources (including activity-flow and enzyme-substrate resources) can provide direction (*) or sign and direction (+) of interactions.

Mouse and rat

Excluding the miRNA interactions, all interactions and PTMs are available for human, mouse and rat. The rodent data have been translated from human using the NCBI Homologene database. Many human proteins do not have a known homolog in rodents, hence the rodent datasets are smaller than their human counterparts.

If you work with mouse omics data, it may be better to translate your dataset to human (for example using the pypath.homology module, https://github.com/saezlab/pypath/) and use the human interaction data.

Installation of the OmnipathR package

First of all, you need a current version of R. OmnipathR is a freely available package deposited on Bioconductor and GitHub. You can install it by running the following commands on an R console:

We also load here the required packages to run the code in this vignette.
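A typical installation from Bioconductor, followed by loading the packages, might look like the sketch below. The helper packages attached here (igraph, dplyr, tidyr) are an assumption made for the examples in this document; they are installed as dependencies of OmnipathR.

```r
# Install OmnipathR from Bioconductor (BiocManager is the standard installer).
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("OmnipathR")

# Attach the package itself plus the helpers assumed by later examples.
library(OmnipathR)
library(igraph)
library(dplyr)
library(tidyr)
```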

Usage Examples

In the following paragraphs, we provide some examples of how to use the OmnipathR package to retrieve different types of information from the OmniPath web server. In addition, we explore the data with the aim of obtaining some biologically relevant information.

Note that the sections on complexes, annotations and intercell are linked. We explore the annotations and the roles in inter-cellular communication of the proteins involved in a given complex. This basic example shows the usefulness of integrating the information available in the different OmniPath resources.

Interactions

Proteins interact with each other and with other biological molecules to perform cellular functions. Proteins also participate in pathways: linked series of reactions, occurring within or between cells, that transform products or transmit signals inducing specific cellular responses. Protein interactions are therefore a very valuable source of information for understanding how cells function.

We here download the original OmniPath human interactions (Türei, Korcsmáros, and Saez-Rodriguez 2016). To do so, we first check the different source databases and select some of them. Then, we print some of the downloaded interactions (“+” means activation, “-” means inhibition and “?” means undirected interactions or inconclusive data).

##   [1] "ABS"                         "ACSN"                        "ACSN_SignaLink3"            
##   [4] "ARACNe-GTEx_DoRothEA"        "ARN"                         "Adhesome"                   
##   [7] "AlzPathway"                  "BEL-Large-Corpus_ProtMapper" "Baccin2019"                 
##  [10] "BioGRID"                     "CA1"                         "CancerCellMap"              
##  [13] "CancerDrugsDB"               "CellCall"                    "CellChatDB"                 
##  [16] "CellChatDB-cofactors"        "CellPhoneDB"                 "CellPhoneDB_Cellinker"      
##  [19] "CellTalkDB"                  "Cellinker"                   "DEPOD"                      
##  [22] "DIP"                         "DLRP_Cellinker"              "DLRP_talklr"                
##  [25] "DOMINO"                      "DeathDomain"                 "DoRothEA"                   
##  [28] "DoRothEA-reviews_DoRothEA"   "ELM"                         "EMBRACE"                    
##  [31] "ENCODE-distal"               "ENCODE-proximal"             "ENCODE_tf-mirna"            
##  [34] "FANTOM4_DoRothEA"            "Fantom5_LRdb"                "Guide2Pharma"               
##  [37] "Guide2Pharma_CellPhoneDB"    "Guide2Pharma_Cellinker"      "Guide2Pharma_LRdb"          
##  [40] "Guide2Pharma_talklr"         "HOCOMOCO_DoRothEA"           "HPMR"                       
##  [43] "HPMR_Cellinker"              "HPMR_LRdb"                   "HPMR_talklr"                
##  [46] "HPRD"                        "HPRD-phos"                   "HPRD_KEA"                   
##  [49] "HPRD_LRdb"                   "HPRD_MIMP"                   "HPRD_talklr"                
##  [52] "HTRIdb"                      "HTRIdb_DoRothEA"             "HuRI"                       
##  [55] "I2D_CellPhoneDB"             "ICELLNET"                    "IMEx_CellPhoneDB"           
##  [58] "InnateDB"                    "InnateDB-All_CellPhoneDB"    "InnateDB_CellPhoneDB"       
##  [61] "InnateDB_SignaLink3"         "IntAct"                      "IntAct_CellPhoneDB"         
##  [64] "IntAct_DoRothEA"             "JASPAR_DoRothEA"             "KEA"                        
##  [67] "KEGG-MEDICUS"                "Kinexus_KEA"                 "Kirouac2010"                
##  [70] "LMPID"                       "LRdb"                        "Li2012"                     
##  [73] "Lit-BM-17"                   "LncRNADisease"               "MIMP"                       
##  [76] "MINT_CellPhoneDB"            "MPPI"                        "Macrophage"                 
##  [79] "MatrixDB"                    "MatrixDB_CellPhoneDB"        "NCI-PID_ProtMapper"         
##  [82] "NFIRegulomeDB_DoRothEA"      "NRF2ome"                     "NetPath"                    
##  [85] "NetworKIN_KEA"               "ORegAnno"                    "ORegAnno_DoRothEA"          
##  [88] "PAZAR"                       "PAZAR_DoRothEA"              "PhosphoNetworks"            
##  [91] "PhosphoPoint"                "PhosphoSite"                 "PhosphoSite_KEA"            
##  [94] "PhosphoSite_MIMP"            "PhosphoSite_ProtMapper"      "PhosphoSite_noref"          
##  [97] "ProtMapper"                  "REACH_ProtMapper"            "RLIMS-P_ProtMapper"         
## [100] "Ramilowski2015"              "Ramilowski2015_Baccin2019"   "ReMap_DoRothEA"             
## [103] "Reactome_LRdb"               "Reactome_ProtMapper"         "Reactome_SignaLink3"        
## [106] "RegNetwork_DoRothEA"         "SIGNOR"                      "SIGNOR_ProtMapper"          
## [109] "SPIKE"                       "STRING_talklr"               "SignaLink3"                 
## [112] "Sparser_ProtMapper"          "TCRcuration_SignaLink3"      "TFactS_DoRothEA"            
## [115] "TFe_DoRothEA"                "TRED_DoRothEA"               "TRIP"                       
## [118] "TRRD_DoRothEA"               "TRRUST_DoRothEA"             "TransmiR"                   
## [121] "UniProt_CellPhoneDB"         "UniProt_LRdb"                "Wang"                       
## [124] "Wojtowicz2020"               "connectomeDB2020"            "dbPTM"                      
## [127] "iPTMnet"                     "iTALK"                       "lncrnadb"                   
## [130] "miR2Disease"                 "miRDeathDB"                  "miRTarBase"                 
## [133] "miRecords"                   "ncRDeathDB"                  "phosphoELM"                 
## [136] "phosphoELM_KEA"              "phosphoELM_MIMP"             "scConnect"                  
## [139] "talklr"
## # A tibble: 6 × 5
##   source          interaction target         n_resources n_references
##   <chr>           <chr>       <chr>                <int>        <int>
## 1 SRC (P12931)    ==( + )==>  TRPV1 (Q8NER1)           5            6
## 2 PRKACA (P17612) ==( ? )==>  TRPC6 (Q9Y210)           2            3
## 3 PRKG1 (Q13976)  ==( - )==>  TRPC3 (Q13507)           8            2
## 4 PTPN1 (P18031)  ==( - )==>  TRPV6 (Q9H1D0)           6            2
## 5 PRKG1 (Q13976)  ==( + )==>  TRPC7 (Q9HCX4)           3            1
## 6 OS9 (Q13438)    ==(+/-)==>  TRPV4 (Q9HBA0)           3            1
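Output of this kind could be obtained with a call sequence along the following lines. This is a sketch, not necessarily the exact code behind the listing above: the three resource names are an illustrative choice, and the function names are those of the OmnipathR 3.x series, which later releases may have renamed. A working Internet connection is required.

```r
library(OmnipathR)

# List every resource that contributes interactions to OmniPath.
resources <- get_interaction_resources()

# Download human interactions (NCBI taxonomy ID 9606) from a few resources.
# The three resource names below are an illustrative assumption.
interactions <- import_omnipath_interactions(
    resources = c("SignaLink3", "PhosphoSite", "SIGNOR"),
    organism = 9606
)

# Pretty-print the first records with sign and direction symbols.
print_interactions(head(interactions))
```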

Protein-protein interaction networks

Protein-protein interactions are usually converted into networks. Describing protein interactions as networks not only provides a convenient format for visualization, but also allows applying graph theory methods to mine the biological information they contain.

Here we convert our set of interactions to a network/graph (an igraph object). Then, we apply two very common approaches to extract information from a biological network:

  • Shortest paths: finding a path between two nodes (proteins) that traverses the minimum number of edges. This is useful for tracing consecutive reactions within a given pathway. Below we display the shortest path between two given proteins, and all the shortest paths between two other proteins. Note that the functions print_path_es and print_path_vs display very similar results, but the first takes an edge sequence as input and the second a node sequence.
##            source interaction          target n_resources n_references
## 1  TYRO3 (Q06418)  ==( + )==> PIK3R1 (P27986)           4            3
## 2 PIK3R1 (P27986)  ==(+/-)==> PIK3CG (P48736)           3            3
## 3 PIK3CG (P48736)  ==( + )==>   RAC1 (P63000)           2            2
## 4   RAC1 (P63000)  ==( + )==>  STAT3 (P40763)           9            3
## Pathway 1: DYRK2 -> TP53 -> RPS6KA1 -> EEF2K -> MAPKAPK2
  • Clustering: grouping nodes (proteins) in such a way that nodes belonging to the same group (called a cluster) are more densely connected to each other than to nodes in other clusters. Since proteins interact to perform their functions, proteins within the same cluster are likely to be involved in similar biological tasks. Figure 2 shows the subgraph containing the proteins and interactions of the cluster of a specific protein, ERBB2. The igraph package contains functions that apply several different clustering methods to graphs (see https://igraph.org/r/doc/ for detailed information).

Figure 2: ERBB2-associated cluster. Subnetwork extracted from the interactions graph representing the cluster that contains the gene ERBB2 (yellow node).
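The two graph approaches described above can be tried on toy data with plain igraph calls. In this minimal sketch the first four edges reproduce the TYRO3 to STAT3 path printed earlier; the AKT1 detour and the six-node clustering graph are invented for illustration, and cluster_fast_greedy is just one of the clustering methods igraph offers, not necessarily the one used for Figure 2.

```r
library(igraph)

# Toy directed network. The first four edges reproduce the TYRO3 -> STAT3
# path printed above; the AKT1 detour is invented purely for illustration.
edges <- data.frame(
    from = c("TYRO3", "PIK3R1", "PIK3CG", "RAC1", "TYRO3", "AKT1"),
    to   = c("PIK3R1", "PIK3CG", "RAC1", "STAT3", "AKT1", "PIK3R1")
)
g <- graph_from_data_frame(edges, directed = TRUE)

# Shortest path: request both the node sequence and the edge sequence.
sp <- shortest_paths(g, from = "TYRO3", to = "STAT3", output = "both")
path_nodes <- as_ids(sp$vpath[[1]])   # node names along the path
n_steps <- length(sp$epath[[1]])      # number of edges traversed

# Clustering: two hypothetical triangles joined by a single bridge edge.
toy <- graph_from_data_frame(
    data.frame(
        from = c("A", "B", "A", "C", "D", "E", "D"),
        to   = c("B", "C", "C", "D", "E", "F", "F")
    ),
    directed = FALSE
)
clusters <- cluster_fast_greedy(toy)
cluster_of <- membership(clusters)    # cluster index for each node
```

The node sequence is the kind of input print_path_vs expects, and the edge sequence the kind print_path_es expects, when the graph was built from OmniPath interactions.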

Other interaction datasets

Above we used the interactions from the dataset described in the original OmniPath publication (Türei, Korcsmáros, and Saez-Rodriguez 2016). In this section, we provide examples of how to retrieve and handle interactions from the remaining datasets. The same functions can be applied to every interaction dataset.

Pathway Extra

In the first example, we are going to get the interactions from the pathwayextra dataset, which contains activity flow interactions without literature reference. We are going to focus on the mouse interactions for a given gene in this particular case.

## Warning in omnipath_check_param(param): The following resources are not available: STRING. Check the resource
## names for spelling mistakes.
## # A tibble: 1 × 5
##   source        interaction target       n_resources n_references
##   <chr>         <chr>       <chr>              <int>        <int>
## 1 Amfr (Q9R049) ==( + )==>  Vcp (Q01853)           6           20

Kinase Extra

Next, we download the interactions from the kinaseextra dataset, which contains enzyme-substrate interactions without literature reference. We are going to focus on rat reactions targeting a particular gene.

## # A tibble: 5 × 5
##   source         interaction target          n_resources n_references
##   <chr>          <chr>       <chr>                 <int>        <dbl>
## 1 Gsk3b (P18266) ==(+/-)==>  Dpysl2 (P47942)          12           33
## 2 Cdk5 (Q03114)  ==(+/-)==>  Dpysl2 (P47942)           6           30
## 3 Rock2 (Q62868) ==( + )==>  Dpysl2 (P47942)          10            6
## 4 Rock1 (Q63644) ==( ? )==>  Dpysl2 (P47942)           7            2
## 5 Fer (P09760)   ==( ? )==>  Dpysl2 (P47942)           2            2
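The organism is selected by its NCBI taxonomy ID (10090 for mouse, 10116 for rat). The sketch below shows how the two rodent queries of this and the previous subsection might be written; the BioGRID restriction is an illustrative assumption, the gene symbols are taken from the outputs above, and the function names are those of the OmnipathR 3.x series. An Internet connection is required.

```r
library(OmnipathR)
library(dplyr)

# Mouse (NCBI taxonomy ID 10090) activity-flow interactions.
# Restricting the query to BioGRID is an illustrative assumption.
mouse_pw <- import_pathwayextra_interactions(
    resources = "BioGRID",
    organism = 10090
)
# Keep interactions in which the example gene takes part on either side.
amfr <- filter(
    mouse_pw,
    source_genesymbol == "Amfr" | target_genesymbol == "Amfr"
)

# Rat (NCBI taxonomy ID 10116) enzyme-substrate interactions.
rat_ks <- import_kinaseextra_interactions(organism = 10116)
# Keep only the reactions that target the example substrate.
dpysl2 <- filter(rat_ks, target_genesymbol == "Dpysl2")

print_interactions(amfr)
print_interactions(dpysl2)
```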

DoRothEA Regulons

Another very interesting interaction dataset available in OmniPath is DoRothEA (Garcia-Alonso et al. 2019). It contains transcription factor (TF)-target interactions with a confidence level ranging from A to E, where A denotes the most confident interactions. In the code chunk shown below, we select and print the most confident interactions for a given TF.

## # A tibble: 9 × 5
##   source        interaction target          n_resources n_references
##   <chr>         <chr>       <chr>                 <int>        <dbl>
## 1 GLI1 (P08151) ==( + )==>  HHIP (Q96QV1)             1            0
## 2 GLI1 (P08151) ==( + )==>  PTCH1 (Q13635)            1            0
## 3 GLI1 (P08151) ==( + )==>  PTCH2 (Q9Y6C5)            1            0
## 4 GLI1 (P08151) ==( + )==>  BCL2 (P10415)             0            0
## 5 GLI1 (P08151) ==( + )==>  CCND2 (P30279)            0            0
## 6 GLI1 (P08151) ==( - )==>  EGR2 (P11161)             0            0
## 7 GLI1 (P08151) ==( + )==>  IGFBP6 (P24592)           0            0
## 8 GLI1 (P08151) ==( + )==>  SFRP1 (Q8N474)            0            0
## 9 GLI1 (P08151) ==( - )==>  SLIT2 (O94813)            0            0
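A query of this kind can be sketched as follows, assuming the dorothea_levels argument of the OmnipathR 3.x import function; the choice of GLI1 follows the output above. An Internet connection is required.

```r
library(OmnipathR)
library(dplyr)

# Download only the confidence level A regulons for human.
tf_targets <- import_dorothea_interactions(
    dorothea_levels = "A",
    organism = 9606
)

# Keep the targets of one transcription factor and print them.
gli1_targets <- filter(tf_targets, source_genesymbol == "GLI1")
print_interactions(gli1_targets)
```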

miRNA-target dataset

The last dataset describing interactions is mirnatarget. It stores miRNA-mRNA and TF-miRNA interactions, which are so far available only for human. We next select the miRNAs interacting with the TF selected in the previous code chunk, GLI1. The main function of miRNAs appears to be gene regulation. It is therefore interesting to see how some miRNAs can regulate the expression of a TF, which in turn regulates the expression of other genes. Figure 4 shows a schematic network of the miRNAs targeting GLI1 and the genes regulated by this TF.

## # A tibble: 3 × 5
##   source                        interaction target        n_resources n_references
##   <chr>                         <chr>       <chr>               <int>        <dbl>
## 1 hsa-miR-324-5p (MIMAT0000761) ==( ? )==>  GLI1 (P08151)           3            2
## 2 hsa-miR-125b (MIMAT0000423)   ==( ? )==>  GLI1 (P08151)           2            1
## 3 hsa-miR-326 (MIMAT0000756)    ==( ? )==>  GLI1 (P08151)           2            1

Figure 4: miRNA-TF-target network. Schematic network of the miRNAs (red square nodes) targeting GLI1 (yellow node) and the genes regulated by this TF (blue round nodes).

Small molecule-protein dataset

This new dataset was first added to OmniPath in January 2022. It is still quite small, with 3.5k interactions from three resources (SIGNOR, CancerDrugsDB and Cellinker), but is expected to grow considerably in the future. As an example, let's look for the targets of a cancer drug, the MEK inhibitor Trametinib:

## # A tibble: 26 × 5
##    source                interaction target          n_resources n_references
##    <chr>                 <chr>       <chr>                 <int>        <dbl>
##  1 TRAMETINIB (11707110) ==( - )==>  MAP2K1 (Q02750)           2            1
##  2 TRAMETINIB (11707110) ==( - )==>  MAP2K2 (P36507)           2            1
##  3 TRAMETINIB (11707110) ==( ? )==>  GNAQ (P50148)             1            0
##  4 TRAMETINIB (11707110) ==( ? )==>  DDX43 (Q9NXZ2)            1            0
##  5 TRAMETINIB (11707110) ==( ? )==>  ETV1 (P50549)             1            0
##  6 TRAMETINIB (11707110) ==( ? )==>  STAG2 (Q8N3U4)            1            0
##  7 TRAMETINIB (11707110) ==( ? )==>  ARAF (P10398)             1            0
##  8 TRAMETINIB (11707110) ==( ? )==>  CDKN2A (P42771)           1            0
##  9 TRAMETINIB (11707110) ==( ? )==>  MAP2K5 (Q13163)           1            0
## 10 TRAMETINIB (11707110) ==( ? )==>  CDKN2A (Q8N726)           1            0
## # … with 16 more rows

Note that the human-readable compound names are not reliable; use PubChem CIDs instead.

Post-translational modifications (PTMs)

Another available query type is PTMs, which provides enzyme-substrate reactions in a very similar way to the aforementioned interactions. PTMs generally refer to the enzymatic modification of proteins after their synthesis in the ribosomes. PTMs can be highly context-specific and they play a major role in the activation and inhibition of biological pathways.

In the next code chunk, we download the PTMs for human. We first check the different available source databases, even though we do not perform any filter. Then, we select and print the reactions involving a specific enzyme-substrate pair. Those reactions lack information about activation or inhibition. To obtain that information, we match the data with OmniPath interactions. Finally, we show that it is also possible to build a graph using this information, and to retrieve PTMs from mouse or rat.

##  [1] "BEL-Large-Corpus_ProtMapper" "DEPOD"                       "HPRD"                       
##  [4] "HPRD_MIMP"                   "KEA"                         "Li2012"                     
##  [7] "MIMP"                        "NCI-PID_ProtMapper"          "PhosphoNetworks"            
## [10] "PhosphoSite"                 "PhosphoSite_MIMP"            "PhosphoSite_ProtMapper"     
## [13] "ProtMapper"                  "REACH_ProtMapper"            "RLIMS-P_ProtMapper"         
## [16] "Reactome_ProtMapper"         "SIGNOR"                      "SIGNOR_ProtMapper"          
## [19] "Sparser_ProtMapper"          "dbPTM"                       "phosphoELM"                 
## [22] "phosphoELM_MIMP"
## Warning: Unknown or uninitialised column: `is_stimulation`.
## # A tibble: 6 × 5
##   enzyme          interaction substrate           modification    n_resources
##   <chr>           <chr>       <chr>               <chr>                 <int>
## 1 MAP2K1 (Q02750) ====>       MAPK3_Y204 (P27361) phosphorylation           8
## 2 MAP2K1 (Q02750) ====>       MAPK3_T202 (P27361) phosphorylation           8
## 3 MAP2K1 (Q02750) ====>       MAPK3_T207 (P27361) phosphorylation           2
## 4 MAP2K1 (Q02750) ====>       MAPK3_Y210 (P27361) phosphorylation           2
## 5 MAP2K1 (Q02750) ====>       MAPK3_T80 (P27361)  phosphorylation           1
## 6 MAP2K1 (Q02750) ====>       MAPK3_Y222 (P27361) phosphorylation           1
##            enzyme interaction           substrate    modification n_resources
## 1 MAP2K1 (Q02750)  ==( + )==> MAPK3_Y204 (P27361) phosphorylation           8
## 2 MAP2K1 (Q02750)  ==( + )==> MAPK3_T202 (P27361) phosphorylation           8
## 4 MAP2K1 (Q02750)  ==( + )==> MAPK3_T207 (P27361) phosphorylation           2
## 5 MAP2K1 (Q02750)  ==( + )==> MAPK3_Y210 (P27361) phosphorylation           2
## 3 MAP2K1 (Q02750)  ==( + )==>  MAPK3_T80 (P27361) phosphorylation           1
## 6 MAP2K1 (Q02750)  ==( + )==> MAPK3_Y222 (P27361) phosphorylation           1
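The steps described in this section might be sketched as below. The resource selection is an illustrative assumption, and the function names (import_omnipath_enzsub, get_signed_ptms) are those of the OmnipathR 3.x series, which later releases may have renamed. An Internet connection is required.

```r
library(OmnipathR)
library(dplyr)

# Resources that provide enzyme-substrate (PTM) data.
enzsub_resources <- get_enzsub_resources()

# Download human enzyme-substrate reactions from two example resources.
enzsub <- import_omnipath_enzsub(resources = c("PhosphoSite", "SIGNOR"))

# Reactions of one enzyme-substrate pair; these carry no sign yet.
pair <- filter(
    enzsub,
    enzyme_genesymbol == "MAP2K1",
    substrate_genesymbol == "MAPK3"
)
print_interactions(pair)

# Add stimulation/inhibition information by matching with interactions.
interactions <- import_omnipath_interactions()
signed <- get_signed_ptms(enzsub, interactions)
signed_pair <- filter(
    signed,
    enzyme_genesymbol == "MAP2K1",
    substrate_genesymbol == "MAPK3"
)
print_interactions(signed_pair)
```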

Complexes

Some studies indicate that around 80% of the human proteins operate in complexes, and many proteins belong to several different complexes (Berggård, Linse, and James 2007). These complexes play critical roles in a large variety of biological processes. Some well-known examples are the proteasome and the ribosome. Thus, the description of the full set of protein complexes functioning in cells is essential to improve our understanding of biological processes.

The complexes query provides access to more than 20000 protein complexes. This comprehensive database has been created by integrating different resources. We now download these molecular complexes filtering by some of the source databases. We check the complexes where a couple of specific genes participate. First, we look for the complexes where any of these two genes participate. We then identify the complex where these two genes are jointly involved. Finally, we perform an enrichment analysis with the genes taking part in that complex. You should keep an eye on this complex since it will be used again in the forthcoming sections.

##  [1] "CFinder"        "CORUM"          "CellChatDB"     "CellPhoneDB"    "Cellinker"      "Compleat"      
##  [7] "ComplexPortal"  "Guide2Pharma"   "Havugimana2012" "ICELLNET"       "KEGG-MEDICUS"   "NetworkBlast"  
## [13] "PDB"            "SIGNOR"         "hu.MAP"         "hu.MAP2"
## [1] "NCAPD2_NCAPG_NCAPH_PARP1_SMC2_SMC4_XRCC1"                             
## [2] "CCNA2_CDK2_LIG1_PARP1_POLA1_POLD1_POLE_RFC1_RFC2_RPA1_RPA2_RPA3_TOP1" 
## [3] "CCNA2_CCNB1_CDK1_PARP1_POLA1_POLD1_POLE_RFC1_RFC2_RPA1_RPA2_RPA3_TOP1"
## [4] "MRE11_PARP1_RAD50_TERF2_TERF2IP_XRCC5_XRCC6"                          
## [5] "TERF2_WRN"                                                            
## [6] "CALR_DHX30_H2AX_H2BU1_HSPA5_NPM1_PARP1"
## [1] "PARP1_WRN_XRCC5_XRCC6"
##      term_id source                      term_name      p_value
## 1 GO:0010332  GO:BP    response to gamma radiation 2.371053e-08
## 2 GO:0000723  GO:BP           telomere maintenance 4.447175e-07
## 3 GO:0010212  GO:BP response to ionizing radiation 4.447175e-07
## 4 GO:0071478  GO:BP cellular response to radiation 7.008413e-07
## 5 GO:0032200  GO:BP          telomere organization 7.008413e-07
## 6 GO:0000781  GO:CC   chromosome, telomeric region 2.412057e-07
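The difference between "any of the genes" and "both genes jointly" can be illustrated in base R on a few component strings copied from the output above. This is a toy sketch: the query genes are assumed to be WRN and PARP1, as the output suggests, and the real complexes table has more columns than a bare character vector.

```r
# Component strings copied from the output above: gene symbols joined by "_".
complexes <- c(
    "MRE11_PARP1_RAD50_TERF2_TERF2IP_XRCC5_XRCC6",
    "TERF2_WRN",
    "CALR_DHX30_H2AX_H2BU1_HSPA5_NPM1_PARP1",
    "PARP1_WRN_XRCC5_XRCC6"
)
# Query genes (an assumption inferred from the output).
query_genes <- c("WRN", "PARP1")

# Split each complex into its member gene symbols.
members <- strsplit(complexes, "_", fixed = TRUE)

# Complexes that contain at least one of the query genes.
has_any <- vapply(members, function(m) any(query_genes %in% m), logical(1))
# Complexes that contain every query gene.
has_all <- vapply(members, function(m) all(query_genes %in% m), logical(1))

any_match <- complexes[has_any]
joint_match <- complexes[has_all]
```

On a table downloaded with import_omnipath_complexes, the package function get_complex_genes with its total_match argument appears to serve the same purpose.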

Annotations

Biological annotations are statements, usually traceable and curated, about the different features of a biological entity. At the genetic level, annotations describe the biological function, the subcellular localization, the DNA location and many other related properties of a particular gene or its gene products.

The annotations query provides a large variety of data about proteins and complexes. These data come from dozens of databases and each kind of annotation record contains different fields. Because of this, there is a record_id field which is unique within the records of each database. Each row contains one key-value pair and you need to use the record_id to connect the related key-value pairs (see examples below).

Now, we focus on the annotations of the complex studied in the previous section. We first inspect the different databases available in the OmniPath web server. Then, we download the annotations for our complex itself as a biological entity. We find annotations related to the nucleus and transcriptional control, which is in agreement with the enrichment analysis results of its individual components.

##  [1] "Adhesome"             "Almen2009"            "Baccin2019"           "CORUM_Funcat"        
##  [5] "CORUM_GO"             "CSPA"                 "CSPA_celltype"        "CancerDrugsDB"       
##  [9] "CancerGeneCensus"     "CancerSEA"            "CellCall"             "CellCellInteractions"
## [13] "CellChatDB"           "CellChatDB_complex"   "CellPhoneDB"          "CellPhoneDB_complex" 
## [17] "CellTalkDB"           "CellTypist"           "Cellinker"            "Cellinker_complex"   
## [21] "ComPPI"               "CytoSig"              "DGIdb"                "DisGeNet"            
## [25] "EMBRACE"              "Exocarta"             "GO_Intercell"         "GPCRdb"              
## [29] "Guide2Pharma"         "HGNC"                 "HPA_secretome"        "HPA_subcellular"     
## [33] "HPA_tissue"           "HPMR"                 "HumanCellMap"         "ICELLNET"            
## [37] "ICELLNET_complex"     "IntOGen"              "Integrins"            "KEGG-PC"             
## [41] "Kirouac2010"          "LOCATE"               "LRdb"                 "MCAM"                
## [45] "MSigDB"               "Matrisome"            "MatrixDB"             "Membranome"          
## [49] "NetPath"              "OPM"                  "PROGENy"              "PanglaoDB"           
## [53] "Phobius"              "Phosphatome"          "Ramilowski2015"       "Ramilowski_location" 
## [57] "SIGNOR"               "SignaLink_function"   "SignaLink_pathway"    "Surfaceome"          
## [61] "TCDB"                 "TFcensus"             "TopDB"                "UniProt_family"      
## [65] "UniProt_keyword"      "UniProt_location"     "UniProt_tissue"       "UniProt_topology"    
## [69] "Vesiclepedia"         "Zhong2015"            "connectomeDB2020"     "iTALK"               
## [73] "kinase.com"           "scConnect"            "scConnect_complex"    "talklr"
## # A tibble: 10 × 3
##    source label      value                                  
##    <chr>  <chr>      <chr>                                  
##  1 ComPPI location   nucleus                                
##  2 MSigDB collection chemical_and_genetic_perturbations     
##  3 MSigDB geneset    PUJANA_CHEK2_PCC_NETWORK               
##  4 MSigDB collection chemical_and_genetic_perturbations     
##  5 MSigDB geneset    PUJANA_BRCA1_PCC_NETWORK               
##  6 MSigDB collection reactome_pathways                      
##  7 MSigDB geneset    REACTOME_DNA_DOUBLE_STRAND_BREAK_REPAIR
##  8 MSigDB collection reactome_pathways                      
##  9 MSigDB geneset    REACTOME_DNA_REPAIR                    
## 10 MSigDB collection chemical_and_genetic_perturbations

Afterwards, we explore the annotations of the individual components of the complex in some databases. We check the pathways where these proteins are involved. Once again, we also find many nucleus related annotations when checking their cellular location.

Pathways to which these proteins belong:

## # A tibble: 7 × 2
##   genesymbol value                                        
##   <chr>      <chr>                                        
## 1 PARP1      Tumor necrosis factor (TNF) alpha            
## 2 PARP1      Androgen receptor (AR)                       
## 3 PARP1      TNF-related weak inducer of apoptosis (TWEAK)
## 4 PARP1      Corticotropin-releasing hormone (CRH)        
## 5 PARP1      Oncostatin-M (OSM)                           
## 6 XRCC5      Androgen receptor (AR)                       
## 7 XRCC6      Androgen receptor (AR)

Subcellular localization of our proteins:

Since several rows of our query result share the same record_id, we spread these records across columns:

## # A tibble: 11 × 7
##    uniprot genesymbol entity_type source record_id location      score             
##    <chr>   <chr>      <chr>       <chr>      <dbl> <chr>         <chr>             
##  1 P12956  XRCC6      protein     ComPPI      2967 nucleus       0.99999997629184  
##  2 P09874  PARP1      protein     ComPPI     11241 nucleus       0.999999887104    
##  3 Q14191  WRN        protein     ComPPI     16096 nucleus       0.9999996544      
##  4 P13010  XRCC5      protein     ComPPI     13373 nucleus       0.99999868288     
##  5 P13010  XRCC5      protein     ComPPI     13371 membrane      0.972             
##  6 P12956  XRCC6      protein     ComPPI      2965 cytosol       0.958             
##  7 P13010  XRCC5      protein     ComPPI     13374 cytosol       0.958             
##  8 Q14191  WRN        protein     ComPPI     16097 cytosol       0.94              
##  9 P12956  XRCC6      protein     ComPPI      2966 extracellular 0.8600000000000001
## 10 P12956  XRCC6      protein     ComPPI      2968 membrane      0.8600000000000001
## 11 P13010  XRCC5      protein     ComPPI     13372 extracellular 0.8600000000000001

In this way, we have more or less reconstituted the data as it is in the original resource. The same can be done much more easily by passing the wide = TRUE parameter to import_omnipath_annotations. In this case, if the data contain more than one resource, a list of data frames will be returned.
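The reshaping step itself can be tried offline on a toy table. In the sketch below, the four long-format rows mimic the annotation layout (one key-value pair per row, tied together by record_id), with values copied from two records of the ComPPI table above; the use of tidyr's pivot_wider is an assumption about tooling, not necessarily what produced that table.

```r
library(tidyr)

# Toy long-format records mimicking the annotation layout.
# The values of the two records are copied from the ComPPI table above.
long <- data.frame(
    genesymbol = c("XRCC6", "XRCC6", "PARP1", "PARP1"),
    source     = "ComPPI",
    record_id  = c(2967, 2967, 11241, 11241),
    label      = c("location", "score", "location", "score"),
    value      = c("nucleus", "0.99999997629184",
                   "nucleus", "0.999999887104"),
    stringsAsFactors = FALSE
)

# One row per record_id; each distinct label becomes its own column.
wide <- pivot_wider(long, names_from = label, values_from = value)
```

Every column other than label and value identifies a record here, so rows that share a record_id collapse into one.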

Intercell

Cells perceive cues from their microenvironment and neighboring cells, and respond accordingly to ensure proper activities and coordination between them. The ensemble of these communication processes is called inter-cellular signaling (intercell).

The intercell query provides information about the roles of proteins in inter-cellular signaling (e.g. whether a protein is a ligand, a receptor, an extracellular matrix (ECM) component, etc.). This query type is very similar to annotations. However, intercell data do not come directly from original sources; we combined them from several databases into categories (we also refer to the original sources).

We first inspect the different categories available in the OmniPath web server. Then, we focus again on our previously selected complex and check the location of its individual components in the inter-cellular context. We see that the components of this particular complex are intracellular.

##  [1] "transmembrane"                       "transmembrane_predicted"            
##  [3] "peripheral"                          "plasma_membrane"                    
##  [5] "plasma_membrane_transmembrane"       "plasma_membrane_regulator"          
##  [7] "plasma_membrane_peripheral"          "secreted"                           
##  [9] "cell_surface"                        "ecm"                                
## [11] "ligand"                              "receptor"                           
## [13] "secreted_enzyme"                     "secreted_peptidase"                 
## [15] "extracellular"                       "intracellular"                      
## [17] "receptor_regulator"                  "secreted_receptor"                  
## [19] "sparc_ecm_regulator"                 "ecm_regulator"                      
## [21] "ligand_regulator"                    "cell_surface_ligand"                
## [23] "cell_adhesion"                       "matrix_adhesion"                    
## [25] "adhesion"                            "matrix_adhesion_regulator"          
## [27] "cell_surface_enzyme"                 "cell_surface_peptidase"             
## [29] "secreted_enyzme"                     "extracellular_peptidase"            
## [31] "secreted_peptidase_inhibitor"        "transporter"                        
## [33] "ion_channel"                         "ion_channel_regulator"              
## [35] "gap_junction"                        "tight_junction"                     
## [37] "adherens_junction"                   "desmosome"                          
## [39] "intracellular_intercellular_related"
## # A tibble: 4 × 3
##   category      genesymbol parent       
##   <chr>         <chr>      <chr>        
## 1 intracellular PARP1      intracellular
## 2 intracellular WRN        intracellular
## 3 intracellular XRCC5      intracellular
## 4 intracellular XRCC6      intracellular
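Output of this kind could be obtained along the following lines. This is a sketch, not necessarily the exact calls behind the listing above: the scope = "generic" and aspect = "locational" arguments and the hand-typed gene list are assumptions chosen to match the categories and gene symbols shown, and the query needs network access.

```r
library(OmnipathR)
library(dplyr)

# List the generic intercell categories offered by the web service.
get_intercell_generic_categories()

# Download the generic, location-related intercell annotations.
intercell <- import_omnipath_intercell(
    scope = "generic",
    aspect = "locational"
)

# Assumption: the complex members, typed by hand for a self-contained example.
genes_complex <- c("PARP1", "WRN", "XRCC5", "XRCC6")

# Keep one row per gene symbol and parent category for the complex members.
intercell %>%
    filter(genesymbol %in% genes_complex) %>%
    distinct(genesymbol, parent, .keep_all = TRUE) %>%
    select(category, genesymbol, parent) %>%
    arrange(genesymbol)
```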

By default, the import_intercell_network function creates the most complete network, which includes many interactions that are false positives in the context of intercellular communication. It is highly recommended to apply some quality filtering to this network. The high_confidence parameter performs a quite stringent filtering.
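A minimal sketch of the high-confidence import, assuming network access to the web service (the variable name icn_hc is only illustrative):

```r
library(OmnipathR)

# Build the intercellular communication network with the built-in
# stringent quality filter switched on.
icn_hc <- import_intercell_network(high_confidence = TRUE)

# Number of interactions that pass the filter.
nrow(icn_hc)
```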

Using the function filter_intercell_network instead, you have much more flexibility to adjust the stringency of the filtering to the needs of your analysis. See the full list of options in the docs of the function.
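For instance, one might download the unfiltered network once and then tune the filter on the local copy. The sketch below assumes network access; the threshold values are illustrative choices, not recommendations, and should be adapted to the analysis at hand.

```r
library(OmnipathR)

# Download the complete, unfiltered network.
icn <- import_intercell_network()

# Apply a custom filter. The values below are illustrative assumptions.
icn_filtered <- filter_intercell_network(
    icn,
    ligand_receptor = TRUE,         # keep only ligand-receptor interactions
    consensus_percentile = 30,      # consensus threshold for intercell roles
    loc_consensus_percentile = 50,  # consensus threshold for localizations
    simplify = TRUE                 # keep only the most relevant columns
)

# Compare the size of the network before and after filtering.
c(unfiltered = nrow(icn), filtered = nrow(icn_filtered))
```

Because the filter works on an already downloaded data frame, different settings can be compared without querying the web service for the network again.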

Conclusion

OmnipathR provides access to the wealth of data stored in the OmniPath webservice http://omnipathdb.org/ from the R environment. In addition, it contains some utility functions for visualization, filtering and analysis. The main strength of OmnipathR is the straightforward transformation of the different OmniPath data into commonly used R objects, such as data frames and graphs. Consequently, it allows an easy integration of the different types of data and serves as a gateway to the vast number of R packages dedicated to the analysis and representation of biological data. We highlighted these abilities in some of the examples detailed in previous sections of this document.

Session info

## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB             
##  [4] LC_COLLATE=C               LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
## [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] gprofiler2_0.2.1 dnet_1.1.7       supraHex_1.35.0  hexbin_1.28.2    tidyr_1.2.1      knitr_1.40      
##  [7] magrittr_2.0.3   ggraph_2.1.0     igraph_1.3.5     ggplot2_3.3.6    dplyr_1.0.10     OmnipathR_3.5.25
## [13] BiocStyle_2.25.0
## 
## loaded via a namespace (and not attached):
##  [1] nlme_3.1-160        bitops_1.0-7        bit64_4.0.5         progress_1.2.2      httr_1.4.4         
##  [6] Rgraphviz_2.41.1    tools_4.2.1         backports_1.4.1     bslib_0.4.0         utf8_1.2.2         
## [11] R6_2.5.1            lazyeval_0.2.2      BiocGenerics_0.43.4 DBI_1.1.3           colorspace_2.0-3   
## [16] withr_2.5.0         tidyselect_1.2.0    gridExtra_2.3       prettyunits_1.1.1   bit_4.0.4          
## [21] curl_4.3.3          compiler_4.2.1      graph_1.75.0        cli_3.4.1           rvest_1.0.3        
## [26] xml2_1.3.3          plotly_4.10.0       labeling_0.4.2      bookdown_0.29       sass_0.4.2         
## [31] scales_1.2.1        checkmate_2.1.0     readr_2.1.3         rappdirs_0.3.3      stringr_1.4.1      
## [36] digest_0.6.30       rmarkdown_2.17      pkgconfig_2.0.3     htmltools_0.5.3     fastmap_1.1.0      
## [41] highr_0.9           htmlwidgets_1.5.4   rlang_1.0.6         readxl_1.4.1        jquerylib_0.1.4    
## [46] farver_2.1.1        generics_0.1.3      jsonlite_1.8.3      vroom_1.6.0         RCurl_1.98-1.9     
## [51] Matrix_1.5-1        Rcpp_1.0.9          munsell_0.5.0       fansi_1.0.3         ape_5.6-2          
## [56] logger_0.2.2        viridis_0.6.2       lifecycle_1.0.3     stringi_1.7.8       yaml_2.3.6         
## [61] MASS_7.3-58.1       grid_4.2.1          parallel_4.2.1      ggrepel_0.9.1       crayon_1.5.2       
## [66] lattice_0.20-45     graphlayouts_0.8.3  hms_1.1.2           magick_2.7.3        pillar_1.8.1       
## [71] stats4_4.2.1        glue_1.6.2          evaluate_0.17       data.table_1.14.4   BiocManager_1.30.18
## [76] vctrs_0.5.0         png_0.1-7           tzdb_0.3.0          tweenr_2.0.2        selectr_0.4-2      
## [81] cellranger_1.1.0    gtable_0.3.1        purrr_0.3.5         polyclip_1.10-4     assertthat_0.2.1   
## [86] cachem_1.0.6        xfun_0.34           ggforce_0.4.1       tidygraph_1.2.2     later_1.3.0        
## [91] viridisLite_0.4.1   tibble_3.1.8        ellipsis_0.3.2

References

Ben-Shlomo, I., S. Yu Hsu, R. Rauch, H. W. Kowalski, and A. J. W. Hsueh. 2003. “Signaling Receptome: A Genomic and Evolutionary Perspective of Plasma Membrane Receptors Involved in Signal Transduction.” Science Signaling 2003 (187): re9–re9. https://doi.org/10.1126/stke.2003.187.re9.

Berggård, Tord, Sara Linse, and Peter James. 2007. “Methods for the Detection and Analysis of Protein–Protein Interactions.” PROTEOMICS 7 (16): 2833–42. https://doi.org/10.1002/pmic.200700131.

Dinkel, H., C. Chica, A. Via, C. M. Gould, L. J. Jensen, T. J. Gibson, and F. Diella. 2010. “Phospho.ELM: A Database of Phosphorylation Sites–Update 2011.” Nucleic Acids Research 39 (Database): D261–D267. https://doi.org/10.1093/nar/gkq1104.

Dobson, László, Tamás Langó, István Reményi, and Gábor E. Tusnády. 2014. “Expediting Topology Data Gathering for the TOPDB Database.” Nucleic Acids Research 43 (D1): D283–D289. https://doi.org/10.1093/nar/gku1119.

Drew, Kevin, Chanjae Lee, Ryan L Huizar, Fan Tu, Blake Borgeson, Claire D McWhite, Yun Ma, John B Wallingford, and Edward M Marcotte. 2017. “Integration of over 9,000 Mass Spectrometry Experiments Builds a Global Map of Human Protein Complexes.” Molecular Systems Biology 13 (6): 932. https://doi.org/10.15252/msb.20167490.

Garcia-Alonso, Luz, Christian H. Holland, Mahmoud M. Ibrahim, Denes Turei, and Julio Saez-Rodriguez. 2019. “Benchmark and Integration of Resources for the Estimation of Human Transcription Factor Activities.” Genome Research 29 (8): 1363–75. https://doi.org/10.1101/gr.240663.118.

Giurgiu, Madalina, Julian Reinhard, Barbara Brauner, Irmtraud Dunger-Kaltenbach, Gisela Fobo, Goar Frishman, Corinna Montrone, and Andreas Ruepp. 2018. “CORUM: The Comprehensive Resource of Mammalian Protein Complexes–2019.” Nucleic Acids Research 47 (D1): D559–D563. https://doi.org/10.1093/nar/gky973.

Hornbeck, Peter V., Bin Zhang, Beth Murray, Jon M. Kornhauser, Vaughan Latham, and Elzbieta Skrzypek. 2014. “PhosphoSitePlus, 2014: Mutations, PTMs and Recalibrations.” Nucleic Acids Research 43 (D1): D512–D520. https://doi.org/10.1093/nar/gku1267.

Keerthikumar, Shivakumar, David Chisanga, Dinuka Ariyaratne, Haidar Al Saffar, Sushma Anand, Kening Zhao, Monisha Samuel, et al. 2016. “ExoCarta: A Web-Based Compendium of Exosomal Cargo.” Journal of Molecular Biology 428 (4): 688–92. https://doi.org/10.1016/j.jmb.2015.09.019.

Türei, Dénes, Tamás Korcsmáros, and Julio Saez-Rodriguez. 2016. “OmniPath: Guidelines and Gateway for Literature-Curated Signaling Pathway Resources.” Nature Methods 13 (12): 966–67. https://doi.org/10.1038/nmeth.4077.

Vento-Tormo, Roser, Mirjana Efremova, Rachel A. Botting, Margherita Y. Turco, Miquel Vento-Tormo, Kerstin B. Meyer, Jong-Eun Park, et al. 2018. “Single-Cell Reconstruction of the Early Maternal–Fetal Interface in Humans.” Nature 563 (7731): 347–53. https://doi.org/10.1038/s41586-018-0698-6.