PANTHER.db 1.0.11
PANTHER.db
The PANTHER.db package provides a select interface to the compiled PANTHER ontology residing within a SQLite database.
PANTHER.db can be installed from Bioconductor using
if (!requireNamespace("BiocManager")) install.packages("BiocManager")
BiocManager::install("PANTHER.db")
The underlying SQLite database is currently about 500 MB in size and has to be downloaded in advance using AnnotationHub, as follows:
if (!requireNamespace("AnnotationHub")) BiocManager::install("AnnotationHub")
library(AnnotationHub)
ah <- AnnotationHub()
query(ah, "PANTHER.db")[[1]]
Finally, PANTHER.db can be loaded with
library(PANTHER.db)
If you already know about the select interface, you can immediately learn about the various methods for this object by just looking at the help page.
help("PANTHER.db")
When you load the PANTHER.db package, it creates a PANTHER.db object. If you look at the object you will see some helpful information about it.
PANTHER.db
## PANTHER.db object:
## | ORGANISMS: AMBTC|ANOCA|ANOPHELES|AQUAE|ARABIDOPSIS|ASHGO|ASPFU|BACCR|BACSU|BACTN|BATDJ|BOVINE|BRADI|BRADU|BRAFL|BRANA|BRARP|CAEBR|CANAL|CANINE|CAPAN|CHICKEN|CHIMP|CHLAA|CHLRE|CHLTR|CIOIN|CITSI|CLOBH|COELICOLOR|COXBU|CRYNJ|CUCSA|DAPPU|DEIRA|DICDI|DICPU|DICTD|ECOLI|EMENI|ENTHI|ERYGU|EUCGR|FELCA|FLY|FUSNN|GEOSL|GIAIC|GLOVI|GORGO|GOSHI|HAEIN|HALSA|HELAN|HELPY|HELRO|HORSE|HORVV|HUMAN|IXOSC|JUGRE|KORCO|LACSA|LEIMA|LEPIN|LEPOC|LISMO|MAIZE|MALARIA|MANES|MEDTR|METAC|METJA|MONBE|MONDO|MOUSE|MUSAM|MYCGE|MYCTU|NEIMB|NELNU|NEMVE|NEUCR|NITMS|ORNAN|ORYLA|ORYSJ|OSTTA|PARTE|PHANO|PHODC|PHYPA|PHYRM|PIG|POPTR|PRIPA|PRUPE|PSEAE|PUCGT|PYRAE|RAT|RHESUS|RHOBA|RICCO|SACS2|SALTY|SCHPO|SCLS1|SELML|SETIT|SHEON|SOLLC|SOLTU|SORBI|SOYBN|SPIOL|STAA8|STRPU|STRR6|SYNY3|THAPS|THECC|THEKO|THEMA|THEYD|TOBAC|TRIAD|TRICA|TRIVA|TRYB2|USTMA|VIBCH|VITVI|WHEAT|WORM|XANCP|XENOPUS|YARLI|YEAST|YERPE|ZEBRAFISH|ZOSMR
## | PANTHERVERSION: 16.0
## | PANTHERSOURCEURL: ftp.pantherdb.org
## | PANTHERSOURCEDATE: 2021-Feb02
## | package: AnnotationDbi
## | Db type: PANTHER.db
## | DBSCHEMA: PANTHER_DB
## | DBSCHEMAVERSION: 2.1
## | UNIPROT to ENTREZ mapping: 2021-Feb02
By default, the PANTHER.db object is set to retrieve records from all organisms supported by http://pantherdb.org. Methods are provided to restrict all queries to a specific organism. To change the organism, you first need to look up the appropriate identifier for the organism that you are interested in. The PANTHER gene ontology is based on the UniProt reference proteome set. To display the choices, the helper function availablePthOrganisms lists all supported organisms along with their UniProt organism names and taxonomy IDs:
availablePthOrganisms(PANTHER.db)[1:5,]
## AnnotationDbi Species PANTHER Species Genome Source
## 1 HUMAN HUMAN HGNC,Ensembl
## 2 MOUSE MOUSE Ensembl,MGI
## 3 RAT RAT Ensembl,RGD
## 4 CHICKEN CHICK Ensembl
## 5 ZEBRAFISH DANRE ZFIN,Ensembl
## Genome Date UNIPROT Species ID UNIPROT Species Name
## 1 Reference Proteome 2020_04 HUMAN Homo sapiens
## 2 Reference Proteome 2020_04 MOUSE Mus musculus
## 3 Reference Proteome 2020_04 RAT Rattus norvegicus
## 4 Reference Proteome 2020_04 CHICK Gallus gallus
## 5 Reference Proteome 2020_04 DANRE Danio rerio
## UNIPROT Taxon ID
## 1 9606
## 2 10090
## 3 10116
## 4 9031
## 5 7955
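As a minimal sketch of how such a table might be used to find an organism identifier, the snippet below rebuilds the five rows shown above as a plain data frame and looks up a species name from its taxonomy ID. The data frame `orgs` and its column names (`annodbi_species`, `panther_species`, `taxon_id`) are illustrative stand-ins; the column names of the object returned by availablePthOrganisms may differ.

```r
# Toy copy of the first five rows printed above. The column names are
# hypothetical and chosen for readability only.
orgs <- data.frame(
  annodbi_species = c("HUMAN", "MOUSE", "RAT", "CHICKEN", "ZEBRAFISH"),
  panther_species = c("HUMAN", "MOUSE", "RAT", "CHICK", "DANRE"),
  taxon_id        = c(9606L, 10090L, 10116L, 9031L, 7955L),
  stringsAsFactors = FALSE
)

# Look up the species name for zebrafish (taxonomy ID 7955).
zebrafish <- orgs$annodbi_species[orgs$taxon_id == 7955L]
zebrafish
```

Note that the ORGANISMS field printed by the PANTHER.db object lists names such as CHICKEN and ZEBRAFISH, which correspond to the first column of this table rather than to the UniProt-style codes CHICK and DANRE.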
Once you have learned the PANTHER organism name for the organism of interest, you can then change the organism for the PANTHER.db object:
pthOrganisms(PANTHER.db) <- "HUMAN"
PANTHER.db
## PANTHER.db object:
## | ORGANISMS: HUMAN
## | PANTHERVERSION: 16.0
## | PANTHERSOURCEURL: ftp.pantherdb.org
## | PANTHERSOURCEDATE: 2021-Feb02
## | package: AnnotationDbi
## | Db type: PANTHER.db
## | DBSCHEMA: PANTHER_DB
## | DBSCHEMAVERSION: 2.1
## | UNIPROT to ENTREZ mapping: 2021-Feb02
resetPthOrganisms(PANTHER.db)
PANTHER.db
## PANTHER.db object:
## | ORGANISMS: AMBTC|ANOCA|ANOPHELES|AQUAE|ARABIDOPSIS|ASHGO|ASPFU|BACCR|BACSU|BACTN|BATDJ|BOVINE|BRADI|BRADU|BRAFL|BRANA|BRARP|CAEBR|CANAL|CANINE|CAPAN|CHICKEN|CHIMP|CHLAA|CHLRE|CHLTR|CIOIN|CITSI|CLOBH|COELICOLOR|COXBU|CRYNJ|CUCSA|DAPPU|DEIRA|DICDI|DICPU|DICTD|ECOLI|EMENI|ENTHI|ERYGU|EUCGR|FELCA|FLY|FUSNN|GEOSL|GIAIC|GLOVI|GORGO|GOSHI|HAEIN|HALSA|HELAN|HELPY|HELRO|HORSE|HORVV|HUMAN|IXOSC|JUGRE|KORCO|LACSA|LEIMA|LEPIN|LEPOC|LISMO|MAIZE|MALARIA|MANES|MEDTR|METAC|METJA|MONBE|MONDO|MOUSE|MUSAM|MYCGE|MYCTU|NEIMB|NELNU|NEMVE|NEUCR|NITMS|ORNAN|ORYLA|ORYSJ|OSTTA|PARTE|PHANO|PHODC|PHYPA|PHYRM|PIG|POPTR|PRIPA|PRUPE|PSEAE|PUCGT|PYRAE|RAT|RHESUS|RHOBA|RICCO|SACS2|SALTY|SCHPO|SCLS1|SELML|SETIT|SHEON|SOLLC|SOLTU|SORBI|SOYBN|SPIOL|STAA8|STRPU|STRR6|SYNY3|THAPS|THECC|THEKO|THEMA|THEYD|TOBAC|TRIAD|TRICA|TRIVA|TRYB2|USTMA|VIBCH|VITVI|WHEAT|WORM|XANCP|XENOPUS|YARLI|YEAST|YERPE|ZEBRAFISH|ZOSMR
## | PANTHERVERSION: 16.0
## | PANTHERSOURCEURL: ftp.pantherdb.org
## | PANTHERSOURCEDATE: 2021-Feb02
## | package: AnnotationDbi
## | Db type: PANTHER.db
## | DBSCHEMA: PANTHER_DB
## | DBSCHEMAVERSION: 2.1
## | UNIPROT to ENTREZ mapping: 2021-Feb02
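As the first printout above shows, setting pthOrganisms restricts the organisms to Homo sapiens; calling resetPthOrganisms then restores the full list of organisms. To display all data that can be returned from a select query, the columns method can be used: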
columns(PANTHER.db)
## [1] "CLASS_ID" "CLASS_TERM" "COMPONENT_ID" "COMPONENT_TERM"
## [5] "CONFIDENCE_CODE" "ENTREZ" "EVIDENCE" "EVIDENCE_TYPE"
## [9] "FAMILY_ID" "FAMILY_TERM" "GOSLIM_ID" "GOSLIM_TERM"
## [13] "PATHWAY_ID" "PATHWAY_TERM" "SPECIES" "SUBFAMILY_TERM"
## [17] "UNIPROT"
Some of these fields can also be used as keytypes:
keytypes(PANTHER.db)
## [1] "CLASS_ID" "COMPONENT_ID" "ENTREZ" "FAMILY_ID" "GOSLIM_ID"
## [6] "PATHWAY_ID" "SPECIES" "UNIPROT"
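To see which fields can only be retrieved and not used as keys, the two vectors printed above can be compared directly. The sketch below hard-codes both outputs so that it runs without the database; the variable names `all_columns` and `all_keytypes` are introduced here for illustration only.

```r
# Values copied from the columns() and keytypes() output shown above.
all_columns <- c(
  "CLASS_ID", "CLASS_TERM", "COMPONENT_ID", "COMPONENT_TERM",
  "CONFIDENCE_CODE", "ENTREZ", "EVIDENCE", "EVIDENCE_TYPE",
  "FAMILY_ID", "FAMILY_TERM", "GOSLIM_ID", "GOSLIM_TERM",
  "PATHWAY_ID", "PATHWAY_TERM", "SPECIES", "SUBFAMILY_TERM",
  "UNIPROT"
)
all_keytypes <- c(
  "CLASS_ID", "COMPONENT_ID", "ENTREZ", "FAMILY_ID", "GOSLIM_ID",
  "PATHWAY_ID", "SPECIES", "UNIPROT"
)

# Every keytype is also a column ...
all(all_keytypes %in% all_columns)

# ... but these columns cannot be used as keytypes.
value_only <- setdiff(all_columns, all_keytypes)
value_only
```

With a loaded database, the same comparison could be written as `setdiff(columns(PANTHER.db), keytypes(PANTHER.db))`.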
It is also possible to display all possible keys of a table for any keytype. If keytype is unspecified, the FAMILY_ID keys will be returned.
go_ids <- head(keys(PANTHER.db,keytype="GOSLIM_ID"))
go_ids
## [1] "GO:0000003" "GO:0000018" "GO:0000027" "GO:0000030" "GO:0000038"
## [6] "GO:0000041"
Finally, you can look up whatever combinations of columns, keytypes and keys you need using select or mapIds.
cols <- "CLASS_ID"
res <- mapIds(PANTHER.db, keys=go_ids, column=cols, keytype="GOSLIM_ID", multiVals="list")
lengths(res)
## GO:0000003 GO:0000018 GO:0000027 GO:0000030 GO:0000038 GO:0000041
## 54 10 6 5 8 13
res_inner <- select(PANTHER.db, keys=go_ids, columns=cols, keytype="GOSLIM_ID")
nrow(res_inner)
## [1] 96
tail(res_inner)
## GOSLIM_ID CLASS_ID
## 1072 GO:0000041 PC00191
## 1073 GO:0000041 PC00149
## 1074 GO:0000041 PC00068
## 1082 GO:0000041 PC00003
## 1083 GO:0000041 PC00262
## 1084 GO:0000041 PC00176
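The two results above describe the same mappings in different shapes: the list lengths returned by mapIds (54 + 10 + 6 + 5 + 8 + 13) add up to the 96 rows returned by select. The sketch below imitates that relationship on a toy data frame with made-up identifiers, using base R `split` as a rough stand-in for `multiVals="list"`; it does not call the package itself.

```r
# Toy select()-style result: one row per key/value pair.
# All identifiers here are invented for illustration.
toy <- data.frame(
  GOSLIM_ID = c("GO:A", "GO:A", "GO:B"),
  CLASS_ID  = c("PC1", "PC2", "PC3"),
  stringsAsFactors = FALSE
)

# Group the values by key, giving one list element per key.
as_list <- split(toy$CLASS_ID, toy$GOSLIM_ID)
lengths(as_list)

# The total number of list entries equals the number of rows.
sum(lengths(as_list)) == nrow(toy)
```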
By default, all tables are joined to the central table of PANTHER family IDs by an inner join. Therefore, all rows that lack a match in one of the joined tables are removed from the output. To keep every row that has an associated PANTHER family ID, even when a requested column has no match (shown as <NA>), the argument jointype of the select function must be set to left.
res_left <- select(PANTHER.db, keys=go_ids, columns=cols,keytype="GOSLIM_ID", jointype="left")
nrow(res_left)
## [1] 1705
tail(res_left)
## GOSLIM_ID FAMILY_ID CLASS_ID
## 1700 GO:0000041 PTHR45820:SF1 <NA>
## 1701 GO:0000041 PTHR45820:SF2 <NA>
## 1702 GO:0000041 PTHR45820:SF3 <NA>
## 1703 GO:0000041 PTHR45820:SF4 <NA>
## 1704 GO:0000041 PTHR45820:SF5 <NA>
## 1705 GO:0000041 PTHR45820:SF6 <NA>
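The difference between the two join types can be reproduced on a small scale with base R `merge`. This is only an analogy under stated assumptions: the tables `families` and `classes` and all identifiers in them are invented, and the package performs its joins inside the SQLite database rather than with `merge`.

```r
# Central table: three families annotated with one GO slim term.
families <- data.frame(
  FAMILY_ID = c("FAM1", "FAM2", "FAM3"),
  GOSLIM_ID = c("GO:X", "GO:X", "GO:X"),
  stringsAsFactors = FALSE
)

# Only one of the families has a protein class assigned.
classes <- data.frame(
  FAMILY_ID = "FAM1",
  CLASS_ID  = "PC_A",
  stringsAsFactors = FALSE
)

# Inner join: families without a class are dropped.
inner <- merge(families, classes, by = "FAMILY_ID")

# Left join: every family is kept; a missing class becomes NA.
left <- merge(families, classes, by = "FAMILY_ID", all.x = TRUE)

nrow(inner)
nrow(left)
sum(is.na(left$CLASS_ID))
```

The same pattern explains why the left join above returns many more rows than the inner join and why a FAMILY_ID column appears in its output.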
To access the PANTHER Protein Class ontology tree structure, the method traverseClassTree can be used:
term <- "PC00209"
select(PANTHER.db,term, "CLASS_TERM","CLASS_ID")
## [1] CLASS_ID CLASS_TERM
## <0 rows> (or 0-length row.names)
ancestors <- traverseClassTree(PANTHER.db,term,scope="ANCESTOR")
select(PANTHER.db,ancestors, "CLASS_TERM","CLASS_ID")
## [1] CLASS_ID CLASS_TERM
## <0 rows> (or 0-length row.names)
parents <- traverseClassTree(PANTHER.db,term,scope="PARENT")
select(PANTHER.db,parents, "CLASS_TERM","CLASS_ID")
## [1] CLASS_ID CLASS_TERM
## <0 rows> (or 0-length row.names)
children <- traverseClassTree(PANTHER.db,term,scope="CHILD")
select(PANTHER.db,children, "CLASS_TERM","CLASS_ID")
## [1] CLASS_ID CLASS_TERM
## <0 rows> (or 0-length row.names)
offspring <- traverseClassTree(PANTHER.db,term,scope="OFFSPRING")
select(PANTHER.db,offspring, "CLASS_TERM","CLASS_ID")
## [1] CLASS_ID CLASS_TERM
## <0 rows> (or 0-length row.names)
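The four scopes can be illustrated on a toy tree. The sketch below assumes, judging from the scope names alone, that PARENT and CHILD refer to direct neighbours while ANCESTOR and OFFSPRING refer to all nodes reachable upwards or downwards. The edge table and the helpers `parents_of`, `children_of` and `closure` are hypothetical and are not part of PANTHER.db.

```r
# Toy class tree stored as parent/child edges (invented identifiers).
edges <- data.frame(
  parent = c("root", "root", "A", "A", "A1"),
  child  = c("A", "B", "A1", "A2", "A1x"),
  stringsAsFactors = FALSE
)

# One step up or down the tree.
parents_of  <- function(id) edges$parent[edges$child == id]
children_of <- function(id) edges$child[edges$parent == id]

# Apply a one-step function repeatedly until no new nodes appear.
closure <- function(id, step) {
  found <- character(0)
  frontier <- step(id)
  while (length(frontier) > 0) {
    found <- union(found, frontier)
    frontier <- setdiff(unlist(lapply(frontier, step)), found)
  }
  found
}

toy_parents   <- parents_of("A1")            # direct parent
toy_ancestors <- closure("A1", parents_of)   # all nodes above
toy_children  <- children_of("A")            # direct children
toy_offspring <- closure("A", children_of)   # all nodes below
toy_offspring
```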
sessionInfo()
## R Under development (unstable) (2021-01-20 r79850)
## Platform: x86_64-apple-darwin17.7.0 (64-bit)
## Running under: macOS High Sierra 10.13.6
##
## Matrix products: default
## BLAS: /Users/ka36530_ca/R-stuff/bin/R-devel/lib/libRblas.dylib
## LAPACK: /Users/ka36530_ca/R-stuff/bin/R-devel/lib/libRlapack.dylib
##
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] PANTHER.db_1.0.11 RSQLite_2.2.3 AnnotationHub_2.23.2
## [4] BiocFileCache_1.15.1 dbplyr_2.1.0 AnnotationDbi_1.53.1
## [7] IRanges_2.25.6 S4Vectors_0.29.7 Biobase_2.51.0
## [10] BiocGenerics_0.37.1 BiocStyle_2.19.1
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.6 png_0.1-7
## [3] Biostrings_2.59.2 assertthat_0.2.1
## [5] digest_0.6.27 utf8_1.1.4
## [7] mime_0.10 R6_2.5.0
## [9] evaluate_0.14 httr_1.4.2
## [11] pillar_1.5.0 zlibbioc_1.37.0
## [13] rlang_0.4.10 curl_4.3
## [15] jquerylib_0.1.3 blob_1.2.1
## [17] rmarkdown_2.7 stringr_1.4.0
## [19] bit_4.0.4 shiny_1.6.0
## [21] compiler_4.1.0 httpuv_1.5.5
## [23] xfun_0.21 pkgconfig_2.0.3
## [25] htmltools_0.5.1.1 tidyselect_1.1.0
## [27] KEGGREST_1.31.1 tibble_3.1.0
## [29] interactiveDisplayBase_1.29.0 bookdown_0.21
## [31] fansi_0.4.2 withr_2.4.1
## [33] crayon_1.4.1 dplyr_1.0.4
## [35] later_1.1.0.1 rappdirs_0.3.3
## [37] xtable_1.8-4 jsonlite_1.7.2
## [39] lifecycle_1.0.0 DBI_1.1.1
## [41] magrittr_2.0.1 stringi_1.5.3
## [43] cachem_1.0.4 XVector_0.31.1
## [45] promises_1.2.0.1 bslib_0.2.4
## [47] ellipsis_0.3.1 filelock_1.0.2
## [49] generics_0.1.0 vctrs_0.3.6
## [51] tools_4.1.0 bit64_4.0.5
## [53] glue_1.4.2 purrr_0.3.4
## [55] BiocVersion_3.13.1 fastmap_1.1.0
## [57] yaml_2.2.1 BiocManager_1.30.10
## [59] memoise_2.0.0 knitr_1.31
## [61] sass_0.3.1