The depmap package aims to provide a reproducible research framework for the
cancer dependency data described by
Tsherniak, Aviad, et al. “Defining a cancer dependency map.” Cell 170.3 (2017): 564-576.
The data found in the depmap package has been formatted to facilitate the use
of common R packages such as dplyr and ggplot2. We hope that this package will
allow researchers to more easily mine, explore and visually illustrate
dependency data taken from the Depmap cancer genomic dependency study.
To install depmap, the BiocManager Bioconductor package manager is required. If BiocManager is not already installed, install it first by typing install.packages("BiocManager") within R (this needs to be done just once).
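## install BiocManager only if it is missing, then install depmap
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("depmap")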
The depmap
package fully depends on the ExperimentHub
Bioconductor package,
which allows the data accessed in this package to be stored and retrieved from
the cloud.
library("depmap")
library("ExperimentHub")
The depmap package currently contains several datasets available through
ExperimentHub; nine of them are described below.
The data found in this R package has been converted from a “wide” format .csv
file to a “long” format .rda file. None of the values taken from the original
datasets have been changed, although the columns have been re-arranged. The
changes made are described under the Details section after querying the
relevant dataset.
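As a rough illustration of the wide-to-long conversion described above, the
sketch below reshapes a small made-up table using base R. The object names
(wide, long), the cell line labels and the values are hypothetical; the
package's own data preparation scripts may well do this differently.

```r
## Toy "wide" table: one row per cell line, one column per gene.
## All labels and values here are invented for illustration.
wide <- data.frame(
  cell_line = c("LINE_A", "LINE_B"),
  `A1BG (1)` = c(0.10, -0.20),
  `NAT2 (10)` = c(NA, 0.05),
  check.names = FALSE
)

## Reshape to "long": one row per (cell line, gene) pair.
gene_cols <- setdiff(names(wide), "cell_line")
long <- data.frame(
  cell_line = rep(wide$cell_line, times = length(gene_cols)),
  gene = rep(gene_cols, each = nrow(wide)),
  dependency = unlist(wide[gene_cols], use.names = FALSE)
)
long
```

The values themselves are untouched; only their arrangement changes, which is
the property the paragraph above describes.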
## create ExperimentHub query object
eh <- ExperimentHub()
## snapshotDate(): 2022-10-24
query(eh, "depmap")
## ExperimentHub with 82 records
## # snapshotDate(): 2022-10-24
## # $dataprovider: Broad Institute
## # $species: Homo sapiens
## # $rdataclass: tibble
## # additional mcols(): taxonomyid, genome, description,
## # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## # rdatapath, sourceurl, sourcetype
## # retrieve records with, e.g., 'object[["EH2260"]]'
##
## title
## EH2260 | rnai_19Q1
## EH2261 | crispr_19Q1
## EH2262 | copyNumber_19Q1
## EH2263 | RPPA_19Q1
## EH2264 | TPM_19Q1
## ... ...
## EH7555 | copyNumber_22Q2
## EH7556 | TPM_22Q2
## EH7557 | mutationCalls_22Q2
## EH7558 | metadata_22Q2
## EH7559 | achilles_22Q2
Each dataset has an ExperimentHub accession number (e.g. EH2260 refers to the
rnai dataset from the 19Q1 release).
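Because each title combines a dataset name and a release, a small lookup table
can help find the accession numbers of interest. The sketch below works offline
on a few titles copied from the query output above; the object names (titles,
records, copy_number_ids) are illustrative only, and with a live hub the titles
would come from the query result instead.

```r
## Titles and accession numbers copied from the query output above.
titles <- c(
  EH2260 = "rnai_19Q1",
  EH2261 = "crispr_19Q1",
  EH2262 = "copyNumber_19Q1",
  EH7555 = "copyNumber_22Q2",
  EH7559 = "achilles_22Q2"
)

## The release is the part after the last underscore; the rest is the dataset.
records <- data.frame(
  ehid = names(titles),
  dataset = sub("_[^_]*$", "", titles),
  release = sub("^.*_", "", titles),
  row.names = NULL
)

## Accession numbers of every copyNumber release in this small table.
copy_number_ids <- records$ehid[records$dataset == "copyNumber"]
copy_number_ids
```

Splitting on the last underscore, rather than the first, keeps dataset names
that themselves contain an underscore (such as drug_sensitivity) intact.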
## see ?depmap and browseVignettes('depmap') for documentation
## loading from cache
The rnai dataset contains the combined genetic dependency data for
RNAi-induced gene knockdown for select genes and cancer cell lines. This data
corresponds to the D2_combined_genetic_dependency_scores.csv file found in the
22Q2 depmap release and includes 17309 genes, 712 cell lines, 30 primary
diseases and 31 lineages.
Specific rnai datasets, such as rnai_19Q1, can be accessed by EH number.
rnai <- eh[["EH2260"]]
rnai
## # A tibble: 12,324,008 × 6
## depmap_id cell_line gene gene_name entrez_id depen…¹
## <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 ACH-001270 127399_SOFT_TISSUE A1BG (1) A1BG 1 NA
## 2 ACH-001270 127399_SOFT_TISSUE NAT2 (10) NAT2 10 NA
## 3 ACH-001270 127399_SOFT_TISSUE ADA (100) ADA 100 NA
## 4 ACH-001270 127399_SOFT_TISSUE CDH2 (1000) CDH2 1000 -0.195
## 5 ACH-001270 127399_SOFT_TISSUE AKT3 (10000) AKT3 10000 -0.256
## 6 ACH-001270 127399_SOFT_TISSUE MED6 (10001) MED6 10001 -0.174
## 7 ACH-001270 127399_SOFT_TISSUE NR2E3 (10002) NR2E3 10002 -0.140
## 8 ACH-001270 127399_SOFT_TISSUE NAALAD2 (10003) NAALAD2 10003 NA
## 9 ACH-001270 127399_SOFT_TISSUE DUXB (100033411) DUXB 100033411 NA
## 10 ACH-001270 127399_SOFT_TISSUE PDZK1P1 (100034743) PDZK1P1 100034743 NA
## # … with 12,323,998 more rows, and abbreviated variable name ¹dependency
The most recent rnai
dataset can be automatically loaded into R by using the
depmap_rnai
function.
depmap::depmap_rnai()
## snapshotDate(): 2022-10-24
## see ?depmap and browseVignettes('depmap') for documentation
## loading from cache
## # A tibble: 12,324,008 × 6
## gene cell_line dependency entrez_id gene_name depma…¹
## <chr> <chr> <dbl> <int> <chr> <chr>
## 1 A1BG (1) 127399_SOFT_TISSUE NA 1 A1BG ACH-00…
## 2 NAT2 (10) 127399_SOFT_TISSUE NA 10 NAT2 ACH-00…
## 3 ADA (100) 127399_SOFT_TISSUE NA 100 ADA ACH-00…
## 4 CDH2 (1000) 127399_SOFT_TISSUE -0.195 1000 CDH2 ACH-00…
## 5 AKT3 (10000) 127399_SOFT_TISSUE -0.256 10000 AKT3 ACH-00…
## 6 MED6 (10001) 127399_SOFT_TISSUE -0.174 10001 MED6 ACH-00…
## 7 NR2E3 (10002) 127399_SOFT_TISSUE -0.140 10002 NR2E3 ACH-00…
## 8 NAALAD2 (10003) 127399_SOFT_TISSUE NA 10003 NAALAD2 ACH-00…
## 9 DUXB (100033411) 127399_SOFT_TISSUE NA 100033411 DUXB ACH-00…
## 10 PDZK1P1 (100034743) 127399_SOFT_TISSUE NA 100034743 PDZK1P1 ACH-00…
## # … with 12,323,998 more rows, and abbreviated variable name ¹depmap_id
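The long format lends itself to dplyr verbs. As a minimal sketch, the toy
tibble below reuses a few values from the printout above (rnai_toy and
strongest are illustrative names) and ranks genes by dependency score. It
assumes, as is usual for DepMap scores but not stated in this vignette, that
more negative values indicate stronger dependency.

```r
library(dplyr)

## A handful of rows copied from the rnai printout above.
rnai_toy <- tibble(
  gene_name = c("A1BG", "CDH2", "AKT3", "MED6", "NR2E3"),
  cell_line = "127399_SOFT_TISSUE",
  dependency = c(NA, -0.195, -0.256, -0.174, -0.140)
)

## Drop missing scores, then list the three most negative scores first.
strongest <- rnai_toy %>%
  filter(!is.na(dependency)) %>%
  arrange(dependency) %>%
  head(3)
strongest
```

The same pipeline should apply unchanged to the full tibble, although
filtering on a gene or cell line first is advisable given its size.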
The crispr dataset contains the CRISPR-Cas9 knockout data (batch-corrected,
CERES-inferred gene effect) of select genes and cancer cell lines. This data
corresponds to the gene_effect_corrected.csv file from the 22Q2 depmap release
and includes 17634 genes, 558 cell lines, 26 primary diseases and 28 lineages.
Specific crispr datasets, such as crispr_19Q1, can be accessed by EH number.
crispr <- eh[["EH2261"]]
crispr
## # A tibble: 9,839,772 × 6
## depmap_id cell_line gene gene_…¹ entre…² depen…³
## <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 ACH-000004 HEL_HAEMATOPOIETIC_AND_LYMPHOID_TIS… A1BG… A1BG 1 0.135
## 2 ACH-000005 HEL9217_HAEMATOPOIETIC_AND_LYMPHOID… A1BG… A1BG 1 -0.212
## 3 ACH-000007 LS513_LARGE_INTESTINE A1BG… A1BG 1 0.0433
## 4 ACH-000009 C2BBE1_LARGE_INTESTINE A1BG… A1BG 1 0.0705
## 5 ACH-000011 253J_URINARY_TRACT A1BG… A1BG 1 0.191
## 6 ACH-000012 HCC827_LUNG A1BG… A1BG 1 -0.0104
## 7 ACH-000013 ONCODG1_OVARY A1BG… A1BG 1 0.0210
## 8 ACH-000014 HS294T_SKIN A1BG… A1BG 1 0.113
## 9 ACH-000015 NCIH1581_LUNG A1BG… A1BG 1 -0.0742
## 10 ACH-000017 SKBR3_BREAST A1BG… A1BG 1 0.133
## # … with 9,839,762 more rows, and abbreviated variable names ¹gene_name,
## # ²entrez_id, ³dependency
The most recent crispr
dataset can be automatically loaded into R by using the
depmap_crispr
function.
depmap::depmap_crispr()
## snapshotDate(): 2022-10-24
## see ?depmap and browseVignettes('depmap') for documentation
## loading from cache
## # A tibble: 18,881,196 × 6
## depmap_id gene dependency entrez_id gene_name cell_line
## <chr> <chr> <dbl> <int> <chr> <chr>
## 1 ACH-000001 A1BG (1) -0.135 1 A1BG NIHOVCAR3_OVARY
## 2 ACH-000004 A1BG (1) 0.0819 1 A1BG HEL_HAEMATOPOIETIC_AND_LY…
## 3 ACH-000005 A1BG (1) -0.0942 1 A1BG HEL9217_HAEMATOPOIETIC_AN…
## 4 ACH-000007 A1BG (1) -0.0115 1 A1BG LS513_LARGE_INTESTINE
## 5 ACH-000009 A1BG (1) -0.0508 1 A1BG C2BBE1_LARGE_INTESTINE
## 6 ACH-000011 A1BG (1) 0.0918 1 A1BG 253J_URINARY_TRACT
## 7 ACH-000012 A1BG (1) -0.147 1 A1BG HCC827_LUNG
## 8 ACH-000013 A1BG (1) -0.0592 1 A1BG ONCODG1_OVARY
## 9 ACH-000014 A1BG (1) -0.0348 1 A1BG HS294T_SKIN
## 10 ACH-000015 A1BG (1) -0.204 1 A1BG NCIH1581_LUNG
## # … with 18,881,186 more rows
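The two printouts above differ in column order and in the type of entrez_id
(character in 19Q1, integer in the most recent release), so some harmonisation
is needed before releases are compared. The sketch below shows one way to do
this on a single row from each printout; the object names and the added
release column are illustrative, and the gene column is left out because it is
truncated in the 19Q1 printout.

```r
## One row from each crispr printout above (cell line ACH-000007).
crispr_19Q1_row <- data.frame(
  depmap_id = "ACH-000007", cell_line = "LS513_LARGE_INTESTINE",
  gene_name = "A1BG", entrez_id = "1", dependency = 0.0433
)
crispr_latest_row <- data.frame(
  depmap_id = "ACH-000007", dependency = -0.0115, entrez_id = 1L,
  gene_name = "A1BG", cell_line = "LS513_LARGE_INTESTINE"
)

## Align the column type, label each release, then stack in a common order.
crispr_19Q1_row$entrez_id <- as.integer(crispr_19Q1_row$entrez_id)
crispr_19Q1_row$release <- "19Q1"
crispr_latest_row$release <- "latest"
both <- rbind(crispr_19Q1_row[names(crispr_latest_row)], crispr_latest_row)
both
```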
The copyNumber dataset contains the WES copy number data, given as the
numerical log-fold copy number change measured against the baseline copy
number of select genes and cell lines. This dataset corresponds to the
public_19Q1_gene_cn.csv file from the 22Q2 depmap release and includes 23299
genes, 1604 cell lines, 38 primary diseases and 33 lineages.
Specific copyNumber datasets, such as copyNumber_19Q1, can be accessed by EH
number.
copyNumber <- eh[["EH2262"]]
copyNumber
## # A tibble: 37,371,596 × 6
## depmap_id cell_line gene gene_…¹ entre…² log_co…³
## <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 ACH-000011 253J_URINARY_TRACT A1BG… A1BG 1 0.131
## 2 ACH-000026 253JBV_URINARY_TRACT A1BG… A1BG 1 -0.237
## 3 ACH-000086 ACCMESO1_PLEURA A1BG… A1BG 1 0.134
## 4 ACH-000557 AML193_HAEMATOPOIETIC_AND_LYMPHOID… A1BG… A1BG 1 -0.0208
## 5 ACH-000838 AMO1_HAEMATOPOIETIC_AND_LYMPHOID_T… A1BG… A1BG 1 0.170
## 6 ACH-000080 BDCM_HAEMATOPOIETIC_AND_LYMPHOID_T… A1BG… A1BG 1 0.00703
## 7 ACH-000992 BICR18_UPPER_AERODIGESTIVE_TRACT A1BG… A1BG 1 -0.376
## 8 ACH-000228 BICR31_UPPER_AERODIGESTIVE_TRACT A1BG… A1BG 1 1.16
## 9 ACH-000771 BICR56_UPPER_AERODIGESTIVE_TRACT A1BG… A1BG 1 0.0197
## 10 ACH-000415 BICR6_UPPER_AERODIGESTIVE_TRACT A1BG… A1BG 1 0.280
## # … with 37,371,586 more rows, and abbreviated variable names ¹gene_name,
## # ²entrez_id, ³log_copy_number
The most recent copyNumber
dataset can be automatically loaded into R by using
the depmap_copyNumber
function.
depmap::depmap_copyNumber()
## snapshotDate(): 2022-10-24
## see ?depmap and browseVignettes('depmap') for documentation
## loading from cache
## # A tibble: 44,799,888 × 6
## depmap_id gene log_copy_number entrez_id gene_name cell_line
## <chr> <chr> <dbl> <int> <chr> <chr>
## 1 ACH-000267 DDX11L1 (84771) 1.15 84771 DDX11L1 HDLM2_HAEMATO…
## 2 ACH-001408 DDX11L1 (84771) 1.04 84771 DDX11L1 UMUC14_URINAR…
## 3 ACH-000617 DDX11L1 (84771) 0.762 84771 DDX11L1 OVCAR4_OVARY
## 4 ACH-002123 DDX11L1 (84771) 1.14 84771 DDX11L1 H2369_PLEURA
## 5 ACH-000519 DDX11L1 (84771) 1.01 84771 DDX11L1 PEER_HAEMATOP…
## 6 ACH-000750 DDX11L1 (84771) 0.711 84771 DDX11L1 LOXIMVI_SKIN
## 7 ACH-000544 DDX11L1 (84771) 0.981 84771 DDX11L1 OE21_OESOPHAG…
## 8 ACH-001214 DDX11L1 (84771) 1.05 84771 DDX11L1 U138MG_CENTRA…
## 9 ACH-002223 DDX11L1 (84771) 0.630 84771 DDX11L1 D245MG_CENTRA…
## 10 ACH-000713 DDX11L1 (84771) 0.823 84771 DDX11L1 CAOV3_OVARY
## # … with 44,799,878 more rows
The RPPA dataset contains the CCLE Reverse Phase Protein Array (RPPA) data,
which corresponds to the CCLE_RPPA_20180123.csv file from the 22Q2 depmap
release. This dataset includes 214 genes, 899 cell lines, 28 primary diseases
and 28 lineages.
Specific RPPA datasets, such as RPPA_19Q1, can be accessed by EH number.
RPPA <- eh[["EH2263"]]
RPPA
## # A tibble: 192,386 × 4
## depmap_id cell_line antibody expression
## <chr> <chr> <chr> <dbl>
## 1 ACH-000698 DMS53_LUNG 14-3-3_beta -0.105
## 2 ACH-000489 SW1116_LARGE_INTESTINE 14-3-3_beta 0.359
## 3 ACH-000431 NCIH1694_LUNG 14-3-3_beta 0.0287
## 4 ACH-000707 P3HR1_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE 14-3-3_beta 0.120
## 5 ACH-000509 HUT78_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE 14-3-3_beta -0.269
## 6 ACH-000522 UMUC3_URINARY_TRACT 14-3-3_beta -0.171
## 7 ACH-000613 HOS_BONE 14-3-3_beta -0.0253
## 8 ACH-000829 HUNS1_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE 14-3-3_beta -0.170
## 9 ACH-000557 AML193_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE 14-3-3_beta 0.0819
## 10 ACH-000614 RVH421_SKIN 14-3-3_beta 0.222
## # … with 192,376 more rows
The most recent RPPA
dataset can be automatically loaded into R by using the
depmap_RPPA
function.
depmap::depmap_RPPA()
## snapshotDate(): 2022-10-24
## see ?depmap and browseVignettes('depmap') for documentation
## loading from cache
## # A tibble: 192,386 × 4
## cell_line antibody expression depmap_id
## <chr> <chr> <dbl> <chr>
## 1 DMS53_LUNG 14-3-3_beta -0.105 ACH-000698
## 2 SW1116_LARGE_INTESTINE 14-3-3_beta 0.359 ACH-000489
## 3 NCIH1694_LUNG 14-3-3_beta 0.0287 ACH-000431
## 4 P3HR1_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE 14-3-3_beta 0.120 ACH-000707
## 5 HUT78_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE 14-3-3_beta -0.269 ACH-000509
## 6 UMUC3_URINARY_TRACT 14-3-3_beta -0.171 ACH-000522
## 7 HOS_BONE 14-3-3_beta -0.0253 ACH-000613
## 8 HUNS1_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE 14-3-3_beta -0.170 ACH-000829
## 9 AML193_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE 14-3-3_beta 0.0819 ACH-000557
## 10 RVH421_SKIN 14-3-3_beta 0.222 ACH-000614
## # … with 192,376 more rows
The TPM dataset contains the CCLE RNAseq gene expression data. It shows
expression data only for protein-coding genes, on a log2(TPM+1) scale. This
data corresponds to the CCLE_depMap_19Q1_TPM.csv file from the 22Q2 depmap
release and includes 55825 genes, 1165 cell lines, 33 primary diseases and 32
lineages.
Specific TPM datasets, such as TPM_19Q1, can be accessed by EH number.
TPM <- eh[["EH2264"]]
TPM
## # A tibble: 67,360,300 × 6
## depmap_id cell_line gene gene_…¹ ensem…² expre…³
## <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 ACH-000956 22RV1_PROSTATE TSPA… TSPAN6 ENSG00… 2.65
## 2 ACH-000948 2313287_STOMACH TSPA… TSPAN6 ENSG00… 3.00
## 3 ACH-000026 253JBV_URINARY_TRACT TSPA… TSPAN6 ENSG00… 4.57
## 4 ACH-000011 253J_URINARY_TRACT TSPA… TSPAN6 ENSG00… 4.58
## 5 ACH-000323 42MGBA_CENTRAL_NERVOUS_SYSTEM TSPA… TSPAN6 ENSG00… 4.59
## 6 ACH-000905 5637_URINARY_TRACT TSPA… TSPAN6 ENSG00… 5.88
## 7 ACH-000520 59M_OVARY TSPA… TSPAN6 ENSG00… 4.11
## 8 ACH-000973 639V_URINARY_TRACT TSPA… TSPAN6 ENSG00… 5.05
## 9 ACH-000896 647V_URINARY_TRACT TSPA… TSPAN6 ENSG00… 5.94
## 10 ACH-000070 697_HAEMATOPOIETIC_AND_LYMPHOID_TIS… TSPA… TSPAN6 ENSG00… 0.151
## # … with 67,360,290 more rows, and abbreviated variable names ¹gene_name,
## # ²ensembl_id, ³expression
The TPM
dataset can also be accessed by using the depmap_TPM
function.
depmap::depmap_TPM()
## snapshotDate(): 2022-10-24
## see ?depmap and browseVignettes('depmap') for documentation
## loading from cache
## # A tibble: 27,024,726 × 6
## depmap_id gene rna_expression entrez_id gene_name cell_line
## <chr> <chr> <dbl> <int> <chr> <chr>
## 1 ACH-001113 TSPAN6 (7105) 4.33 7105 TSPAN6 LC1SQSF_LUNG
## 2 ACH-001289 TSPAN6 (7105) 4.57 7105 TSPAN6 COGAR359_SOFT_TI…
## 3 ACH-001339 TSPAN6 (7105) 3.15 7105 TSPAN6 COLO794_SKIN
## 4 ACH-001538 TSPAN6 (7105) 5.09 7105 TSPAN6 KKU213_BILIARY_T…
## 5 ACH-000242 TSPAN6 (7105) 6.73 7105 TSPAN6 RT4_URINARY_TRACT
## 6 ACH-000708 TSPAN6 (7105) 4.27 7105 TSPAN6 SNU283_LARGE_INT…
## 7 ACH-000327 TSPAN6 (7105) 3.34 7105 TSPAN6 NCIH1395_LUNG
## 8 ACH-000233 TSPAN6 (7105) 0.0566 7105 TSPAN6 DEL_HAEMATOPOIET…
## 9 ACH-000461 TSPAN6 (7105) 4.02 7105 TSPAN6 SNU1196_BILIARY_…
## 10 ACH-000705 TSPAN6 (7105) 4.41 7105 TSPAN6 LC1F_LUNG
## # … with 27,024,716 more rows
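Since the expression values are stored on a log2(TPM+1) scale, they can be
converted back to TPM by inverting that transformation. The short sketch below
does this for three values copied from the printout above; the recovered TPM
values are only approximate because the printed values are rounded.

```r
## rna_expression values copied from the first rows printed above.
rna_expression <- c(4.33, 4.57, 3.15)

## Invert log2(TPM + 1) to recover approximate TPM values.
tpm <- 2^rna_expression - 1

## Round trip back to the stored scale as a sanity check.
stopifnot(isTRUE(all.equal(log2(tpm + 1), rna_expression)))
tpm
```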
The metadata dataset contains the metadata about all of the cancer cell lines.
It corresponds to the depmap_19Q1_cell_lines.csv file found in the 22Q2 depmap
release. This dataset contains no gene-level data and includes 1676 cell
lines, 38 primary diseases and 33 lineages.
Specific metadata datasets, such as metadata_19Q1, can be accessed by EH
number.
metadata <- eh[["EH2266"]]
metadata
## # A tibble: 1,676 × 9
## depmap_id cell_line aliases cosmi…¹ sange…² prima…³ subty…⁴ gender source
## <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr>
## 1 ACH-000001 NIHOVCAR3_O… NIH:OV… 905933 2201 Ovaria… Adenoc… Female ATCC
## 2 ACH-000002 HL60_HAEMAT… HL-60 905938 55 Leukem… Acute … Female ATCC
## 3 ACH-000003 CACO2_LARGE… CACO2;… NA NA Colon/… Colon … -1 <NA>
## 4 ACH-000004 HEL_HAEMATO… HEL 907053 783 Leukem… Acute … Male DSMZ
## 5 ACH-000005 HEL9217_HAE… HEL 92… NA NA Leukem… Acute … Male ATCC
## 6 ACH-000006 MONOMAC6_HA… MONO-M… 908148 2167 Leukem… Acute … Male DSMZ
## 7 ACH-000007 LS513_LARGE… LS513 907795 569 Colon/… Colon … Male ATCC
## 8 ACH-000009 C2BBE1_LARG… C2BBe1 910700 2104 Colon/… Colon … Male ATCC
## 9 ACH-000010 NCIH2077_LU… NCI-H2… NA NA Lung C… Non-Sm… <NA> <NA>
## 10 ACH-000011 253J_URINAR… 253J NA NA Bladde… Carcin… <NA> KCLB
## # … with 1,666 more rows, and abbreviated variable names ¹cosmic_id,
## # ²sanger_id, ³primary_disease, ⁴subtype_disease
The most recent metadata
dataset can be automatically loaded into R by using
the depmap_metadata
function.
depmap::depmap_metadata()
## snapshotDate(): 2022-10-24
## see ?depmap and browseVignettes('depmap') for documentation
## loading from cache
## # A tibble: 1,840 × 29
## depmap_id cell_…¹ strip…² cell_…³ aliases cosmi…⁴ sex source RRID WTSI_…⁵
## <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <dbl>
## 1 ACH-000016 SLR 21 SLR21 SLR21_… <NA> NA <NA> Acade… CVCL… NA
## 2 ACH-000032 MHH-CA… MHHCAL… MHHCAL… <NA> NA Fema… DSMZ CVCL… NA
## 3 ACH-000033 NCI-H1… NCIH18… NCIH18… <NA> NA Fema… Acade… CVCL… NA
## 4 ACH-000043 Hs 895… HS895T HS895T… <NA> NA Fema… ATCC CVCL… NA
## 5 ACH-000049 HEK TE HEKTE HEKTE_… <NA> NA <NA> Acade… CVCL… NA
## 6 ACH-000051 TE 617… TE617T TE617T… <NA> NA Fema… ATCC CVCL… NA
## 7 ACH-000064 SALE SALE SALE_L… <NA> NA Male Acade… CVCL… NA
## 8 ACH-000068 REC-1 REC1 REC1_H… <NA> NA Male DSMZ CVCL… NA
## 9 ACH-000071 <NA> HS706T HS706T… <NA> NA Fema… ATCC CVCL… NA
## 10 ACH-000076 NCO2 NCO2 NCO2_H… <NA> NA Fema… HSRRB CVCL… NA
## # … with 1,830 more rows, 19 more variables: sample_collection_site <chr>,
## # primary_or_metastasis <chr>, primary_disease <chr>, subtype_disease <chr>,
## # age <chr>, sanger_id <chr>, additional_info <chr>, lineage <chr>,
## # lineage_subtype <chr>, lineage_sub_subtype <chr>,
## # lineage_molecular_subtype <chr>, default_growth_pattern <chr>,
## # model_manipulation <chr>, model_manipulation_details <chr>,
## # patient_id <chr>, parent_depmap_id <chr>, Cellosaurus_NCIt_disease <chr>, …
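The depmap_id column is shared between the metadata and the other datasets, so
cell line annotations can be attached with a join. The sketch below
illustrates this with dplyr on toy tibbles: the dependency values are copied
from the crispr printout earlier, while the lineage labels and the object
names are hypothetical placeholders.

```r
library(dplyr)

## Dependency values copied from the crispr printout earlier in the vignette.
crispr_toy <- tibble(
  depmap_id = c("ACH-000001", "ACH-000004", "ACH-000007"),
  gene_name = "A1BG",
  dependency = c(-0.135, 0.0819, -0.0115)
)

## Hypothetical lineage labels; the real values are not shown in this vignette.
metadata_toy <- tibble(
  depmap_id = c("ACH-000001", "ACH-000004"),
  lineage = c("lineage_a", "lineage_b")
)

## Keep every dependency row and attach the lineage where one is known.
annotated <- left_join(crispr_toy, metadata_toy, by = "depmap_id")
annotated
```

A left join is used here so that cell lines missing from the metadata are kept
with an NA lineage rather than silently dropped.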
The mutationCalls dataset contains all merged mutation calls (coding region,
germline filtered) found in the depmap dependency study. This dataset
corresponds to the depmap_19Q1_mutation_calls.csv file found in the 22Q2
depmap release and includes 19350 genes, 1601 cell lines, 37 primary diseases
and 33 lineages.
Specific mutationCalls datasets, such as mutationCalls_19Q1, can be accessed
by EH number.
mutationCalls <- eh[["EH2265"]]
mutationCalls
## # A tibble: 1,243,145 × 35
## depmap_id gene_name entrez_id ncbi_…¹ chrom…² start…³ end_pos strand var_c…⁴
## <chr> <chr> <dbl> <dbl> <chr> <dbl> <dbl> <chr> <chr>
## 1 ACH-000001 VPS13D 55187 37 1 1.24e7 1.24e7 + Nonsen…
## 2 ACH-000001 AADACL4 343066 37 1 1.27e7 1.27e7 + In_Fra…
## 3 ACH-000001 IFNLR1 163702 37 1 2.45e7 2.45e7 + Silent
## 4 ACH-000001 TMEM57 55219 37 1 2.58e7 2.58e7 + Frame_…
## 5 ACH-000001 ZSCAN20 7579 37 1 3.40e7 3.40e7 + Missen…
## 6 ACH-000001 POU3F1 5453 37 1 3.85e7 3.85e7 + Missen…
## 7 ACH-000001 MAST2 23139 37 1 4.65e7 4.65e7 + Silent
## 8 ACH-000001 GBP4 115361 37 1 8.97e7 8.97e7 + Silent
## 9 ACH-000001 VAV3 10451 37 1 1.08e8 1.08e8 + Splice…
## 10 ACH-000001 NBPF20 100288142 37 1 1.48e8 1.48e8 + Missen…
## # … with 1,243,135 more rows, 26 more variables: var_type <chr>,
## # ref_allele <chr>, tumor_seq_allele1 <chr>, dbSNP_RS <chr>,
## # dbSNP_val_status <chr>, genome_change <chr>, annotation_transcript <chr>,
## # tumor_sample_barcode <chr>, cDNA_change <chr>, codon_change <chr>,
## # protein_change <chr>, is_deleterious <lgl>, is_tcga_hotspot <lgl>,
## # tcga_hsCnt <dbl>, is_cosmic_hotspot <lgl>, cosmic_hsCnt <dbl>,
## # ExAC_AF <dbl>, VA_WES_AC <chr>, CGA_WES_AC <chr>, sanger_WES_AC <chr>, …
The most recent mutationCalls
dataset can be automatically loaded into R by
using the depmap_mutationCalls
function.
depmap::depmap_mutationCalls()
## snapshotDate(): 2022-10-24
## see ?depmap and browseVignettes('depmap') for documentation
## loading from cache
## # A tibble: 1,235,466 × 32
## depmap_id gene_name entrez_id ncbi_…¹ chrom…² start…³ end_pos strand var_c…⁴
## <chr> <chr> <dbl> <dbl> <chr> <dbl> <dbl> <chr> <chr>
## 1 ACH-000001 VPS13D 55187 37 1 1.24e7 1.24e7 + Nonsen…
## 2 ACH-000001 AADACL4 343066 37 1 1.27e7 1.27e7 + In_Fra…
## 3 ACH-000001 IFNLR1 163702 37 1 2.45e7 2.45e7 + Silent
## 4 ACH-000001 TMEM57 55219 37 1 2.58e7 2.58e7 + Frame_…
## 5 ACH-000001 ZSCAN20 7579 37 1 3.40e7 3.40e7 + Missen…
## 6 ACH-000001 POU3F1 5453 37 1 3.85e7 3.85e7 + Missen…
## 7 ACH-000001 MAST2 23139 37 1 4.65e7 4.65e7 + Silent
## 8 ACH-000001 GBP4 115361 37 1 8.97e7 8.97e7 + Silent
## 9 ACH-000001 VAV3 10451 37 1 1.08e8 1.08e8 + Splice…
## 10 ACH-000001 NBPF20 100288142 37 1 1.48e8 1.48e8 + Missen…
## # … with 1,235,456 more rows, 23 more variables: var_type <chr>,
## # ref_allele <chr>, alt_allele <chr>, dbSNP_RS <chr>, dbSNP_val_status <chr>,
## # genome_change <chr>, annotation_trans <chr>, cDNA_change <chr>,
## # codon_change <chr>, protein_change <chr>, is_deleterious <lgl>,
## # is_tcga_hotspot <lgl>, tcga_hsCnt <dbl>, is_cosmic_hotspot <lgl>,
## # cosmic_hsCnt <dbl>, ExAC_AF <dbl>, var_annotation <chr>, CGA_WES_AC <chr>,
## # HC_AC <chr>, RD_AC <chr>, RNAseq_AC <chr>, sanger_WES_AC <chr>, …
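The logical is_deleterious column makes it straightforward to summarise
mutation calls per gene. The sketch below counts deleterious calls on a toy
tibble; the gene names are copied from the printout above, but the
is_deleterious flags, the second cell line identifier and the object names are
hypothetical.

```r
library(dplyr)

## Gene names copied from the printout above; the is_deleterious flags and the
## second cell line identifier are hypothetical.
mutation_toy <- tibble(
  depmap_id = c("ACH-000001", "ACH-000001", "ACH-000001", "ACH-000002"),
  gene_name = c("VPS13D", "AADACL4", "IFNLR1", "VPS13D"),
  is_deleterious = c(TRUE, FALSE, FALSE, TRUE)
)

## Keep deleterious calls only, then count the calls per gene.
deleterious_counts <- mutation_toy %>%
  filter(is_deleterious) %>%
  count(gene_name, name = "n_calls")
deleterious_counts
```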
The drug_sensitivity dataset contains dependency data for cancer cell lines
treated with 4686 compounds. This dataset corresponds to the
primary_replicate_collapsed_logfold_change.csv file found in the 22Q2 depmap
release and includes 578 cell lines, 23 primary diseases and 25 lineages.
Specific drug_sensitivity datasets, such as drug_sensitivity_19Q3, can be
accessed by EH number.
drug_sensitivity <- eh[["EH3087"]]
drug_sensitivity
## # A tibble: 2,708,508 × 4
## depmap_id cell_line compound dependency
## <chr> <chr> <chr> <dbl>
## 1 ACH-000001 NIHOVCAR3_OVARY BRD-A00077618-236-07-6::2.5::HTS -0.0156
## 2 ACH-000007 LS513_LARGE_INTESTINE BRD-A00077618-236-07-6::2.5::HTS -0.0957
## 3 ACH-000008 A101D_SKIN BRD-A00077618-236-07-6::2.5::HTS 0.379
## 4 ACH-000010 NCIH2077_LUNG BRD-A00077618-236-07-6::2.5::HTS 0.119
## 5 ACH-000011 253J_URINARY_TRACT BRD-A00077618-236-07-6::2.5::HTS 0.145
## 6 ACH-000012 HCC827_LUNG BRD-A00077618-236-07-6::2.5::HTS 0.103
## 7 ACH-000013 ONCODG1_OVARY BRD-A00077618-236-07-6::2.5::HTS 0.353
## 8 ACH-000014 HS294T_SKIN BRD-A00077618-236-07-6::2.5::HTS 0.128
## 9 ACH-000015 NCIH1581_LUNG BRD-A00077618-236-07-6::2.5::HTS 0.167
## 10 ACH-000018 T24_URINARY_TRACT BRD-A00077618-236-07-6::2.5::HTS 0.832
## # … with 2,708,498 more rows
The most recent drug_sensitivity
dataset can be automatically loaded into R by
using the depmap_drug_sensitivity
function.
depmap::depmap_drug_sensitivity()
## snapshotDate(): 2022-10-24
## see ?depmap and browseVignettes('depmap') for documentation
## loading from cache
## # A tibble: 2,708,508 × 14
## depmap_id cell_line compo…¹ depen…² broad…³ name dose scree…⁴ moa target
## <chr> <chr> <chr> <dbl> <chr> <chr> <dbl> <chr> <chr> <chr>
## 1 ACH-000001 NIHOVCAR… BRD-A0… -0.0156 BRD-A0… 8-br… 2.5 HTS PKA … PRKG1
## 2 ACH-000007 LS513_LA… BRD-A0… -0.0957 BRD-A0… 8-br… 2.5 HTS PKA … PRKG1
## 3 ACH-000008 A101D_SK… BRD-A0… 0.379 BRD-A0… 8-br… 2.5 HTS PKA … PRKG1
## 4 ACH-000010 NCIH2077… BRD-A0… 0.119 BRD-A0… 8-br… 2.5 HTS PKA … PRKG1
## 5 ACH-000011 253J_URI… BRD-A0… 0.145 BRD-A0… 8-br… 2.5 HTS PKA … PRKG1
## 6 ACH-000012 HCC827_L… BRD-A0… 0.103 BRD-A0… 8-br… 2.5 HTS PKA … PRKG1
## 7 ACH-000013 ONCODG1_… BRD-A0… 0.353 BRD-A0… 8-br… 2.5 HTS PKA … PRKG1
## 8 ACH-000014 HS294T_S… BRD-A0… 0.128 BRD-A0… 8-br… 2.5 HTS PKA … PRKG1
## 9 ACH-000015 NCIH1581… BRD-A0… 0.167 BRD-A0… 8-br… 2.5 HTS PKA … PRKG1
## 10 ACH-000018 T24_URIN… BRD-A0… 0.832 BRD-A0… 8-br… 2.5 HTS PKA … PRKG1
## # … with 2,708,498 more rows, 4 more variables: disease_area <chr>,
## # indication <chr>, smiles <chr>, phase <chr>, and abbreviated variable names
## # ¹compound, ²dependency, ³broad_id, ⁴screen_id
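Judging from the printouts above, the compound string appears to combine three
fields separated by "::", which seem to match the broad_id, dose and screen_id
columns of the most recent release. The sketch below splits one such string
under that assumption; the mapping of the pieces to those column names is
inferred from the printed output rather than documented here.

```r
## Compound identifier copied from the drug_sensitivity printout above.
compound <- "BRD-A00077618-236-07-6::2.5::HTS"

## Split on the literal "::" separator (assumed field order: id, dose, screen).
parts <- strsplit(compound, "::", fixed = TRUE)[[1]]
parsed <- data.frame(
  broad_id = parts[1],
  dose = as.numeric(parts[2]),
  screen_id = parts[3]
)
parsed
```

This is mainly useful for the 19Q3 tibble, which carries only the combined
compound column.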
The proteomic dataset contains normalized quantitative profiling of proteins
of cancer cell lines by mass spectrometry. This dataset corresponds to the
https://gygi.med.harvard.edu/sites/gygi.med.harvard.edu/files/documents/protein_quant_current_normalized.csv.gz
file found in the 22Q2 depmap release and includes 375 cell lines, 24 primary
diseases and 27 lineages.
Specific proteomic datasets, such as proteomic_20Q2, can be accessed by EH
number.
proteomic <- eh[["EH3459"]]
proteomic
## # A tibble: 4,821,390 × 12
## depmap_id gene_name entrez_id protein protei…¹ prote…² desc group…³ uniprot
## <chr> <chr> <dbl> <chr> <dbl> <chr> <chr> <dbl> <chr>
## 1 ACH-000849 SLC12A2 6558 MDAMB4… 2.11 sp|P55… S12A… 0 S12A2_…
## 2 ACH-000441 SLC12A2 6558 SH4_SK… 0.0705 sp|P55… S12A… 0 S12A2_…
## 3 ACH-000248 SLC12A2 6558 AU565_… -0.464 sp|P55… S12A… 0 S12A2_…
## 4 ACH-000684 SLC12A2 6558 KMRC1_… -0.884 sp|P55… S12A… 0 S12A2_…
## 5 ACH-000856 SLC12A2 6558 CAL51_… 0.789 sp|P55… S12A… 0 S12A2_…
## 6 ACH-000348 SLC12A2 6558 RPMI79… -0.912 sp|P55… S12A… 0 S12A2_…
## 7 ACH-000062 SLC12A2 6558 RERFLC… 0.729 sp|P55… S12A… 0 S12A2_…
## 8 ACH-000650 SLC12A2 6558 IGR37_… -0.658 sp|P55… S12A… 0 S12A2_…
## 9 ACH-000484 SLC12A2 6558 VMRCRC… -1.15 sp|P55… S12A… 0 S12A2_…
## 10 ACH-000625 SLC12A2 6558 HEP3B2… 0.00882 sp|P55… S12A… 0 S12A2_…
## # … with 4,821,380 more rows, 3 more variables: uniprot_acc <chr>, TenPx <chr>,
## # cell_line <chr>, and abbreviated variable names ¹protein_expression,
## # ²protein_id, ³group_id
The most recent proteomic
dataset can be automatically loaded into R by
using the depmap_proteomic
function.
depmap::depmap_proteomic()
## snapshotDate(): 2022-10-24
## see ?depmap and browseVignettes('depmap') for documentation
## loading from cache
## # A tibble: 4,821,390 × 12
## depmap_id gene_name entrez_id protein protei…¹ prote…² desc group…³ uniprot
## <chr> <chr> <dbl> <chr> <dbl> <chr> <chr> <dbl> <chr>
## 1 ACH-000849 SLC12A2 6558 MDAMB4… 2.11 sp|P55… S12A… 0 S12A2_…
## 2 ACH-000441 SLC12A2 6558 SH4_SK… 0.0705 sp|P55… S12A… 0 S12A2_…
## 3 ACH-000248 SLC12A2 6558 AU565_… -0.464 sp|P55… S12A… 0 S12A2_…
## 4 ACH-000684 SLC12A2 6558 KMRC1_… -0.884 sp|P55… S12A… 0 S12A2_…
## 5 ACH-000856 SLC12A2 6558 CAL51_… 0.789 sp|P55… S12A… 0 S12A2_…
## 6 ACH-000348 SLC12A2 6558 RPMI79… -0.912 sp|P55… S12A… 0 S12A2_…
## 7 ACH-000062 SLC12A2 6558 RERFLC… 0.729 sp|P55… S12A… 0 S12A2_…
## 8 ACH-000650 SLC12A2 6558 IGR37_… -0.658 sp|P55… S12A… 0 S12A2_…
## 9 ACH-000484 SLC12A2 6558 VMRCRC… -1.15 sp|P55… S12A… 0 S12A2_…
## 10 ACH-000625 SLC12A2 6558 HEP3B2… 0.00882 sp|P55… S12A… 0 S12A2_…
## # … with 4,821,380 more rows, 3 more variables: uniprot_acc <chr>, TenPx <chr>,
## # cell_line <chr>, and abbreviated variable names ¹protein_expression,
## # ²protein_id, ³group_id
If desired, the original data from which the depmap package was derived can be
downloaded from the Broad Institute website. Instructions on how to download
these files, and on how the data was transformed and loaded into the depmap
package, can be found in the make_data.R file in ./inst/scripts. (Note that
the original uncompressed .csv files are >1.5GB in total and take a moderate
amount of time to download remotely.)
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ExperimentHub_2.6.0 AnnotationHub_3.6.0 BiocFileCache_2.6.0
## [4] dbplyr_2.2.1 BiocGenerics_0.44.0 depmap_1.12.0
## [7] dplyr_1.0.10 BiocStyle_2.26.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.9 png_0.1-7
## [3] Biostrings_2.66.0 assertthat_0.2.1
## [5] digest_0.6.30 utf8_1.2.2
## [7] mime_0.12 GenomeInfoDb_1.34.0
## [9] R6_2.5.1 stats4_4.2.1
## [11] RSQLite_2.2.18 evaluate_0.17
## [13] httr_1.4.4 pillar_1.8.1
## [15] zlibbioc_1.44.0 rlang_1.0.6
## [17] curl_4.3.3 jquerylib_0.1.4
## [19] blob_1.2.3 S4Vectors_0.36.0
## [21] rmarkdown_2.17 stringr_1.4.1
## [23] RCurl_1.98-1.9 bit_4.0.4
## [25] shiny_1.7.3 compiler_4.2.1
## [27] httpuv_1.6.6 xfun_0.34
## [29] pkgconfig_2.0.3 htmltools_0.5.3
## [31] tidyselect_1.2.0 KEGGREST_1.38.0
## [33] GenomeInfoDbData_1.2.9 tibble_3.1.8
## [35] interactiveDisplayBase_1.36.0 bookdown_0.29
## [37] IRanges_2.32.0 fansi_1.0.3
## [39] withr_2.5.0 crayon_1.5.2
## [41] later_1.3.0 bitops_1.0-7
## [43] rappdirs_0.3.3 jsonlite_1.8.3
## [45] xtable_1.8-4 lifecycle_1.0.3
## [47] DBI_1.1.3 magrittr_2.0.3
## [49] cli_3.4.1 stringi_1.7.8
## [51] cachem_1.0.6 XVector_0.38.0
## [53] promises_1.2.0.1 bslib_0.4.0
## [55] ellipsis_0.3.2 filelock_1.0.2
## [57] generics_0.1.3 vctrs_0.5.0
## [59] tools_4.2.1 bit64_4.0.5
## [61] Biobase_2.58.0 glue_1.6.2
## [63] purrr_0.3.5 BiocVersion_3.16.0
## [65] fastmap_1.1.0 yaml_2.3.6
## [67] AnnotationDbi_1.60.0 BiocManager_1.30.19
## [69] memoise_2.0.1 knitr_1.40
## [71] sass_0.4.2