How to install HPAStainR

Installation can be completed using BiocManager and the code below.

if(!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("HPAStainR")

Preparing data for HPAStainR

Downloading data from the website

The first step required to run HPAStainR is downloading HPA’s normal tissue staining data and cancer data. Both datasets are available online, and HPAStainR has a function that can download and load them for you.

library(HPAStainR)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> Loading required package: tidyr

HPA_data <- HPA_data_downloader(tissue_type = "both", save_file = FALSE)

The above function has downloaded both the normal tissue and cancer data. Here save_file was set to FALSE; had it been set to TRUE with no argument given for the save location, both files would have been saved to the current working directory. The data has also been unzipped and loaded into the object HPA_data, a list of two data frames called hpa_dat and cancer_dat, which hold the normal tissue and cancer tissue data respectively. If the code is run again it will redownload the files unless save_file was set to TRUE, in which case it will simply load the saved files.
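The "reuse a saved file, otherwise fetch it" behaviour described above can be sketched in base R. This is only an illustration of the general pattern, not HPAStainR's actual implementation: load_or_fetch and fake_fetch are hypothetical names, and fake_fetch stands in for a real download so the example runs offline.

```r
# Hypothetical helper (an assumption, not part of HPAStainR):
# fetch the data only when no saved copy exists, then load the saved copy.
load_or_fetch <- function(path, fetch_fun) {
  if (!file.exists(path)) {
    dat <- fetch_fun()
    write.table(dat, path, sep = "\t", row.names = FALSE, quote = FALSE)
  }
  read.delim(path, stringsAsFactors = FALSE)
}

# Stand-in for a download; counts how often it is called.
calls <- 0
fake_fetch <- function() {
  calls <<- calls + 1
  data.frame(Gene.name = c("TSPAN6", "TSPAN6"),
             Level = c("Medium", "High"))
}

demo_path <- tempfile(fileext = ".tsv")
first  <- load_or_fetch(demo_path, fake_fetch)  # fetches and saves
second <- load_or_fetch(demo_path, fake_fetch)  # loads the saved file
print(calls)
```

The second call never touches fake_fetch, which mirrors why saving the files avoids a repeat download.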

Head of normal tissue

Gene	Gene.name	Tissue	Cell.type	Level	Reliability
ENSG00000000003	TSPAN6	adipose tissue	adipocytes	Not detected	Approved
ENSG00000000003	TSPAN6	adrenal gland	glandular cells	Not detected	Approved
ENSG00000000003	TSPAN6	appendix	glandular cells	Medium	Approved
ENSG00000000003	TSPAN6	appendix	lymphoid tissue	Not detected	Approved
ENSG00000000003	TSPAN6	bone marrow	hematopoietic cells	Not detected	Approved
ENSG00000000003	TSPAN6	breast	adipocytes	Not detected	Approved
ENSG00000000003	TSPAN6	breast	glandular cells	High	Approved
ENSG00000000003	TSPAN6	breast	myoepithelial cells	Not detected	Approved
ENSG00000000003	TSPAN6	bronchus	respiratory epithelial cells	High	Approved
ENSG00000000003	TSPAN6	caudate	glial cells	Not detected	Approved
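
To get a feel for the normal tissue format, here is a small sketch that rebuilds a few of the rows shown above by hand and keeps only the tissue and cell type pairs where staining was detected. The object names normal_head and detected are made up for this example; with real data you would use HPA_data$hpa_dat instead.

```r
# A few rows copied by hand from the table above.
normal_head <- data.frame(
  Gene.name = "TSPAN6",
  Tissue    = c("adipose tissue", "appendix", "breast", "bronchus", "caudate"),
  Cell.type = c("adipocytes", "glandular cells", "glandular cells",
                "respiratory epithelial cells", "glial cells"),
  Level     = c("Not detected", "Medium", "High", "High", "Not detected"),
  stringsAsFactors = FALSE
)

# Keep only the rows where the protein stained at any level.
detected <- normal_head[normal_head$Level != "Not detected", ]
print(detected[, c("Tissue", "Cell.type", "Level")])

# Tally how many rows fall into each staining level.
print(table(normal_head$Level))
```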

Head of cancer tissue (columns 1-7)

Gene	Gene.name	Cancer	High	Medium	Low	Not.detected
ENSG00000000003	TSPAN6	breast cancer	1	7	2	2
ENSG00000000003	TSPAN6	carcinoid	0	1	1	2
ENSG00000000003	TSPAN6	cervical cancer	11	1	0	0
ENSG00000000003	TSPAN6	colorectal cancer	0	6	2	2
ENSG00000000003	TSPAN6	endometrial cancer	10	2	0	0
ENSG00000000003	TSPAN6	glioma	0	0	0	11
ENSG00000000003	TSPAN6	head and neck cancer	0	3	1	0
ENSG00000000003	TSPAN6	liver cancer	4	5	1	0
ENSG00000000003	TSPAN6	lung cancer	8	4	0	0
ENSG00000000003	TSPAN6	lymphoma	0	0	0	11
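
The cancer data is laid out differently from the normal tissue data: each row carries one count per staining level. The sketch below assumes those counts are numbers of samples annotated at each level (an assumption about the HPA format, so check the HPA documentation), and uses four rows copied by hand from the table above. The names cancer_head, n_samples and frac_detected are made up for this example.

```r
# Four rows copied by hand from the table above.
cancer_head <- data.frame(
  Cancer       = c("breast cancer", "carcinoid", "cervical cancer", "glioma"),
  High         = c(1, 0, 11, 0),
  Medium       = c(7, 1, 1, 0),
  Low          = c(2, 1, 0, 0),
  Not.detected = c(2, 2, 0, 11),
  stringsAsFactors = FALSE
)

level_cols <- c("High", "Medium", "Low", "Not.detected")

# Total number of samples per cancer (assumed meaning of the counts).
cancer_head$n_samples <- unname(rowSums(cancer_head[, level_cols]))

# Fraction of samples with any detectable staining.
cancer_head$frac_detected <-
  1 - cancer_head$Not.detected / cancer_head$n_samples

print(cancer_head[, c("Cancer", "n_samples", "frac_detected")])
```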

The `hpar` package as a version controlled alternative

HPA_data_downloader provides the most up-to-date information from the Human Protein Atlas website. However, for more consistent, version-controlled results, feel free to use the data from the hpar Bioconductor package using the following commands.

if (!requireNamespace("hpar", quietly = TRUE))
    BiocManager::install("hpar")
data(hpaNormalTissue, package = "hpar")
data(hpaCancer, package = "hpar")

Head of `hpar`’s normal tissue

Gene	Gene.name	Tissue	Cell.type	Level	Reliability
ENSG00000000003	TSPAN6	adipose tissue	adipocytes	Not detected	Approved
ENSG00000000003	TSPAN6	adrenal gland	glandular cells	Not detected	Approved
ENSG00000000003	TSPAN6	appendix	glandular cells	Medium	Approved
ENSG00000000003	TSPAN6	appendix	lymphoid tissue	Not detected	Approved
ENSG00000000003	TSPAN6	bone marrow	hematopoietic cells	Not detected	Approved
ENSG00000000003	TSPAN6	breast	adipocytes	Not detected	Approved
ENSG00000000003	TSPAN6	breast	glandular cells	High	Approved
ENSG00000000003	TSPAN6	breast	myoepithelial cells	Not detected	Approved
ENSG00000000003	TSPAN6	bronchus	respiratory epithelial cells	High	Approved
ENSG00000000003	TSPAN6	caudate	glial cells	Not detected	Approved

Using HPAStainR

Using the HPAStainR function

Now that the data is available, you can use the HPAStainR function. This requires a list of the proteins or genes you are interested in. In this example, we’re going to use the pancreatic enzymes PRSS1, PNLIP, and CELA3A, and the hormone PRL.

gene_list <- c("PRSS1", "PNLIP", "CELA3A", "PRL")

stainR_out <- HPAStainR::HPAStainR(gene_list = gene_list,
          hpa_dat = HPA_data$hpa_dat,
          cancer_dat = HPA_data$cancer_dat,
          cancer_analysis = "both",
          stringency = "normal")


head(stainR_out, 10)
#> # A tibble: 10 × 11
#>    cell_type     perce…¹ perce…² perce…³ perce…⁴ numbe…⁵ teste…⁶ detec…⁷ stain…⁸
#>    <chr>           <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <chr>   <chr>     <dbl>
#>  1 PANCREAS - e…    0.75       0       0    0.25       4 CELA3A… CELA3A…      75
#>  2 breast cancer    0          1       0    0          4 CELA3A… <NA>         50
#>  3 carcinoid        0          1       0    0          4 CELA3A… <NA>         50
#>  4 cervical can…    0          1       0    0          4 CELA3A… <NA>         50
#>  5 colorectal c…    0          1       0    0          4 CELA3A… <NA>         50
#>  6 endometrial …    0          1       0    0          4 CELA3A… <NA>         50
#>  7 glioma           0          1       0    0          4 CELA3A… <NA>         50
#>  8 liver cancer     0          1       0    0          4 CELA3A… <NA>         50
#>  9 lung cancer      0          1       0    0          4 CELA3A… <NA>         50
#> 10 lymphoma         0          1       0    0          4 CELA3A… <NA>         50
#> # … with 2 more variables: p_val <dbl>, p_val_adj <dbl>, and abbreviated
#> #   variable names ¹percent_high_expression, ²percent_medium_expression,
#> #   ³percent_low_expression, ⁴percent_not_detected, ⁵number_of_proteins,
#> #   ⁶tested_proteins, ⁷detected_proteins, ⁸staining_score

The output of HPAStainR is a tibble with multiple columns. The basic columns include the following:

cell_type: The cell types/cancers that are tested in the Human Protein Atlas.
percent/count_high/medium/low_expression: Either the percent or count of genes from the list that stain either at high levels, medium levels or low levels.
percent/count_not_detected: The number or percent of proteins that failed to stain the cell type.
number_of_proteins: The number of proteins tested in a cell type.
tested_proteins: A character string of the proteins that were tested in the cell type, as not all proteins are tested in every cell type.
detected_proteins: A character string of the proteins that were detected in each cell type.
staining_score: An arbitrary ranking value further explained below.
p_val: A p-value denoting an enrichment of rarely staining proteins (stained in <29% of the cell types, see paper [cite] for further details).
p_val_adj: The previous p-value adjusted for multiple testing using the “holm” method.

The staining score is an arbitrary rank of staining, weighted by how highly a protein stained. See the manual for the equation and further information.
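
As a quick illustration of the “holm” adjustment named for the adjusted p-value column, base R’s p.adjust can be applied to a vector of raw p-values. The three p-values below are made up for the example and are not HPAStainR results; this only shows what the Holm method does, not how HPAStainR calls it internally.

```r
# Made-up raw p-values for three cell types (illustrative only).
p_val <- c(0.01, 0.02, 0.03)

# Holm: the smallest p-value is multiplied by n, the next by n - 1, and so on,
# then the results are forced to be non-decreasing and capped at 1.
p_val_adj <- p.adjust(p_val, method = "holm")
print(p_val_adj)
#> [1] 0.03 0.04 0.04
```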

Using the HPAStainR Shiny app

Another way to use HPAStainR is as a Shiny app. The function shiny_HPAStainR allows you to run a local version of the app.

Note: If you want the tab from the online Shiny app that gives the stained : tested ratio of proteins, make sure to run the code below and pass the resulting object as the third argument (cell_type_data) of shiny_HPAStainR.

hpa_summary <- HPA_summary_maker(hpa_dat = HPA_data$hpa_dat)

Run the Shiny app

shiny_HPAStainR(hpa_dat = HPA_data$hpa_dat,
                cancer_dat = HPA_data$cancer_dat,
                cell_type_data = hpa_summary)

A window should open like that below

[Figure: Shiny Output]

You should now be able to query any list of proteins you like and rank the results on any column you wish. All of the options from the functions can also be modified in the left-hand side panel.

Session Info

sessionInfo()
#> R version 4.2.1 (2022-06-23)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.5 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] HPAStainR_1.8.0 tidyr_1.2.1     dplyr_1.0.10   
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.9        highr_0.9         pillar_1.8.1      bslib_0.4.0      
#>  [5] compiler_4.2.1    later_1.3.0       jquerylib_0.1.4   tools_4.2.1      
#>  [9] digest_0.6.30     jsonlite_1.8.3    evaluate_0.17     lifecycle_1.0.3  
#> [13] tibble_3.1.8      pkgconfig_2.0.3   rlang_1.0.6       shiny_1.7.3      
#> [17] cli_3.4.1         DBI_1.1.3         yaml_2.3.6        xfun_0.34        
#> [21] fastmap_1.1.0     withr_2.5.0       stringr_1.4.1     knitr_1.40       
#> [25] generics_0.1.3    vctrs_0.5.0       sass_0.4.2        tidyselect_1.2.0 
#> [29] data.table_1.14.4 glue_1.6.2        R6_2.5.1          fansi_1.0.3      
#> [33] rmarkdown_2.17    hpar_1.40.0       purrr_0.3.5       magrittr_2.0.3   
#> [37] scales_1.2.1      ellipsis_0.3.2    promises_1.2.0.1  htmltools_0.5.3  
#> [41] assertthat_0.2.1  colorspace_2.0-3  xtable_1.8-4      mime_0.12        
#> [45] httpuv_1.6.6      utf8_1.2.2        stringi_1.7.8     munsell_0.5.0    
#> [49] cachem_1.0.6

Any questions? Feel free to contact me at tnieuwe1[@]jhmi.edu

HPAStainR

Tim O. Nieuwenhuis

2022-11-01
