SPIAT 1.0.4
First we load the SPIAT library.
library(SPIAT)
Here we present some quality control steps implemented in SPIAT to check for the quality of phenotyping, help detect uneven staining, and other potential technical artefacts.
In this vignette we will use an inForm data file that’s already been
formatted for SPIAT with format_image_to_spe()
, which we can load with
data()
. We will use define_celltypes()
to define the cells with certain
combinations of markers.
data("simulated_image")
# define cell types
formatted_image <- define_celltypes(
simulated_image,
categories = c("Tumour_marker","Immune_marker1,Immune_marker2",
"Immune_marker1,Immune_marker3",
"Immune_marker1,Immune_marker2,Immune_marker4", "OTHER"),
category_colname = "Phenotype",
names = c("Tumour", "Immune1", "Immune2", "Immune3", "Others"),
new_colname = "Cell.Type")
Phenotyping of cells can be verified comparing marker intensities of
cells labelled positive and negative for a marker. Cells positive for a
marker should have high levels of the marker. An unclear separation of
marker intensities between positive and negative cells would suggest
phenotypes have not been accurately assigned. We can use
marker_intensity_boxplot()
to produce a boxplot for cells phenotyped as
being positive or negative for a marker.
marker_intensity_boxplot(formatted_image, "Immune_marker1")
Note that if phenotypes were obtained from software that uses machine learning to determine positive cells, which generally also take into account properties such as cell shape, nucleus size etc., rather than a strict threshold, some negative cells will have high marker intensities, and vice versa. In general, a limited overlap of whiskers or outlier points is tolerated and expected. However, overlapping boxplots suggests unreliable phenotyping.
Uneven marker staining or high background intensity can be identified
with plot_cell_marker_levels()
. This produces a scatter plot of the
intensity of a marker in each cell. This should be relatively even
across the image and all phenotyped cells. Cells that were not
phenotyped as being positive for the particular marker are excluded.
plot_cell_marker_levels(formatted_image, "Immune_marker1")
For large images, there is also the option of ‘blurring’ the image,
where the image is split into multiple small areas, and marker
intensities are averaged within each. The image is blurred based on the
num_splits
parameter.
plot_marker_level_heatmap(formatted_image, num_splits = 100, "Tumour_marker")
We may see cells with biologically implausible combination of markers
present in the input data when using unique(spe_object$Phenotype)
.
For example, cells might be incorrectly typed as positive for two markers known
to not co-occur in a single cell type. Incorrect cell phenotypes may be present
due to low cell segmentation quality, antibody ‘bleeding’ from one cell
to another or inadequate marker thresholding.
If the number of incorrectly phenotyped cells is small (<5%), we advise simply removing these cells (see below). If it is a higher proportion, we recommend checking the cell segmentation and phenotyping methods, as a more systematic problem might be present.
If you identify incorrect phenotypes or have any properties (columns) that you
want to exclude you can use select_celltypes()
. Set celltypes
the values
that you want to keep or exclude for a column (feature_colname
). Set keep
as TRUE
to include these cells and FALSE
to exclude them.
data_subset <- select_celltypes(
formatted_image, keep=TRUE,
celltypes = c("Tumour_marker","Immune_marker1,Immune_marker3",
"Immune_marker1,Immune_marker2",
"Immune_marker1,Immune_marker2,Immune_marker4"),
feature_colname = "Phenotype")
# have a look at what phenotypes are present
unique(data_subset$Phenotype)
## [1] "Immune_marker1,Immune_marker2"
## [2] "Tumour_marker"
## [3] "Immune_marker1,Immune_marker2,Immune_marker4"
## [4] "Immune_marker1,Immune_marker3"
In this vignette we will work with all the original phenotypes present
in formatted_image
.
We can also check for specific misclassified cells using dimensionality reduction. SPIAT offers tSNE and UMAPs based on marker intensities to visualise cells. Cells of distinct types should be forming clearly different clusters.
The generated dimensionality reduction plots are interactive, and users can hover over each cell and obtain the cell ID. Users can then remove specific misclassified cells.
# First predict the phenotypes (this is for generating not 100% accurate phenotypes)
predicted_image2 <- predict_phenotypes(spe_object = simulated_image,
thresholds = NULL,
tumour_marker = "Tumour_marker",
baseline_markers = c("Immune_marker1",
"Immune_marker2",
"Immune_marker3",
"Immune_marker4"),
reference_phenotypes = FALSE)
## [1] "Tumour_marker threshold intensity: 0.445450443784465"
## [1] "Immune_marker1 threshold intensity: 0.116980867970434"
## [1] "Immune_marker2 threshold intensity: 0.124283809517202"
## [1] "Immune_marker3 threshold intensity: 0.0166413130263845"
## [1] "Immune_marker4 threshold intensity: 0.00989731350898589"
# Then define the cell types based on predicted phenotypes
predicted_image2 <- define_celltypes(
predicted_image2,
categories = c("Tumour_marker", "Immune_marker1,Immune_marker2",
"Immune_marker1,Immune_marker3",
"Immune_marker1,Immune_marker2,Immune_marker4"),
category_colname = "Phenotype",
names = c("Tumour", "Immune1", "Immune2", "Immune3"),
new_colname = "Cell.Type")
# Delete cells with unrealistic marker combinations from the dataset
predicted_image2 <-
select_celltypes(predicted_image2, "Undefined", feature_colname = "Cell.Type",
keep = FALSE)
# TSNE plot
g <- dimensionality_reduction_plot(predicted_image2, plot_type = "TSNE",
feature_colname = "Cell.Type")
Note that dimensionality_reduction_plot()
only prints a static version
of the UMAP or tSNE plot. If the user wants to interact with this plot,
they can pass the result to the ggplotly()
function from the plotly
package. Due to the file size restriction, we only show a screenshot of
the interactive tSNE plot.
plotly::ggplotly(g)
The plot shows that there are four clear clusters based on marker intensities. This is consistent with the cell definition from the marker combinations from the “Phenotype” column. (The interactive t-SNE plot would allow users to hover the cursor on the misclassified cells and see their cell IDs.) In this example, Cell_3302, Cell_4917, Cell_2297, Cell_488, Cell_4362, Cell_4801, Cell_2220, Cell_3431, Cell_533, Cell_4925, Cell_4719, Cell_469, Cell_1929, Cell_310, Cell_2536, Cell_321, and Cell_4195 are obviously misclassified according to this plot.
We can use select_celltypes()
to delete the misclassified cells.
predicted_image2 <-
select_celltypes(predicted_image2, c("Cell_3302", "Cell_4917", "Cell_2297",
"Cell_488", "Cell_4362", "Cell_4801",
"Cell_2220", "Cell_3431", "Cell_533",
"Cell_4925", "Cell_4719", "Cell_469",
"Cell_1929", "Cell_310", "Cell_2536",
"Cell_321", "Cell_4195"),
feature_colname = "Cell.ID", keep = FALSE)
Then plot the TSNE again (not interactive). This time we see there are fewer misclassified cells.
# TSNE plot
g <- dimensionality_reduction_plot(predicted_image2, plot_type = "TSNE", feature_colname = "Cell.Type")
# plotly::ggplotly(g) # uncomment this code to interact with the plot
In addition to the marker level tissue plots for QC, SPIAT has other methods for visualising markers and phenotypes in tissues.
We can see the location of all cell types (or any column in the data) in
the tissue with plot_cell_categories()
. Each dot in the plot corresponds
to a cell and cells are coloured by cell type. Any cell types present in
the data but not in the cell types of interest will be put in the
category “OTHER” and coloured lightgrey.
my_colors <- c("red", "blue", "darkcyan", "darkgreen")
plot_cell_categories(spe_object = formatted_image,
categories_of_interest =
c("Tumour", "Immune1", "Immune2", "Immune3"),
colour_vector = my_colors, feature_colname = "Cell.Type")
plot_cell_categories()
also allows the users to plot the categories layer by
layer when there are too many cells by setting layered
parameter as TRUE
.
Then the cells will be plotted in the order of categories_of_interest
layer
by layer. Users can also use cex
parameter to change the size of the points.
We can visualise a selected marker in 3D with marker_surface_plot()
. The
image is blurred based on the num_splits
parameter.
marker_surface_plot(formatted_image, num_splits=15, marker="Immune_marker1")
Due to the restriction of the file size, we have disabled the interactive plot in this vignette. Here only shows a screen shot. (You can interactively move the plot around to obtain a better view with the same code).
To visualise multiple markers in 3D in a single plot we can use
marker_surface_plot_stack()
. This shows normalised intensity level of
specified markers and enables the identification of co-occurring and
mutually exclusive markers.
marker_surface_plot_stack(formatted_image, num_splits=15, markers=c("Tumour_marker", "Immune_marker1"))
The stacked surface plots of the Tumour_marker and Immune_marker1 cells in this example shows how Tumour_marker and Immune_marker1 are mutually exclusive as the peaks and valleys are opposite. Similar to the previous plot, we have disabled the interactive plot in the vignette. (You can interactively move the plot around to obtain a better view with the same code.)
sessionInfo()
## R version 4.2.2 (2022-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] SPIAT_1.0.4 SpatialExperiment_1.8.0
## [3] SingleCellExperiment_1.20.0 SummarizedExperiment_1.28.0
## [5] Biobase_2.58.0 GenomicRanges_1.50.2
## [7] GenomeInfoDb_1.34.4 IRanges_2.32.0
## [9] S4Vectors_0.36.1 BiocGenerics_0.44.0
## [11] MatrixGenerics_1.10.0 matrixStats_0.63.0
## [13] BiocStyle_2.26.0
##
## loaded via a namespace (and not attached):
## [1] circlize_0.4.15 plyr_1.8.8
## [3] lazyeval_0.2.2 crosstalk_1.2.0
## [5] BiocParallel_1.32.4 ggplot2_3.4.0
## [7] digest_0.6.31 foreach_1.5.2
## [9] htmltools_0.5.4 magick_2.7.3
## [11] fansi_1.0.3 magrittr_2.0.3
## [13] tensor_1.5 cluster_2.1.4
## [15] doParallel_1.0.17 tzdb_0.3.0
## [17] limma_3.54.0 ComplexHeatmap_2.14.0
## [19] R.utils_2.12.2 vroom_1.6.0
## [21] spatstat.sparse_3.0-0 colorspace_2.0-3
## [23] ggrepel_0.9.2 xfun_0.35
## [25] dplyr_1.0.10 crayon_1.5.2
## [27] RCurl_1.98-1.9 jsonlite_1.8.4
## [29] spatstat.data_3.0-0 iterators_1.0.14
## [31] glue_1.6.2 polyclip_1.10-4
## [33] gtable_0.3.1 zlibbioc_1.44.0
## [35] XVector_0.38.0 GetoptLong_1.0.5
## [37] DelayedArray_0.24.0 DropletUtils_1.18.1
## [39] Rhdf5lib_1.20.0 shape_1.4.6
## [41] HDF5Array_1.26.0 apcluster_1.4.10
## [43] abind_1.4-5 scales_1.2.1
## [45] pheatmap_1.0.12 DBI_1.1.3
## [47] edgeR_3.40.1 spatstat.random_3.0-1
## [49] Rcpp_1.0.9 viridisLite_0.4.1
## [51] clue_0.3-63 archive_1.1.5
## [53] dqrng_0.3.0 bit_4.0.5
## [55] httr_1.4.4 htmlwidgets_1.6.0
## [57] dittoSeq_1.10.0 RColorBrewer_1.1-3
## [59] pkgconfig_2.0.3 R.methodsS3_1.8.2
## [61] farver_2.1.1 scuttle_1.8.3
## [63] sass_0.4.4 deldir_1.0-6
## [65] locfit_1.5-9.6 utf8_1.2.2
## [67] tidyselect_1.2.0 labeling_0.4.2
## [69] rlang_1.0.6 reshape2_1.4.4
## [71] munsell_0.5.0 tools_4.2.2
## [73] cachem_1.0.6 cli_3.5.0
## [75] dbscan_1.1-11 generics_0.1.3
## [77] mmand_1.6.2 ggridges_0.5.4
## [79] evaluate_0.19 stringr_1.5.0
## [81] fastmap_1.1.0 yaml_2.3.6
## [83] goftest_1.2-3 knitr_1.41
## [85] bit64_4.0.5 purrr_1.0.0
## [87] RANN_2.6.1 nlme_3.1-161
## [89] sparseMatrixStats_1.10.0 R.oo_1.25.0
## [91] pracma_2.4.2 compiler_4.2.2
## [93] plotly_4.10.1 png_0.1-8
## [95] spatstat.utils_3.0-1 tibble_3.1.8
## [97] bslib_0.4.2 stringi_1.7.8
## [99] highr_0.9 lattice_0.20-45
## [101] Matrix_1.5-3 vctrs_0.5.1
## [103] pillar_1.8.1 lifecycle_1.0.3
## [105] rhdf5filters_1.10.0 BiocManager_1.30.19
## [107] spatstat.geom_3.0-3 jquerylib_0.1.4
## [109] GlobalOptions_0.1.2 data.table_1.14.6
## [111] cowplot_1.1.1 bitops_1.0-7
## [113] R6_2.5.1 bookdown_0.31
## [115] gridExtra_2.3 codetools_0.2-18
## [117] gtools_3.9.4 assertthat_0.2.1
## [119] rhdf5_2.42.0 rjson_0.2.21
## [121] withr_2.5.0 GenomeInfoDbData_1.2.9
## [123] parallel_4.2.2 grid_4.2.2
## [125] beachmat_2.14.0 tidyr_1.2.1
## [127] rmarkdown_2.19 DelayedMatrixStats_1.20.0
## [129] Cairo_1.6-0 Rtsne_0.16
## [131] spatstat.explore_3.0-5