1 Installation

This package can be installed from Bioconductor:

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("SpatialFeatureExperiment")

2 Class structure

2.1 Introduction

SpatialFeatureExperiment (SFE) is a new S4 class built on top of SpatialExperiment (SPE). SpatialFeatureExperiment incorporates geometries and geometry operations with the sf package. Examples of supported geometries are Visium spots represented with polygons corresponding to their size, cell or nuclei segmentation polygons, tissue boundary polygons, pathologist annotation of histological regions, and transcript spots of genes. Using sf, SpatialFeatureExperiment leverages the GEOS C++ libraries underlying sf for geometry operations, including algorithms for for determining whether geometries intersect, finding intersection geometries, buffering geometries with margins, etc. A schematic of the SFE object is shown below:

SpatialFeatureExperiment expands on SpatialExperiment by adding column, row, and annotation geometries and spatial graphs. This is explained in detail in the following paragraphs.

Figure 1: Schematics of the SFE object

Below is a list of SFE features that extend the SPE object:

  • colGeometries are sf data frames associated with the entities that correspond to columns of the gene count matrix, such as Visium spots or cells. The geometries in the sf data frames can be Visium spot centroids, Visium spot polygons, or for datasets with single cell resolution, cell or nuclei segmentations. Multiple colGeometries can be stored in the same SFE object, such as one for cell segmentation and another for nuclei segmentation. There can be non-spatial, attribute columns in a colGeometry rather than colData, because the sf class allows users to specify how attributes relate to geometries, such as “constant”, “aggregate”, and “identity”. See the agr argument of the st_sf documentation.
  • colGraphs are spatial neighborhood graphs of cells or spots. The graphs have class listw (spdep package), and the colPairs of SingleCellExperiment was not used so no conversion is necessary to use the numerous spatial dependency functions from spdep, such as those for Moran’s I, Geary’s C, Getis-Ord Gi*, LOSH, etc. Conversion is also not needed for other classical spatial statistics packages such as spatialreg and adespatial.
  • rowGeometries are similar to colGeometries, but support entities that correspond to rows of the gene count matrix, such as genes. A potential use case is to store transcript spots for each gene in smFISH or in situ sequencing based datasets.
  • rowGraphs are similar to colGraphs. A potential use case may be spatial colocalization of transcripts of different genes.
  • annotGeometries are sf data frames associated with the dataset but not directly with the gene count matrix, such as tissue boundaries, histological regions, cell or nuclei segmentation in Visium datasets, and etc. These geometries are stored in this object to facilitate plotting and using sf for operations such as to find the number of nuclei in each Visium spot and which histological regions each Visium spot intersects. Unlike colGeometries and rowGeometries, the number of rows in the sf data frames in annotGeometries is not constrained by the dimension of the gene count matrix and can be arbitrary.
  • annotGraphs are similar to colGraphs and rowGraphs, but are for entities not directly associated with the gene count matrix, such as spatial neighborhood graphs for nuclei in Visium datasets, or other objects like myofibers. These graphs are relevant to spdep analyses of attributes of these geometries such as spatial autocorrelation in morphological metrics of myofibers and nuclei. With geometry operations with sf, these attributes and results of analyses of these attributes (e.g. spatial regions defined by the attributes) may be related back to gene expression.
  • localResults are similar to reducedDims in SingleCellExperiment, but stores results from univariate and bivariate local spatial analysis results, such as from localmoran, Getis-Ord Gi*, local spatial heteroscedasticity (LOSH), and geographically weighted summary statistics. Unlike in reducedDims, for each type of results (type is the type of analysis such as Getis-Ord Gi*), each feature (e.g. gene) or pair of features for which the analysis is performed has its own results. The local spatial analyses can also be performed for attributes of colGeometries and annotGeometries in addition to gene expression and colData. Results of multivariate spatial analysis such as geographically weighted PCA (GWPCA) and MULTISPATI PCA can be stored in reducedDims.
library(SpatialFeatureExperiment)
library(SpatialExperiment)
#> Loading required package: SingleCellExperiment
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#> 
#> Attaching package: 'MatrixGenerics'
#> The following objects are masked from 'package:matrixStats':
#> 
#>     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#>     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#>     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#>     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#>     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#>     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#>     colWeightedMeans, colWeightedMedians, colWeightedSds,
#>     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#>     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#>     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#>     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#>     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#>     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#>     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#>     rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#>     as.data.frame, basename, cbind, colnames, dirname, do.call,
#>     duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
#>     lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
#>     pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
#>     tapply, union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#> 
#> Attaching package: 'S4Vectors'
#> The following objects are masked from 'package:base':
#> 
#>     I, expand.grid, unname
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> 
#> Attaching package: 'Biobase'
#> The following object is masked from 'package:MatrixGenerics':
#> 
#>     rowMedians
#> The following objects are masked from 'package:matrixStats':
#> 
#>     anyMissing, rowMedians
library(SFEData)
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1; sf_use_s2() is TRUE
library(Matrix)
#> 
#> Attaching package: 'Matrix'
#> The following object is masked from 'package:S4Vectors':
#> 
#>     expand
# Example dataset
(sfe <- McKellarMuscleData(dataset = "small"))
#> snapshotDate(): 2022-10-31
#> see ?SFEData and browseVignettes('SFEData') for documentation
#> loading from cache
#> class: SpatialFeatureExperiment 
#> dim: 15123 77 
#> metadata(0):
#> assays(1): counts
#> rownames(15123): ENSMUSG00000025902 ENSMUSG00000096126 ...
#>   ENSMUSG00000064368 ENSMUSG00000064370
#> rowData names(6): Ensembl symbol ... vars cv2
#> colnames(77): AAATTACCTATCGATG AACATATCAACTGGTG ... TTCTTTGGTCGCGACG
#>   TTGATGTGTAGTCCCG
#> colData names(12): barcode col ... prop_mito in_tissue
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : imageX imageY
#> imgData names(1): sample_id
#> 
#> Geometries:
#> colGeometries: spotPoly (POLYGON) 
#> annotGeometries: tissueBoundary (POLYGON), myofiber_full (GEOMETRY), myofiber_simplified (GEOMETRY), nuclei (POLYGON), nuclei_centroid (POINT) 
#> 
#> Graphs:
#> Vis5A:

2.2 Geometries

User interfaces to get or set the geometries and spatial graphs emulate those of reducedDims and row/colPairs in SingleCellExperiment. Column and row geometries also emulate reducedDims in internal implementation, while annotation geometries and spatial graphs differ.

2.2.1 Column and row

Column and row geometries can be get or set with the dimGeometries or dimGeometry function. The MARGIN argument is as in the apply function: MARGIN = 1 means row, and MARGIN = 2 means column.

dimGeometry gets or sets one particular geometry by name of index.

# Get Visium spot polygons
(spots <- dimGeometry(sfe, "spotPoly", MARGIN = 2))
#> Simple feature collection with 77 features and 1 field
#> Geometry type: POLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 5000 ymin: 13000 xmax: 7000 ymax: 15000
#> CRS:           NA
#> First 10 features:
#>                                        geometry sample_id
#> AAATTACCTATCGATG POLYGON ((6472.186 13875.23...     Vis5A
#> AACATATCAACTGGTG POLYGON ((5778.291 13635.43...     Vis5A
#> AAGATTGGCGGAACGT POLYGON ((7000 13809.84, 69...     Vis5A
#> AAGGGACAGATTCTGT POLYGON ((6749.535 13874.64...     Vis5A
#> AATATCGAGGGTTCTC POLYGON ((5500.941 13636.03...     Vis5A
#> AATGATGATACGCTAT POLYGON ((6612.42 14598.82,...     Vis5A
#> AATGATGCGACTCCTG POLYGON ((5501.981 14118.62...     Vis5A
#> AATTCATAAGGGATCT POLYGON ((6889.769 14598.22...     Vis5A
#> ACGAGTACGGATGCCC POLYGON ((5084.397 13395.63...     Vis5A
#> ACGCTAGTGATACACT POLYGON ((5639.096 13394.44...     Vis5A
plot(st_geometry(spots))

# Setter
dimGeometry(sfe, "foobar", MARGIN = 2) <- spots

dimGeometries gets or sets all geometry of the given margin.

# Getter, all geometries of one margin
(cgs <- dimGeometries(sfe, MARGIN = 2))
#> List of length 2
#> names(2): spotPoly foobar
# Setter, all geometries
dimGeometries(sfe, MARGIN = 2) <- cgs

dimGeometryNames gets or sets the names of the geometries

(cg_names <- dimGeometryNames(sfe, MARGIN = 2))
#> [1] "spotPoly" "foobar"
# Setter
dimGeometryNames(sfe, MARGIN = 2) <- cg_names

colGeometry(sfe, "spotPoly"), colGeometries(sfe), and colGeometryNames(sfe) are shorthands for dimGeometry(sfe, "spotPoly", MARGIN = 2), dimGeometries(sfe, MARGIN = 2), and dimGeometryNames(sfe, MARGIN = 2) respectively. Similarly, rowGeometr*(sfe, ...) is a shorthand of dimGeometr*(sfe, ..., MARGIN = 1).

There are shorthands for some specific column or row geometries. For example, spotPoly(sfe) is equivalent to colGeometry(sfe, "spotPoly") for Visium spot polygons, and txSpots(sfe) is equivalent to rowGeometry(sfe, "txSpots") for transcript spots in single molecule technologies.

# Getter
(spots <- spotPoly(sfe))
#> Simple feature collection with 77 features and 1 field
#> Geometry type: POLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 5000 ymin: 13000 xmax: 7000 ymax: 15000
#> CRS:           NA
#> First 10 features:
#>                                        geometry sample_id
#> AAATTACCTATCGATG POLYGON ((6472.186 13875.23...     Vis5A
#> AACATATCAACTGGTG POLYGON ((5778.291 13635.43...     Vis5A
#> AAGATTGGCGGAACGT POLYGON ((7000 13809.84, 69...     Vis5A
#> AAGGGACAGATTCTGT POLYGON ((6749.535 13874.64...     Vis5A
#> AATATCGAGGGTTCTC POLYGON ((5500.941 13636.03...     Vis5A
#> AATGATGATACGCTAT POLYGON ((6612.42 14598.82,...     Vis5A
#> AATGATGCGACTCCTG POLYGON ((5501.981 14118.62...     Vis5A
#> AATTCATAAGGGATCT POLYGON ((6889.769 14598.22...     Vis5A
#> ACGAGTACGGATGCCC POLYGON ((5084.397 13395.63...     Vis5A
#> ACGCTAGTGATACACT POLYGON ((5639.096 13394.44...     Vis5A
# Setter
spotPoly(sfe) <- spots

2.2.2 Annotation

Annotation geometries can be get or set with annotGeometries or annotGeometry. In column or row geometries, the number of rows of the sf data frame (i.e. the number of geometries in the data frame) is constrained by the number of rows or columns of the gene count matrix respectively, because just like rowData and colData, each row of a rowGeometry or colGeometry sf data frame must correspond to a row or column of the gene count matrix respectively. In contrast, an annotGeometry sf data frame can have any dimension, not constrained by the dimension of the gene count matrix. Similar to column and row geometries, annotation geometries have annotGeometry, annotGeometries, and annotGeometryNames getters and setters.

# Getter, by name or index
(tb <- annotGeometry(sfe, "tissueBoundary"))
#> Simple feature collection with 1 feature and 2 fields
#> Geometry type: POLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 5094 ymin: 13000 xmax: 7000 ymax: 14969
#> CRS:           NA
#>   ID                       geometry sample_id
#> 7  7 POLYGON ((5094 13000, 5095 ...     Vis5A
plot(st_geometry(tb))

# Setter, by name or index
annotGeometry(sfe, "tissueBoundary") <- tb
# Get all annoation geometries as named list
ags <- annotGeometries(sfe)
# Set all annotation geometries with a named list
annotGeometries(sfe) <- ags
# Get names of annotation geometries
(ag_names <- annotGeometryNames(sfe))
#> [1] "tissueBoundary"      "myofiber_full"       "myofiber_simplified"
#> [4] "nuclei"              "nuclei_centroid"
# Set names
annotGeometryNames(sfe) <- ag_names

There are shorthands for specific annotation geometries. For example, tissueBoundary(sfe) is equivalent to annotGeometry(sfe, "tissueBoundary"). cellSeg() (cell segmentation) and nucSeg() (nuclei segmentation) would first query colGeometries (for single cell, single molecule technologies, equivalent to colGeometry(sfe, "cellSeg") or colGeometry(sfe, "nucSeg")), and if not found, they will query annotGeometries (for array capture and microdissection technologies, equivalent to annotGeometry(sfe, "cellSeg") or annotGeometry(sfe, "nucSeg")).

# Getter
(tb <- tissueBoundary(sfe))
#> Simple feature collection with 1 feature and 2 fields
#> Geometry type: POLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 5094 ymin: 13000 xmax: 7000 ymax: 14969
#> CRS:           NA
#>   ID                       geometry sample_id
#> 7  7 POLYGON ((5094 13000, 5095 ...     Vis5A
# Setter
tissueBoundary(sfe) <- tb

2.3 Spatial graphs

Column, row, and annotation spatial graphs can be get or set with spatialGraphs and spatialGraph functions. Similar to dimGeometr* functions, spatialGraph* functions have a MARGIN argument. However, since internally, row and column geometries are implemented very differently from annotation geometries, while row, column, and annotation graphs are implemented the same way, for the spatialGraph* functions, MARGIN = 1 means rows, MARGIN = 2 means columns, and MARGIN = 3 means annotation. Similar to dimGeometry* functions, there are rowGraph*, colGraph*, and annotGraph* getter and setter functions for each margin.

This package wraps functions in the spdep package to find spatial neighborhood graphs. In this example, triangulation is used to find the spatial graph; many other methods are also supported, such as k nearest neighbors, distance based neighbors, and polygon contiguity.

(g <- findSpatialNeighbors(sfe, MARGIN = 2, method = "tri2nb"))
#> Characteristics of weights list object:
#> Neighbour list object:
#> Number of regions: 77 
#> Number of nonzero links: 428 
#> Percentage nonzero weights: 7.218755 
#> Average number of links: 5.558442 
#> 
#> Weights style: W 
#> Weights constants summary:
#>    n   nn S0      S1       S2
#> W 77 5929 77 28.0096 309.4083
plot(g, coords = spatialCoords(sfe))

# Set graph by name
spatialGraph(sfe, "graph1", MARGIN = 2) <- g
# Or equivalently
colGraph(sfe, "graph1") <- g
# Get graph by name
g <- spatialGraph(sfe, "graph1", MARGIN = 2L)
# Or equivalently
g <- colGraph(sfe, "graph1")
g
#> Characteristics of weights list object:
#> Neighbour list object:
#> Number of regions: 77 
#> Number of nonzero links: 428 
#> Percentage nonzero weights: 7.218755 
#> Average number of links: 5.558442 
#> 
#> Weights style: W 
#> Weights constants summary:
#>    n   nn S0      S1       S2
#> W 77 5929 77 28.0096 309.4083

For Visium, spatial neighborhood graph of the hexagonal grid can be found with the known locations of the barcodes.

colGraph(sfe, "visium") <- findVisiumGraph(sfe)
plot(colGraph(sfe, "visium"), coords = spatialCoords(sfe))

All graphs of the SFE object, or if specified, of the margin of interest, can be get or set with spatialGraphs and the margin specific wrappers.

colGraphs(sfe)
#> $col
#> Characteristics of weights list object:
#> Neighbour list object:
#> Number of regions: 77 
#> Number of nonzero links: 428 
#> Percentage nonzero weights: 7.218755 
#> Average number of links: 5.558442 
#> 
#> Weights style: W 
#> Weights constants summary:
#>    n   nn S0      S1       S2
#> W 77 5929 77 28.0096 309.4083

Similar to dimGeometries, the graphs have spatialGraphNames getter and setter and the margin specific wrappers.

colGraphNames(sfe)
#> [1] "graph1" "visium"

2.4 Multiple samples

Thus far, the example dataset used only has one sample. The SpatialExperiment (SPE) object has a special column in colData called sample_id, so data from multiple tissue sections can coexist in the same SPE object for joint dimension reduction and clustering while keeping the spatial coordinates separate. It’s important to keep spatial coordinates of different tissue sections separate because first, the coordinates would only make sense within the same section, and second, the coordinates from different sections can have overlapping numeric values.

SFE inherits from SPE, and with geometries and spatial graphs, sample_id is even more important. The geometry and graph getter and setter functions have a sample_id argument, which is optional when only one sample is present in the SFE object. This argument is mandatory if multiple samples are present, and can be a character vector for multiple samples or “all” for all samples. Below are examples of using the getters and setters for multiple samples.

# Construct toy dataset with 2 samples
sfe1 <- McKellarMuscleData(dataset = "small")
#> snapshotDate(): 2022-10-31
#> see ?SFEData and browseVignettes('SFEData') for documentation
#> loading from cache
sfe2 <- McKellarMuscleData(dataset = "small2")
#> snapshotDate(): 2022-10-31
#> see ?SFEData and browseVignettes('SFEData') for documentation
#> loading from cache
spotPoly(sfe2)$sample_id <- "sample02"
(sfe_combined <- cbind(sfe1, sfe2))
#> class: SpatialFeatureExperiment 
#> dim: 15123 149 
#> metadata(0):
#> assays(1): counts
#> rownames(15123): ENSMUSG00000025902 ENSMUSG00000096126 ...
#>   ENSMUSG00000064368 ENSMUSG00000064370
#> rowData names(6): Ensembl symbol ... vars cv2
#> colnames(149): AAATTACCTATCGATG AACATATCAACTGGTG ... TTCCTCGGACTAACCA
#>   TTCTGACCGGGCTCAA
#> colData names(12): barcode col ... prop_mito in_tissue
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : imageX imageY
#> imgData names(1): sample_id
#> 
#> Geometries:
#> colGeometries: spotPoly (POLYGON) 
#> annotGeometries: tissueBoundary (GEOMETRY), myofiber_full (GEOMETRY), myofiber_simplified (GEOMETRY), nuclei (POLYGON), nuclei_centroid (POINT) 
#> 
#> Graphs:
#> Vis5A: 
#> sample02:

Use the sampleIDs function to see the names of all samples

sampleIDs(sfe_combined)
#> [1] "Vis5A"    "sample02"
# Only get the geometries for the second sample
(spots2 <- colGeometry(sfe_combined, "spotPoly", sample_id = "sample02"))
#> Simple feature collection with 72 features and 1 field
#> Geometry type: POLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 6000 ymin: 7025.865 xmax: 8000 ymax: 9000
#> CRS:           NA
#> First 10 features:
#>                  sample_id                       geometry
#> AACACACGCTCGCCGC  sample02 POLYGON ((6597.869 7842.575...
#> AACCGCTAAGGGATGC  sample02 POLYGON ((6724.811 9000, 67...
#> AACGCTGTTGCTGAAA  sample02 POLYGON ((6457.635 7118.991...
#> AACGGACGTACGTATA  sample02 POLYGON ((6737.064 8083.571...
#> AATAGAATCTGTTTCA  sample02 POLYGON ((7570.153 8564.368...
#> ACAAATCGCACCGAAT  sample02 POLYGON ((8000 7997.001, 79...
#> ACAATTGTGTCTCTTT  sample02 POLYGON ((6043.169 7843.77,...
#> ACAGGCTTGCCCGACT  sample02 POLYGON ((7428.88 7358.195,...
#> ACCAGTGCGGGAGACG  sample02 POLYGON ((6460.753 8566.757...
#> ACCCTCCCTTGCTATT  sample02 POLYGON ((7847.503 8563.771...
# Only set the geometries for the second sample
# Leaving geometries of the first sample intact
colGeometry(sfe_combined, "spotPoly", sample_id = "sample02") <- spots2
# Set graph only for the second sample
colGraph(sfe_combined, "foo", sample_id = "sample02") <- 
  findSpatialNeighbors(sfe_combined, sample_id = "sample02")
# Get graph only for the second sample
colGraph(sfe_combined, "foo", sample_id = "sample02")
#> Characteristics of weights list object:
#> Neighbour list object:
#> Number of regions: 72 
#> Number of nonzero links: 406 
#> Percentage nonzero weights: 7.83179 
#> Average number of links: 5.638889 
#> 
#> Weights style: W 
#> Weights constants summary:
#>    n   nn S0       S1       S2
#> W 72 5184 72 25.82104 289.8299
# Set graph of the same name for both samples
# The graphs are computed separately for each sample
colGraphs(sfe_combined, sample_id = "all", name = "visium") <- 
  findVisiumGraph(sfe_combined, sample_id = "all")
# Get multiple graphs of the same name
colGraphs(sfe_combined, sample_id = "all", name = "visium")
#> $Vis5A
#> Characteristics of weights list object:
#> Neighbour list object:
#> Number of regions: 77 
#> Number of nonzero links: 394 
#> Percentage nonzero weights: 6.645303 
#> Average number of links: 5.116883 
#> 
#> Weights style: W 
#> Weights constants summary:
#>    n   nn S0       S1       S2
#> W 77 5929 77 31.68056 311.7544
#> 
#> $sample02
#> Characteristics of weights list object:
#> Neighbour list object:
#> Number of regions: 72 
#> Number of nonzero links: 366 
#> Percentage nonzero weights: 7.060185 
#> Average number of links: 5.083333 
#> 
#> Weights style: W 
#> Weights constants summary:
#>    n   nn S0       S1       S2
#> W 72 5184 72 29.83889 291.5833
# Or just all graphs of the margin
colGraphs(sfe_combined, sample_id = "all")
#> $Vis5A
#> $Vis5A$visium
#> Characteristics of weights list object:
#> Neighbour list object:
#> Number of regions: 77 
#> Number of nonzero links: 394 
#> Percentage nonzero weights: 6.645303 
#> Average number of links: 5.116883 
#> 
#> Weights style: W 
#> Weights constants summary:
#>    n   nn S0       S1       S2
#> W 77 5929 77 31.68056 311.7544
#> 
#> 
#> $sample02
#> $sample02$foo
#> Characteristics of weights list object:
#> Neighbour list object:
#> Number of regions: 72 
#> Number of nonzero links: 406 
#> Percentage nonzero weights: 7.83179 
#> Average number of links: 5.638889 
#> 
#> Weights style: W 
#> Weights constants summary:
#>    n   nn S0       S1       S2
#> W 72 5184 72 25.82104 289.8299
#> 
#> $sample02$visium
#> Characteristics of weights list object:
#> Neighbour list object:
#> Number of regions: 72 
#> Number of nonzero links: 366 
#> Percentage nonzero weights: 7.060185 
#> Average number of links: 5.083333 
#> 
#> Weights style: W 
#> Weights constants summary:
#>    n   nn S0       S1       S2
#> W 72 5184 72 29.83889 291.5833

Sample IDs can also be changed, with the changeSampleIDs function, with a named vector whose names are the old names and values are the new names.

sfe_combined <- changeSampleIDs(sfe, replacement = c(Vis5A = "foo", sample02 = "bar"))
sfe_combined
#> class: SpatialFeatureExperiment 
#> dim: 15123 77 
#> metadata(0):
#> assays(1): counts
#> rownames(15123): ENSMUSG00000025902 ENSMUSG00000096126 ...
#>   ENSMUSG00000064368 ENSMUSG00000064370
#> rowData names(6): Ensembl symbol ... vars cv2
#> colnames(77): AAATTACCTATCGATG AACATATCAACTGGTG ... TTCTTTGGTCGCGACG
#>   TTGATGTGTAGTCCCG
#> colData names(12): barcode col ... prop_mito in_tissue
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : imageX imageY
#> imgData names(1): sample_id
#> 
#> Geometries:
#> colGeometries: spotPoly (POLYGON), foobar (POLYGON) 
#> annotGeometries: tissueBoundary (POLYGON), myofiber_full (GEOMETRY), myofiber_simplified (GEOMETRY), nuclei (POLYGON), nuclei_centroid (POINT) 
#> 
#> Graphs:
#> foo: col: graph1, visium

3 Object construction

3.1 From scratch

An SFE object can be constructed from scratch with the assay matrices and metadata. In this toy example, dgCMatrix is used, but since SFE inherits from SingleCellExperiment (SCE), other types of arrays supported by SCE such as delayed arrays should also work.

# Visium barcode location from Space Ranger
data("visium_row_col")
coords1 <- visium_row_col[visium_row_col$col < 6 & visium_row_col$row < 6,]
coords1$row <- coords1$row * sqrt(3)

# Random toy sparse matrix
set.seed(29)
col_inds <- sample(1:13, 13)
row_inds <- sample(1:5, 13, replace = TRUE)
values <- sample(1:5, 13, replace = TRUE)
mat <- sparseMatrix(i = row_inds, j = col_inds, x = values)
colnames(mat) <- coords1$barcode
rownames(mat) <- sample(LETTERS, 5)

That should be sufficient to create an SPE object, and an SFE object, even though no sf data frame was constructed for the geometries. The constructor behaves similarly to the SPE constructor. The centroid coordinates of the Visium spots in the toy example can be converted into spot polygons with the spotDiameter argument. Spot diameter in pixels in full resolution image can be found in the scalefactors_json.json file in Space Ranger output.

sfe3 <- SpatialFeatureExperiment(list(counts = mat), colData = coords1,
                                spatialCoordsNames = c("col", "row"),
                                spotDiameter = 0.7)

More geometries and spatial graphs can be added after calling the constructor.

Geometries can also be supplied in the constructor.

# Convert regular data frame with coordinates to sf data frame
cg <- df2sf(coords1[,c("col", "row")], c("col", "row"), spotDiameter = 0.7)
rownames(cg) <- colnames(mat)
sfe3 <- SpatialFeatureExperiment(list(counts = mat), colGeometries = list(foo = cg))

3.2 Space Ranger output

Space Ranger output can be read in a similar manner as in SpatialExperiment; the returned SFE object has the spotPoly column geometry for the spot polygons. If the filtered matrix is read in, then a column graph called visium will also be present, for the spatial neighborhood graph of the Visium spots on tissue. The graph is not computed if all spots are read in regardless of whether they are on tissue.

# Example from SpatialExperiment
dir <- system.file(
  file.path("extdata", "10xVisium"), 
  package = "SpatialExperiment")
  
sample_ids <- c("section1", "section2")
samples <- file.path(dir, sample_ids, "outs")
  
list.files(samples[1])
#> [1] "raw_feature_bc_matrix" "spatial"
list.files(file.path(samples[1], "spatial"))
#> [1] "scalefactors_json.json"    "tissue_lowres_image.png"  
#> [3] "tissue_positions_list.csv"
file.path(samples[1], "raw_feature_bc_matrix")
#> [1] "/home/biocbuild/bbs-3.16-bioc/R/library/SpatialExperiment/extdata/10xVisium/section1/outs/raw_feature_bc_matrix"
(sfe3 <- read10xVisiumSFE(samples, sample_ids, type = "sparse", data = "raw",
                         load = FALSE))
#> Warning: as(<dgTMatrix>, "dgCMatrix") is deprecated since Matrix 1.5-0; do as(.,
#> "CsparseMatrix") instead

#> Warning: as(<dgTMatrix>, "dgCMatrix") is deprecated since Matrix 1.5-0; do as(.,
#> "CsparseMatrix") instead
#> class: SpatialFeatureExperiment 
#> dim: 50 99 
#> metadata(0):
#> assays(1): counts
#> rownames(50): ENSMUSG00000051951 ENSMUSG00000089699 ...
#>   ENSMUSG00000005886 ENSMUSG00000101476
#> rowData names(1): symbol
#> colnames(99): AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 ...
#>   AAAGTCGACCCTCAGT-1-1 AAAGTGCCATCAATTA-1-1
#> colData names(4): in_tissue array_row array_col sample_id
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : pxl_row_in_fullres pxl_col_in_fullres
#> imgData names(4): sample_id image_id data scaleFactor
#> 
#> Geometries:
#> colGeometries: spotPoly (POLYGON) 
#> 
#> Graphs:
#> section1: 
#> section2:

3.3 Coercion from SpatialExperiment

SPE objects can be coerced into SFE objects. If column geometries or spot diameter are not specified, then a column geometry called “centroids” will be created.

spe <- read10xVisium(samples, sample_ids, type = "sparse", data = "raw", 
  images = "lowres", load = FALSE)
#> Warning: as(<dgTMatrix>, "dgCMatrix") is deprecated since Matrix 1.5-0; do as(.,
#> "CsparseMatrix") instead

#> Warning: as(<dgTMatrix>, "dgCMatrix") is deprecated since Matrix 1.5-0; do as(.,
#> "CsparseMatrix") instead

For the coercion, column names must not be duplicate.

colnames(spe) <- make.unique(colnames(spe), sep = "-")
rownames(spatialCoords(spe)) <- colnames(spe)
(sfe3 <- toSpatialFeatureExperiment(spe))
#> class: SpatialFeatureExperiment 
#> dim: 50 99 
#> metadata(0):
#> assays(1): counts
#> rownames(50): ENSMUSG00000051951 ENSMUSG00000089699 ...
#>   ENSMUSG00000005886 ENSMUSG00000101476
#> rowData names(1): symbol
#> colnames(99): AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 ...
#>   AAAGTCGACCCTCAGT-1-1 AAAGTGCCATCAATTA-1-1
#> colData names(4): in_tissue array_row array_col sample_id
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
#> imgData names(4): sample_id image_id data scaleFactor
#> 
#> Geometries:
#> colGeometries: centroids (POINT) 
#> 
#> Graphs:
#> section1: 
#> section2:

4 Operations

4.1 Non-geometric

SFE objects can be concatenated with cbind, as was done just now to create a toy example with 2 samples.

sfe_combined <- cbind(sfe1, sfe2)

The SFE object can also be subsetted like a matrix, like an SCE object. More complexity arises when it comes to the spatial graphs. The drop argument of the SFE method [ determines what to do with the spatial graphs. If drop = TRUE, then all spatial graphs will be removed, since the graphs with nodes and edges that have been removed are no longer valid. If drop = FALSE, which is the default, then the spatial graphs will be reconstructed with the remaining nodes after subsetting. Reconstruction would only work when the original graphs were constructed with findSpatialNeighbors or findVisiumGraph in this package, which records the method and parameters used to construct the graphs. If reconstruction fails, then a waning will be issued and the graphs removed.

(sfe_subset <- sfe[1:10, 1:10, drop = TRUE])
#> Node indices in the graphs are no longer valid after subsetting. Dropping all row and col graphs.
#> class: SpatialFeatureExperiment 
#> dim: 10 10 
#> metadata(0):
#> assays(1): counts
#> rownames(10): ENSMUSG00000025902 ENSMUSG00000096126 ...
#>   ENSMUSG00000090031 ENSMUSG00000033740
#> rowData names(6): Ensembl symbol ... vars cv2
#> colnames(10): AAATTACCTATCGATG AACATATCAACTGGTG ... ACGAGTACGGATGCCC
#>   ACGCTAGTGATACACT
#> colData names(12): barcode col ... prop_mito in_tissue
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : imageX imageY
#> imgData names(1): sample_id
#> 
#> Geometries:
#> colGeometries: spotPoly (POLYGON), foobar (POLYGON) 
#> annotGeometries: tissueBoundary (POLYGON), myofiber_full (GEOMETRY), myofiber_simplified (GEOMETRY), nuclei (POLYGON), nuclei_centroid (POINT) 
#> 
#> Graphs:
#> Vis5A:
# Will give warning because graph reconstruction fails
sfe_subset <- sfe[1:10, 1:10]

4.2 Geometric

Just like sf data frames, SFE objects can be subsetted by a geometry and a predicate relating geometries. For example, if all Visium spots were read into an SFE object regardless of whether they are in tissue, and the tissueBoundary annotation geometry is provided, then the tissue boundary geometry can be used to subset the SFE object to obtain a new SFE object with only spots on tissue. Loupe does not give the tissue boundary polygon; such polygon can be obtained by thresholding the H&E image and converting the mask into polygons with OpenCV or the terra R package, or by manual annotation in QuPath or LabKit (the latter needs to be converted into polygon).

Use the crop function to directly get the subsetted SFE object. Note that in this version of this package, crop does NOT crop the image.

# Before
plot(st_geometry(tissueBoundary(sfe)))
plot(spotPoly(sfe), col = "gray", add = TRUE)

sfe_in_tissue <- crop(sfe, y = tissueBoundary(sfe), colGeometryName = "spotPoly")

Note that for large datasets with many geometries, cropping can take a while to run.

# After
plot(st_geometry(tissueBoundary(sfe)))
plot(spotPoly(sfe_in_tissue), col = "gray", add = TRUE)

crop can also be used in the conventional sense of cropping, i.e. specifying a bounding box.

sfe_cropped <- crop(sfe, colGeometryName = "spotPoly", sample_id = "Vis5A",
                    xmin = 5500, xmax = 6500, ymin = 13500, ymax = 14500)

The colGeometryName is used to determine which columns in the gene count matrix to keep. All geometries in the SFE object will be subsetted so only portions intersecting y or the bounding box are kept. Since the intersection operation can produce a mixture of geometry types, such as intersection of two polygons producing polygons, points, and lines, the geometry types of the sf data frames after subsetting may be different from those of the originals.

The cropping is done independently for each sample_id, and only on sample_ids specified. Again, sample_id is optional when there is only one sample in the SFE object.

Geometry predicates and operations can also be performed to return the results without subsetting an SFE object. For example, one may want a logical vector indicating whether each Visium spot intersects the tissue, or a numeric vector of how many nuclei there are in each Visium spot. Or get the intersections between each Visium spot and nuclei. Again, the geometry predicates and operations are performed independently for each sample, and the sample_id argument is optional when there is only one sample.

# Get logical vector
colData(sfe)$in_tissue <- annotPred(sfe, colGeometryName = "spotPoly", 
                                    annotGeometryName = "tissueBoundary",
                                    sample_id = "Vis5A")
# Get the number of nuclei per Visium spot
colData(sfe)$n_nuclei <- annotNPred(sfe, "spotPoly", annotGeometryName = "nuclei")
# Get geometries of intersections of Visium spots and myofibers
spot_intersections <- annotOp(sfe, colGeometryName = "spotPoly", 
                              annotGeometryName = "myofiber_simplified")

Sometimes the spatial coordinates of different samples can take very different values. The values can be made more comparable by moving all tissues so the bottom left corner of the bounding box would be at the origin, which would facilitate plotting and comparison across samples with geom_sf and facet_*.

To find the bounding box of all geometries in each sample of an SFE object:

bbox(sfe, sample_id = "Vis5A")
#>  xmin  ymin  xmax  ymax 
#>  5000 13000  7000 15000

To move the coordinates:

sfe_moved <- removeEmptySpace(sfe, sample_id = "Vis5A")

The original bounding box before moving is stored within the SFE object, which can be read by dimGeometry setters so newly added geometries can have coordinates moved as well; this behavior can be turned off with the optional argument translate = FALSE in dimGeometry setters.

5 Limitations and future directions

These are the limitations of the current version of SFE:

  1. By integrating with sf, which is designed for vector spatial data (specifying coordinates of points, lines, and polygons vertices), SFE only supports vector data for the geometries, and raster (like an image, with a value at each pixel) is not supported. Vector is chosen, as it is a more memory efficient way to represent cell and nuclei segmentation than a raster map.
  2. The spatial graphs are listw objects so no conversion is necessary to use the well-established spatial statistical methods in the spdep, spatialreg, and adespatial packages. However, igraph implements many graph analysis methods, and conversion is required to use them. Whether future versions of SFE will stick to listw depends on importance of methods that use spatial graphs in igraph class.
  3. While Simple Features support 3D and spatiotemporal coordinates, most geospatial resources SFE leverages sf for is for 2D data.
  4. Spatial point process analysis with the spatstat package may be relevant, such as in analyzing spatial distribution of nuclei or transcript spots. As spatstat predates sf by over a decade, spatstat does not play very nicely with sf. However, since analyses of nuclei and transcript spot localization don’t center on the gene count matrix, whether spatstat analyses should be integrated into SFE (which is centered on the gene count matrix) is questionable.
  5. Geometries for very large datasets can get very large. On disk operations of the geometries should be considered. The geospatial field already has on disk tools for both vector and raster data. So far, SFE has only been tested on data that fit into memory.
  6. Setting units of length in the SFE object and converting units. This can make geometries of different samples and datasets comparable, and helpful to plotting scale bars when plotting geometries.

6 Session info

sessionInfo()
#> R version 4.2.2 (2022-10-31)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.5 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] Matrix_1.5-3                   sf_1.0-9                      
#>  [3] SFEData_1.0.2                  SpatialExperiment_1.8.0       
#>  [5] SingleCellExperiment_1.20.0    SummarizedExperiment_1.28.0   
#>  [7] Biobase_2.58.0                 GenomicRanges_1.50.2          
#>  [9] GenomeInfoDb_1.34.6            IRanges_2.32.0                
#> [11] S4Vectors_0.36.1               BiocGenerics_0.44.0           
#> [13] MatrixGenerics_1.10.0          matrixStats_0.63.0            
#> [15] SpatialFeatureExperiment_1.0.3 BiocStyle_2.26.0              
#> 
#> loaded via a namespace (and not attached):
#>   [1] rjson_0.2.21                  deldir_1.0-6                 
#>   [3] ellipsis_0.3.2                class_7.3-20                 
#>   [5] scuttle_1.8.3                 XVector_0.38.0               
#>   [7] proxy_0.4-27                  bit64_4.0.5                  
#>   [9] AnnotationDbi_1.60.0          interactiveDisplayBase_1.36.0
#>  [11] fansi_1.0.3                   codetools_0.2-18             
#>  [13] R.methodsS3_1.8.2             sparseMatrixStats_1.10.0     
#>  [15] cachem_1.0.6                  knitr_1.41                   
#>  [17] jsonlite_1.8.4                dbplyr_2.2.1                 
#>  [19] png_0.1-8                     R.oo_1.25.0                  
#>  [21] shiny_1.7.4                   HDF5Array_1.26.0             
#>  [23] BiocManager_1.30.19           compiler_4.2.2               
#>  [25] httr_1.4.4                    dqrng_0.3.0                  
#>  [27] assertthat_0.2.1              fastmap_1.1.0                
#>  [29] limma_3.54.0                  cli_3.6.0                    
#>  [31] later_1.3.0                   s2_1.1.2                     
#>  [33] htmltools_0.5.4               tools_4.2.2                  
#>  [35] glue_1.6.2                    GenomeInfoDbData_1.2.9       
#>  [37] dplyr_1.0.10                  wk_0.7.1                     
#>  [39] rappdirs_0.3.3                Rcpp_1.0.9                   
#>  [41] jquerylib_0.1.4               Biostrings_2.66.0            
#>  [43] vctrs_0.5.1                   rhdf5filters_1.10.0          
#>  [45] spdep_1.2-7                   ExperimentHub_2.6.0          
#>  [47] DelayedMatrixStats_1.20.0     xfun_0.36                    
#>  [49] stringr_1.5.0                 beachmat_2.14.0              
#>  [51] mime_0.12                     lifecycle_1.0.3              
#>  [53] AnnotationHub_3.6.0           edgeR_3.40.1                 
#>  [55] zlibbioc_1.44.0               promises_1.2.0.1             
#>  [57] parallel_4.2.2                rhdf5_2.42.0                 
#>  [59] yaml_2.3.6                    curl_5.0.0                   
#>  [61] memoise_2.0.1                 sass_0.4.4                   
#>  [63] stringi_1.7.12                RSQLite_2.2.20               
#>  [65] BiocVersion_3.16.0            highr_0.10                   
#>  [67] e1071_1.7-12                  filelock_1.0.2               
#>  [69] boot_1.3-28.1                 BiocParallel_1.32.5          
#>  [71] spData_2.2.1                  rlang_1.0.6                  
#>  [73] pkgconfig_2.0.3               bitops_1.0-7                 
#>  [75] evaluate_0.19                 lattice_0.20-45              
#>  [77] purrr_1.0.1                   Rhdf5lib_1.20.0              
#>  [79] bit_4.0.5                     tidyselect_1.2.0             
#>  [81] magrittr_2.0.3                bookdown_0.31                
#>  [83] R6_2.5.1                      magick_2.7.3                 
#>  [85] generics_0.1.3                DelayedArray_0.24.0          
#>  [87] DBI_1.1.3                     withr_2.5.0                  
#>  [89] pillar_1.8.1                  units_0.8-1                  
#>  [91] KEGGREST_1.38.0               RCurl_1.98-1.9               
#>  [93] sp_1.5-1                      tibble_3.1.8                 
#>  [95] crayon_1.5.2                  DropletUtils_1.18.1          
#>  [97] KernSmooth_2.23-20            utf8_1.2.2                   
#>  [99] BiocFileCache_2.6.0           rmarkdown_2.19               
#> [101] locfit_1.5-9.7                grid_4.2.2                   
#> [103] blob_1.2.3                    digest_0.6.31                
#> [105] classInt_0.4-8                xtable_1.8-4                 
#> [107] dbscan_1.1-11                 httpuv_1.6.8                 
#> [109] R.utils_2.12.2                bslib_0.4.2