Estimate cells' local densities as done in Spanning-tree Progression Analysis of Density-normalized Events (SPADE)
Source: R/utils.R
tof_spade_density.Rd
This function uses the algorithm described in Qiu et al. (2011) to estimate the local density of each cell in a `tof_tbl` or `tibble` containing high-dimensional cytometry data. Briefly, this algorithm involves counting the number of neighboring cells within a sphere of radius alpha surrounding each cell. Here, we do so using the `nn2` function.
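The counting step described above can be illustrated with a minimal base-R sketch. This is not the package's implementation: it swaps the `nn2` search for a brute-force distance matrix, the radius `alpha` is a hypothetical fixed value, and excluding each cell from its own count is an assumption.

```r
# Illustrative sketch only: brute-force neighbor counting in base R.
set.seed(2011)

# 200 simulated cells measured on 4 markers
cells <- matrix(rnorm(200 * 4), ncol = 4)

# hypothetical sphere radius (the real function estimates this from the data)
alpha <- 1.5

# pairwise Euclidean distances between all cells
d <- as.matrix(dist(cells))

# local density = number of other cells within radius alpha of each cell;
# subtracting 1 removes the cell itself (assumption: self is not counted)
spade_density <- rowSums(d <= alpha) - 1

head(spade_density)
```

Cells in crowded regions of marker space receive larger counts than cells on the periphery. A brute-force matrix is only practical for small inputs, which is presumably why a nearest-neighbor search routine is used instead.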
Arguments
- tof_tibble
A `tof_tbl` or a `tibble`.
- distance_cols
Unquoted names of the columns in `tof_tibble` to use in calculating cell-to-cell distances during the local density estimation for each cell. Defaults to all numeric columns in `tof_tibble`.
- distance_function
A string indicating which distance function to use for calculating cell-to-cell distances during local density estimation. Options include "euclidean" (the default) and "cosine".
- num_alpha_cells
An integer indicating how many cells should be randomly sampled from `tof_tibble` in order to estimate `alpha`, the radius of the sphere constructed around each cell during local density estimation. Alpha is calculated by taking the median nearest-neighbor distance from the `num_alpha_cells` randomly-sampled cells and multiplying it by `alpha_multiplier`. Defaults to 2000.
- alpha_multiplier
A numeric value indicating the multiplier that should be used when calculating `alpha`, the radius of the sphere constructed around each cell during local density estimation. Alpha is calculated by taking the median nearest-neighbor distance from the `num_alpha_cells` cells randomly-sampled from `tof_tibble` and multiplying it by `alpha_multiplier`. Defaults to 5.
- max_neighbors
An integer indicating the maximum number of neighbors that can be counted within the sphere surrounding any given cell. Implemented to reduce the density estimation procedure's runtime and memory requirements. Defaults to 1% of the number of rows in `tof_tibble`.
- normalize
A boolean value indicating if the vector of local density estimates should be normalized to values between 0 and 1. Defaults to TRUE.
- ...
Additional optional arguments to pass to `tof_find_knn`.
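The way `num_alpha_cells`, `alpha_multiplier`, `max_neighbors`, and `normalize` interact can be sketched in base R as follows. This is a rough illustration under stated assumptions, not the package's code: nearest-neighbor distances for the sampled cells are measured against all other cells, each cell is excluded from its own count, and normalization is done by dividing by the maximum count. The variable names mirror the arguments above, but `num_alpha_cells` is set lower than the default to keep the example quick.

```r
# Illustrative sketch only; sampling and normalization details are assumptions.
set.seed(42)
cells <- matrix(rnorm(1000 * 4), ncol = 4)

num_alpha_cells  <- 200                          # smaller than the default of 2000
alpha_multiplier <- 5
max_neighbors    <- ceiling(0.01 * nrow(cells))  # 1% of the number of rows

# 1. randomly sample cells and find each one's nearest-neighbor distance
sampled <- sample(nrow(cells), size = min(num_alpha_cells, nrow(cells)))
nn_dist <- vapply(sampled, function(i) {
  diffs <- sweep(cells[-i, , drop = FALSE], 2, cells[i, ])
  min(sqrt(rowSums(diffs^2)))
}, numeric(1))

# 2. alpha = median nearest-neighbor distance times the multiplier
alpha <- median(nn_dist) * alpha_multiplier

# 3. count neighbors within alpha, capped at max_neighbors
d <- as.matrix(dist(cells))
counts <- pmin(rowSums(d <= alpha) - 1, max_neighbors)

# 4. rescale to the range 0 to 1 (assumed here: divide by the maximum)
density <- counts / max(counts)

summary(density)
```

In a sketch like this, a generous `alpha` combined with a small `max_neighbors` cap can mean that many cells reach the cap and therefore share the same normalized value; lowering `alpha_multiplier` or raising `max_neighbors` spreads the estimates out.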
Value
A tibble with a single column named ".spade_density" containing the local density estimates for each input cell in `tof_tibble`.
See also
Other local density estimation functions: `tof_estimate_density()`, `tof_knn_density()`
Examples
# simulate a small dataset of 1,000 cells measured on 4 markers
sim_data <-
  dplyr::tibble(
    cd45 = rnorm(n = 1000),
    cd38 = rnorm(n = 1000),
    cd34 = rnorm(n = 1000),
    cd19 = rnorm(n = 1000)
  )
# perform the density estimation
tof_spade_density(tof_tibble = sim_data)
#> # A tibble: 1,000 × 1
#> .spade_density
#> <dbl>
#> 1 1
#> 2 1
#> 3 1
#> 4 1
#> 5 1
#> 6 1
#> 7 1
#> 8 1
#> 9 1
#> 10 1
#> # ℹ 990 more rows
# perform the density estimation using cosine distance
tof_spade_density(
  tof_tibble = sim_data,
  distance_function = "cosine",
  alpha_multiplier = 2
)
#> # A tibble: 1,000 × 1
#> .spade_density
#> <dbl>
#> 1 0.286
#> 2 0.286
#> 3 0.143
#> 4 0.286
#> 5 0.143
#> 6 0.286
#> 7 0.857
#> 8 0.429
#> 9 0.571
#> 10 0.429
#> # ℹ 990 more rows
# perform the density estimation with a smaller search radius around
# each cell
tof_spade_density(
  tof_tibble = sim_data,
  alpha_multiplier = 2
)
#> # A tibble: 1,000 × 1
#> .spade_density
#> <dbl>
#> 1 1
#> 2 1
#> 3 1
#> 4 0.5
#> 5 1
#> 6 1
#> 7 0.2
#> 8 1
#> 9 0.2
#> 10 1
#> # ℹ 990 more rows