
This function uses the algorithm described in Qiu et al. (2011) to estimate the local density of each cell in a `tof_tbl` or `tibble` containing high-dimensional cytometry data. Briefly, the algorithm counts the number of neighboring cells within a sphere of radius alpha surrounding each cell. Here, the neighbor search is performed using the nn2 function.
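The counting step described above can be illustrated with a minimal base R sketch. This is not the package's implementation: it uses a brute-force distance matrix instead of a nearest-neighbor search, the radius `alpha` is an arbitrary value chosen for illustration, and excluding each cell from its own count is an assumption.

```r
# Illustrative sketch only: brute-force neighbor counting with base R.
set.seed(1)
cells <- matrix(rnorm(200 * 4), ncol = 4)  # 200 simulated cells, 4 markers

# Full pairwise Euclidean distance matrix (practical for small inputs only)
pairwise <- as.matrix(dist(cells))

alpha <- 1.5  # hypothetical radius, chosen arbitrarily for this sketch

# Count the cells within distance alpha of each cell.
# Subtract 1 so that a cell is not counted as its own neighbor (assumption).
neighbor_counts <- rowSums(pairwise <= alpha) - 1L
head(neighbor_counts)
```

Cells in crowded regions of marker space receive larger counts than cells in sparse regions, which is what makes the count usable as a local density estimate.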

Usage

tof_spade_density(
  tof_tibble,
  distance_cols = where(tof_is_numeric),
  distance_function = c("euclidean", "cosine", "l2", "ip"),
  num_alpha_cells = 2000L,
  alpha_multiplier = 5,
  max_neighbors = round(0.01 * nrow(tof_tibble)),
  normalize = TRUE,
  ...
)

Arguments

tof_tibble

A `tof_tbl` or a `tibble`.

distance_cols

Unquoted names of the columns in `tof_tibble` to use in calculating cell-to-cell distances during the local density estimation for each cell. Defaults to all numeric columns in `tof_tibble`.

distance_function

A string indicating which distance function to use for calculating cell-to-cell distances during local density estimation. Options are "euclidean" (the default), "cosine", "l2", and "ip".

num_alpha_cells

An integer indicating how many cells should be randomly sampled from `tof_tibble` in order to estimate `alpha`, the radius of the sphere constructed around each cell during local density estimation. Alpha is calculated by taking the median nearest-neighbor distance from the `num_alpha_cells` randomly-sampled cells and multiplying it by `alpha_multiplier`. Defaults to 2000.

alpha_multiplier

A numeric value indicating the multiplier that should be used when calculating `alpha`, the radius of the sphere constructed around each cell during local density estimation. Alpha is calculated by taking the median nearest-neighbor distance from the `num_alpha_cells` cells randomly-sampled from `tof_tibble` and multiplying it by `alpha_multiplier`. Defaults to 5.

max_neighbors

An integer indicating the maximum number of neighbors that can be counted within the sphere surrounding any given cell. Implemented to reduce the density estimation procedure's runtime and memory requirements. Defaults to 1% of the number of rows in `tof_tibble`.

normalize

A boolean value indicating if the vector of local density estimates should be normalized to values between 0 and 1. Defaults to TRUE.

...

Additional optional arguments to pass to tof_find_knn.
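The way `num_alpha_cells` and `alpha_multiplier` combine to give `alpha` can be sketched in base R as follows. This is a rough illustration under stated assumptions, not the package's code: it uses Euclidean distance, searches for each sampled cell's nearest neighbor among all other cells (the documentation does not say whether the search is restricted to the sample), and uses a smaller sample than the documented default to keep it fast.

```r
# Illustrative sketch only: estimating alpha from a random sample of cells.
set.seed(2)
cells <- matrix(rnorm(1000 * 4), ncol = 4)  # 1000 simulated cells, 4 markers

num_alpha_cells  <- 200L  # smaller than the documented default, for speed
alpha_multiplier <- 5

# Randomly sample the cells used to estimate alpha
sampled <- sample(nrow(cells), size = min(num_alpha_cells, nrow(cells)))

# Euclidean distance from each sampled cell to its nearest neighbor
# among all other cells (assumption: the search is not limited to the sample)
nearest_dist <- vapply(
    sampled,
    function(i) {
        diffs <- sweep(cells[-i, , drop = FALSE], 2, cells[i, ])
        min(sqrt(rowSums(diffs^2)))
    },
    numeric(1)
)

# alpha is the median nearest-neighbor distance times the multiplier
alpha <- alpha_multiplier * median(nearest_dist)
alpha
```

A larger `alpha_multiplier` widens the sphere around each cell, so more neighbors are counted and the density estimates become smoother.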

Value

A tibble with a single column named ".spade_density" containing the local density estimates for each input cell in `tof_tibble`.
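To make the roles of `max_neighbors` and `normalize` concrete, here is a small base R sketch on made-up neighbor counts. The documentation only states that the estimates are normalized to values between 0 and 1; dividing by the maximum, as done here, is an assumption about one way to achieve that, and the cap value is hypothetical.

```r
# Illustrative sketch only: capping and normalizing made-up neighbor counts.
raw_counts    <- c(3, 12, 7, 40, 25)  # made-up neighbor counts for 5 cells
max_neighbors <- 20                   # hypothetical cap

# No cell can have more than max_neighbors counted neighbors
capped <- pmin(raw_counts, max_neighbors)

# One possible normalization to the [0, 1] range: divide by the largest value
# (assumption; the package may normalize differently)
normalized <- capped / max(capped)

# Same shape as the documented return value: one column, one row per cell
result <- data.frame(.spade_density = normalized)
result
```

Note that capping makes the two densest cells indistinguishable here, which is the trade-off accepted in exchange for lower runtime and memory use.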

See also

Other local density estimation functions: tof_estimate_density(), tof_knn_density()

Examples

sim_data <-
    dplyr::tibble(
        cd45 = rnorm(n = 1000),
        cd38 = rnorm(n = 1000),
        cd34 = rnorm(n = 1000),
        cd19 = rnorm(n = 1000)
    )

# perform the density estimation
tof_spade_density(tof_tibble = sim_data)
#> # A tibble: 1,000 × 1
#>    .spade_density
#>             <dbl>
#>  1              1
#>  2              1
#>  3              1
#>  4              1
#>  5              1
#>  6              1
#>  7              1
#>  8              1
#>  9              1
#> 10              1
#> # ℹ 990 more rows

# perform the density estimation using cosine distance
tof_spade_density(
    tof_tibble = sim_data,
    distance_function = "cosine",
    alpha_multiplier = 2
)
#> # A tibble: 1,000 × 1
#>    .spade_density
#>             <dbl>
#>  1          0.286
#>  2          0.286
#>  3          0.143
#>  4          0.286
#>  5          0.143
#>  6          0.286
#>  7          0.857
#>  8          0.429
#>  9          0.571
#> 10          0.429
#> # ℹ 990 more rows

# perform the density estimation with a smaller search radius around
# each cell
tof_spade_density(
    tof_tibble = sim_data,
    alpha_multiplier = 2
)
#> # A tibble: 1,000 × 1
#>    .spade_density
#>             <dbl>
#>  1            1  
#>  2            1  
#>  3            1  
#>  4            0.5
#>  5            1  
#>  6            1  
#>  7            0.2
#>  8            1  
#>  9            0.2
#> 10            1  
#> # ℹ 990 more rows