
This function downsamples the cells in a `tof_tbl` using one of three methods: randomly sampling a constant number of cells, randomly sampling a proportion of cells, or performing density-dependent downsampling per the algorithm in Qiu et al. (2011).

Usage

tof_downsample(
  tof_tibble,
  group_cols = NULL,
  ...,
  method = c("constant", "prop", "density")
)

Arguments

tof_tibble

A `tof_tbl` or a `tibble`.

group_cols

Unquoted names of the columns in `tof_tibble` that should be used to define groups within which the downsampling will be performed. Supports tidyselect helpers. Defaults to `NULL` (no grouping).

...

Additional arguments to pass to the `tof_downsample_*` function family member corresponding to the chosen method.

method

A string indicating which downsampling method to use: `"constant"` (the default), `"prop"`, or `"density"`.

Value

A downsampled `tof_tbl` with the same number of columns as the input `tof_tibble`, but fewer rows. The number of rows in the result will depend on the chosen downsampling method.

See also

Examples

library(tidytof)

# simulate 1000 cells with four marker columns and a cluster label
sim_data <-
    dplyr::tibble(
        cd45 = rnorm(n = 1000),
        cd38 = rnorm(n = 1000),
        cd34 = rnorm(n = 1000),
        cd19 = rnorm(n = 1000),
        cluster_id = sample(letters, size = 1000, replace = TRUE)
    )

# sample 200 cells from the input data
tof_downsample(
    tof_tibble = sim_data,
    num_cells = 200L,
    method = "constant"
)
#> # A tibble: 200 × 5
#>      cd45   cd38    cd34   cd19 cluster_id
#>     <dbl>  <dbl>   <dbl>  <dbl> <chr>     
#>  1 -1.48  -1.24  -0.907  -2.16  q         
#>  2  1.58  -0.388  1.21   -1.31  q         
#>  3  0.858 -0.394  0.204  -0.432 l         
#>  4  1.04  -1.07   0.359   0.169 g         
#>  5 -0.592 -0.976  2.04   -0.581 a         
#>  6  1.33  -0.271  0.0306 -0.183 o         
#>  7 -0.462  0.953  0.650   1.71  h         
#>  8  1.49  -0.326  2.00    0.851 s         
#>  9  1.15  -0.681 -1.82   -0.720 t         
#> 10 -0.618 -1.49  -0.954  -1.90  c         
#> # ℹ 190 more rows
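
The `group_cols` argument is not exercised by the examples in this section, so here is a minimal sketch of grouped downsampling. It assumes that, with `method = "constant"`, `num_cells` is applied within each group defined by `group_cols`, as the argument description suggests. The names `grouped_data` and `grouped_sample` are illustrative only.

```r
library(tidytof)

# simulated data with a cluster label, similar in layout to `sim_data`
grouped_data <-
    dplyr::tibble(
        cd45 = rnorm(n = 1000),
        cd38 = rnorm(n = 1000),
        cluster_id = sample(letters, size = 1000, replace = TRUE)
    )

# sample up to 5 cells from each cluster rather than from the data as a whole
grouped_sample <-
    tof_downsample(
        tof_tibble = grouped_data,
        group_cols = cluster_id,
        num_cells = 5L,
        method = "constant"
    )

# count the sampled cells per cluster
table(grouped_sample$cluster_id)
```

Grouping this way keeps rare clusters represented in the result, which a single global sample of the same total size would not guarantee.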

# sample 10% of all cells from the input data
tof_downsample(
    tof_tibble = sim_data,
    prop_cells = 0.1,
    method = "prop"
)
#> # A tibble: 100 × 5
#>       cd45   cd38   cd34   cd19 cluster_id
#>      <dbl>  <dbl>  <dbl>  <dbl> <chr>     
#>  1 -2.57   -1.17   1.37  -0.400 v         
#>  2  0.564  -1.23  -0.198 -0.832 k         
#>  3 -0.410   0.913  0.979  0.144 o         
#>  4 -0.586   0.963  1.37   1.01  k         
#>  5 -0.727  -1.26   0.907 -1.75  m         
#>  6  1.31   -0.416 -0.575  0.102 j         
#>  7 -0.572  -0.672 -0.336 -0.910 r         
#>  8 -0.0373  0.361 -0.425 -1.15  k         
#>  9 -1.43   -0.923  0.913 -1.21  h         
#> 10 -0.537   1.09   0.460 -0.937 x         
#> # ℹ 90 more rows
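
Because `...` is forwarded to the `tof_downsample_*` family member for the chosen method, the same result can presumably be obtained by calling that member directly. The sketch below assumes the member for `method = "prop"` is named `tof_downsample_prop()` (inferred from the `tof_downsample_*` naming pattern) and that it accepts `prop_cells`. The names `prop_data` and `direct_sample` are illustrative.

```r
library(tidytof)

# simulated data with the same columns as `sim_data`
prop_data <-
    dplyr::tibble(
        cd45 = rnorm(n = 1000),
        cd38 = rnorm(n = 1000),
        cd34 = rnorm(n = 1000),
        cd19 = rnorm(n = 1000),
        cluster_id = sample(letters, size = 1000, replace = TRUE)
    )

# assumed equivalent of tof_downsample(..., prop_cells = 0.1, method = "prop")
direct_sample <-
    tof_downsample_prop(
        tof_tibble = prop_data,
        prop_cells = 0.1
    )

# compare the dimensions of the input and the downsampled result
dim(prop_data)
dim(direct_sample)
```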

# sample ~10% of cells from the input data using density dependence
tof_downsample(
    tof_tibble = sim_data,
    target_prop_cells = 0.1,
    method = "density"
)
#> # A tibble: 101 × 5
#>       cd45    cd38   cd34   cd19 cluster_id
#>      <dbl>   <dbl>  <dbl>  <dbl> <chr>     
#>  1 -2.68   -0.976  -1.02  -0.607 j         
#>  2 -0.238   0.567   0.270 -1.33  h         
#>  3  1.44   -0.963  -0.707 -2.41  d         
#>  4 -0.0581  0.462  -0.916 -1.18  h         
#>  5  0.307   0.591  -1.48  -0.549 l         
#>  6 -0.523  -1.33   -1.19  -0.456 t         
#>  7  0.693   0.901   0.424  1.18  m         
#>  8 -0.106   1.35   -0.159  0.315 n         
#>  9  0.204   1.09    0.189 -0.810 q         
#> 10  0.0685 -0.0127 -1.31   0.187 w         
#> # ℹ 91 more rows
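
To give some intuition for the `"density"` method, the following base R sketch illustrates the general idea of density-dependent downsampling described in Qiu et al. (2011): cells in very sparse regions are treated as outliers and dropped, cells in moderately dense regions are always kept, and cells in dense regions are kept with a probability that shrinks as local density grows. This is a simplified illustration, not tidytof's implementation. The neighborhood `radius`, the two percentile cutoffs, and all variable names are assumptions chosen for the example.

```r
set.seed(1)

# toy data: a dense population of 500 cells and a rare population of 50 cells
dense_cells <- matrix(rnorm(500 * 2, mean = 0, sd = 0.5), ncol = 2)
rare_cells <- matrix(rnorm(50 * 2, mean = 4, sd = 0.5), ncol = 2)
cells <- rbind(dense_cells, rare_cells)

# local density: number of cells within `radius` of each cell (L1 distance);
# the radius is an arbitrary choice for this sketch
radius <- 0.5
l1_dist <- as.matrix(dist(cells, method = "manhattan"))
local_density <- rowSums(l1_dist <= radius)

# density cutoffs (assumed percentiles, for illustration only)
outlier_density <- unname(quantile(local_density, probs = 0.01))
target_density <- unname(quantile(local_density, probs = 0.10))

# probability of keeping each cell:
#   0                              if its density is below the outlier cutoff
#   1                              if its density is at or below the target
#   target_density / local_density otherwise
keep_prob <-
    ifelse(
        local_density < outlier_density,
        0,
        ifelse(
            local_density <= target_density,
            1,
            target_density / local_density
        )
    )

# keep each cell independently with its own probability
keep <- runif(nrow(cells)) < keep_prob
downsampled <- cells[keep, , drop = FALSE]

# compare the number of cells before and after downsampling
nrow(cells)
nrow(downsampled)
```

Because the keep probability is inversely proportional to local density above the target, dense populations are thinned much more aggressively than rare ones. This also explains why the density example above returns approximately, rather than exactly, the requested proportion of cells.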