This function downsamples a `tof_tbl` by randomly selecting `num_cells` cells from each unique combination of values in `group_cols`.

Usage

tof_downsample_constant(tof_tibble, group_cols = NULL, num_cells)

Arguments

tof_tibble

A `tof_tbl` or a `tibble`.

group_cols

Unquoted names of the columns in `tof_tibble` that should be used to define groups from which `num_cells` will be downsampled. Supports tidyselect helpers. Defaults to `NULL` (no grouping).

num_cells

An integer number of cells that should be sampled from each group defined by `group_cols`.

Value

A `tof_tbl` with the same number of columns as the input `tof_tibble` and, in general, fewer rows. If every group contains at least `num_cells` cells, the number of rows is `num_cells` multiplied by the number of unique combinations of the values in `group_cols`. If a group has fewer than `num_cells` cells, all cells from that group are kept, so the result has correspondingly fewer rows.
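
The row-count rule above can be illustrated with plain dplyr. This is a minimal sketch of the described behavior, not necessarily how `tof_downsample_constant()` is implemented; the `toy` data, its cluster sizes, and the object names are hypothetical, and it assumes dplyr 1.0.0 or later for `slice_sample()`.

```r
library(dplyr)

# Toy data (hypothetical): three clusters with 6, 4, and 2 cells
toy <- tibble(
    cd45 = rnorm(n = 12),
    cluster_id = rep(c("a", "b", "c"), times = c(6, 4, 2))
)

# Sample up to 3 cells per cluster; slice_sample() silently
# truncates `n` to the group size when a group is too small
sketch <- toy %>%
    group_by(cluster_id) %>%
    slice_sample(n = 3) %>%
    ungroup()

# Number of cells retained per cluster
counts <- count(sketch, cluster_id)
counts
```

Here clusters `a` and `b` each contribute 3 cells, while cluster `c` contributes both of its 2 cells, so the result has 8 rows rather than 3 × 3 = 9.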

See also

Other downsampling functions: `tof_downsample()`, `tof_downsample_density()`, `tof_downsample_prop()`

Examples

sim_data <-
    dplyr::tibble(
        cd45 = rnorm(n = 1000),
        cd38 = rnorm(n = 1000),
        cd34 = rnorm(n = 1000),
        cd19 = rnorm(n = 1000),
        cluster_id = sample(letters, size = 1000, replace = TRUE)
    )

# sample 500 cells from the input data
tof_downsample_constant(
    tof_tibble = sim_data,
    num_cells = 500L
)
#> # A tibble: 500 × 5
#>      cd45    cd38   cd34    cd19 cluster_id
#>     <dbl>   <dbl>  <dbl>   <dbl> <chr>     
#>  1 -0.100 -0.0492  0.282  0.865  l         
#>  2  0.199 -1.22   -0.670  0.685  s         
#>  3 -1.46   0.855   1.02  -0.0620 e         
#>  4  1.18  -0.316  -2.67  -0.113  q         
#>  5  0.837  0.345   0.303 -1.44   u         
#>  6  0.566 -0.0197 -0.764  0.255  g         
#>  7 -0.460  0.287  -1.11   1.09   d         
#>  8  0.105  1.79    2.01  -0.486  e         
#>  9  1.90  -0.971   0.278 -0.0794 h         
#> 10  1.38   0.302  -0.186  0.296  g         
#> # ℹ 490 more rows

# sample 20 cells per cluster from the input data
tof_downsample_constant(
    tof_tibble = sim_data,
    group_cols = cluster_id,
    num_cells = 20L
)
#> # A tibble: 520 × 5
#>       cd45    cd38   cd34    cd19 cluster_id
#>      <dbl>   <dbl>  <dbl>   <dbl> <chr>     
#>  1 -0.0611  0.244   1.33  -0.653  y         
#>  2 -0.328   1.14    0.231 -0.620  p         
#>  3  0.578  -0.318  -0.954  0.135  x         
#>  4 -1.46    0.855   1.02  -0.0620 e         
#>  5  0.566  -0.0197 -0.764  0.255  g         
#>  6 -0.153   0.909  -0.456  0.267  r         
#>  7 -0.0409 -0.556   2.49   0.912  f         
#>  8 -0.817  -1.64   -2.27  -0.981  z         
#>  9 -1.29   -0.407  -1.33  -1.12   o         
#> 10  1.25    2.18   -1.20   1.01   k         
#> # ℹ 510 more rows