This function downsamples the number of cells in a `tof_tbl` using one of three methods: randomly sampling a constant number of cells, randomly sampling a proportion of cells, or performing density-dependent downsampling per the algorithm in Qiu et al. (2011).

Usage

tof_downsample(
  tof_tibble,
  group_cols = NULL,
  ...,
  method = c("constant", "prop", "density")
)

Arguments

tof_tibble

A `tof_tbl` or a `tibble`.

group_cols

Unquoted names of the columns in `tof_tibble` that should be used to define groups within which the downsampling will be performed. Supports tidyselect helpers. Defaults to `NULL` (no grouping).

...

Additional arguments to pass to the `tof_downsample_*` function family member corresponding to the chosen method.

method

A string indicating which downsampling method to use: "constant" (the default), "prop", or "density".

Value

A downsampled `tof_tbl` with the same number of columns as the input `tof_tibble`, but fewer rows. The number of rows in the result will depend on the chosen downsampling method.
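
To make the effect on row counts concrete, the two random methods can be thought of as behaving much like `dplyr::slice_sample()` called with `n` or with `prop`. The following is only a conceptual sketch written with dplyr, not tidytof's implementation; `toy_cells` is a hypothetical data frame introduced for illustration.

```r
library(dplyr)

set.seed(2023)

# hypothetical stand-in for a tof_tbl: 1000 cells, 2 markers
toy_cells <-
    tibble(
        cd45 = rnorm(n = 1000),
        cd38 = rnorm(n = 1000)
    )

# analogous to method = "constant": keep a fixed number of rows
constant_sample <- slice_sample(toy_cells, n = 200)

# analogous to method = "prop": keep a fixed fraction of rows
prop_sample <- slice_sample(toy_cells, prop = 0.1)

nrow(constant_sample)
#> [1] 200
nrow(prop_sample)
#> [1] 100
```

Density-dependent downsampling has no such one-line analogue, because the chance of keeping a cell depends on its estimated local density, so the number of rows returned is only approximately the requested proportion.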

See also

Examples

sim_data <-
    dplyr::tibble(
        cd45 = rnorm(n = 1000),
        cd38 = rnorm(n = 1000),
        cd34 = rnorm(n = 1000),
        cd19 = rnorm(n = 1000),
        cluster_id = sample(letters, size = 1000, replace = TRUE)
    )

# sample 200 cells from the input data
tof_downsample(
    tof_tibble = sim_data,
    num_cells = 200L,
    method = "constant"
)
#> # A tibble: 200 × 5
#>      cd45   cd38    cd34    cd19 cluster_id
#>     <dbl>  <dbl>   <dbl>   <dbl> <chr>     
#>  1 -0.429 -0.127 -0.242   0.0885 s         
#>  2  0.249 -2.16  -0.400   1.07   p         
#>  3  1.13   0.676 -1.34    0.464  y         
#>  4  1.66   0.631  0.231   0.447  s         
#>  5 -0.974 -0.219  1.31   -0.466  d         
#>  6  0.815  1.55  -0.558  -1.07   p         
#>  7 -0.225  1.28  -0.976   1.28   e         
#>  8  0.730  0.710  0.571  -0.0914 u         
#>  9 -1.59  -0.337  0.0350 -1.61   a         
#> 10 -0.938  0.593 -0.669   0.0187 p         
#> # ℹ 190 more rows

# sample 10% of all cells from the input data
tof_downsample(
    tof_tibble = sim_data,
    prop_cells = 0.1,
    method = "prop"
)
#> # A tibble: 100 × 5
#>      cd45   cd38   cd34    cd19 cluster_id
#>     <dbl>  <dbl>  <dbl>   <dbl> <chr>     
#>  1  0.571 -0.121  1.54   0.173  z         
#>  2  1.07   0.320 -0.675  0.458  h         
#>  3 -0.786 -1.08  -0.439 -1.21   m         
#>  4  0.285  1.68  -0.580 -0.189  r         
#>  5  0.699 -0.107 -1.21  -3.14   c         
#>  6 -0.270 -1.42   1.65   1.80   a         
#>  7 -0.252 -0.396  0.339  0.602  p         
#>  8  1.31  -1.42   0.234 -0.0494 q         
#>  9 -1.84  -1.86   2.28   0.0512 e         
#> 10 -0.128 -1.28   1.47  -0.892  q         
#> # ℹ 90 more rows

# sample ~10% of cells from the input data using density-dependent downsampling
tof_downsample(
    tof_tibble = sim_data,
    target_prop_cells = 0.1,
    method = "density"
)
#> # A tibble: 105 × 5
#>      cd45    cd38    cd34    cd19 cluster_id
#>     <dbl>   <dbl>   <dbl>   <dbl> <chr>     
#>  1 -0.974 -0.219   1.31   -0.466  d         
#>  2  0.664  1.39    0.898  -0.675  z         
#>  3 -1.59  -0.337   0.0350 -1.61   a         
#>  4 -0.783  0.203   0.554   1.60   j         
#>  5 -1.33   0.434  -0.0858  0.286  e         
#>  6  0.717 -0.0751 -0.544   0.0304 p         
#>  7  1.00   0.239   1.61    1.72   j         
#>  8 -0.316  0.834  -0.108   1.97   w         
#>  9  0.177 -1.45    0.202   0.498  o         
#> 10 -0.720 -1.03   -0.962   1.49   a         
#> # ℹ 95 more rows
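
The examples above all sample from the data as a whole. As a further sketch, `group_cols` can be supplied so that sampling happens within each group. This assumes that `num_cells` is then applied per group, which is how the `group_cols` description reads; `grouped_data` is a hypothetical data frame with four equally sized clusters, chosen so that every group has more cells than are requested.

```r
library(tidytof)

# hypothetical data: 1000 cells split evenly across 4 clusters
grouped_data <-
    dplyr::tibble(
        cd45 = rnorm(n = 1000),
        cd38 = rnorm(n = 1000),
        cluster_id = rep(c("a", "b", "c", "d"), times = 250)
    )

# sample cells within each cluster rather than from the pooled data
# (assumption: num_cells is applied separately to each group)
grouped_sample <-
    tof_downsample(
        tof_tibble = grouped_data,
        group_cols = cluster_id,
        num_cells = 10L,
        method = "constant"
    )

# check how many cells were kept per cluster
dplyr::count(grouped_sample, cluster_id)
```

Grouping in this way can help keep small clusters represented, which plain random sampling from the pooled data does not guarantee.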