
This function downsamples the number of cells in a `tof_tbl` by randomly selecting a `prop_cells` proportion of the total number of cells with each unique combination of values in `group_cols`.

Usage

tof_downsample_prop(tof_tibble, group_cols = NULL, prop_cells)

Arguments

tof_tibble

A `tof_tbl` or a `tibble`.

group_cols

Unquoted names of the columns in `tof_tibble` that should be used to define groups from which `prop_cells` will be downsampled. Supports tidyselect helpers. Defaults to `NULL` (no grouping).

prop_cells

A proportion of cells (between 0 and 1) that should be sampled from each group defined by `group_cols`.
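As an illustration of how `group_cols` and `prop_cells` interact, the sketch below groups by two columns at once, first by naming them and then with a tidyselect helper. The `patient_id` column and the 20% proportion are hypothetical choices made for this example only.

```r
library(tidytof)

sim_data <-
    dplyr::tibble(
        cd45 = rnorm(n = 1000),
        cd38 = rnorm(n = 1000),
        cluster_id = sample(letters[1:4], size = 1000, replace = TRUE),
        # hypothetical second grouping column, added only for illustration
        patient_id = sample(c("p1", "p2"), size = 1000, replace = TRUE)
    )

# sample 20% of the cells within each cluster_id x patient_id combination
downsampled <-
    tof_downsample_prop(
        tof_tibble = sim_data,
        group_cols = c(cluster_id, patient_id),
        prop_cells = 0.2
    )

# the same two columns selected with a tidyselect helper
downsampled_helper <-
    tof_downsample_prop(
        tof_tibble = sim_data,
        group_cols = dplyr::ends_with("_id"),
        prop_cells = 0.2
    )
```

Both calls define the same groups, so they should return the same number of rows, although the specific cells drawn will generally differ between the two calls.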

Value

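A `tof_tbl` with the same number of columns as the input `tof_tibble`, but fewer rows. Specifically, the number of rows will be approximately `prop_cells` times the number of rows in the input `tof_tibble`. When `group_cols` is supplied, the proportion is applied within each group, so per-group rounding can make the total differ slightly from that product (87 rather than 100 rows in the grouped example below).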
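The effect of grouping on the returned row count can be explored with plain dplyr. The sketch below assumes grouped proportional sampling behaves like `dplyr::slice_sample(prop = ...)`, which rounds each group's share down. This is an assumption made for illustration, not a statement about tidytof's internals.

```r
library(dplyr)

set.seed(2020)
sim_data <-
    tibble(
        cd45 = rnorm(n = 1000),
        cluster_id = sample(letters, size = 1000, replace = TRUE)
    )

# number of cells per cluster, and how many remain if 10% is rounded down
expected <-
    sim_data |>
    count(cluster_id) |>
    mutate(n_kept = floor(n * 0.1))

# grouped proportional sampling with dplyr (a stand-in for the grouped call)
grouped_sample <-
    sim_data |>
    group_by(cluster_id) |>
    slice_sample(prop = 0.1) |>
    ungroup()

# compare the realised total with the sum of the rounded per-group counts
nrow(grouped_sample)
sum(expected$n_kept)
```

Because each of the 26 clusters loses its fractional remainder, the grouped total falls at or below 10% of the 1000 input rows.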

See also

Other downsampling functions: tof_downsample(), tof_downsample_constant(), tof_downsample_density()

Examples

sim_data <-
    dplyr::tibble(
        cd45 = rnorm(n = 1000),
        cd38 = rnorm(n = 1000),
        cd34 = rnorm(n = 1000),
        cd19 = rnorm(n = 1000),
        cluster_id = sample(letters, size = 1000, replace = TRUE)
    )

# sample 10% of all cells from the input data
tof_downsample_prop(
    tof_tibble = sim_data,
    prop_cells = 0.1
)
#> # A tibble: 100 × 5
#>       cd45   cd38   cd34    cd19 cluster_id
#>      <dbl>  <dbl>  <dbl>   <dbl> <chr>     
#>  1  1.01    0.520 -2.18  -0.531  s         
#>  2  0.542   1.50  -0.890 -0.815  a         
#>  3 -0.166   0.513 -0.812  1.31   e         
#>  4  0.116  -1.27   0.359 -0.412  f         
#>  5  0.765  -1.46  -1.71   0.247  u         
#>  6 -0.426  -0.412  1.99   0.465  x         
#>  7 -1.10   -0.437  0.284  0.0682 s         
#>  8 -0.620  -1.81  -0.928 -1.32   k         
#>  9  0.0205  0.386 -0.465 -0.102  m         
#> 10 -0.721   0.362 -0.594 -0.848  h         
#> # ℹ 90 more rows

# sample 10% of all cells from each cluster in the input data
tof_downsample_prop(
    tof_tibble = sim_data,
    group_cols = cluster_id,
    prop_cells = 0.1
)
#> # A tibble: 87 × 5
#>      cd45    cd38   cd34    cd19 cluster_id
#>     <dbl>   <dbl>  <dbl>   <dbl> <chr>     
#>  1 -1.02  -1.06   -1.12   1.63   a         
#>  2 -0.959 -1.57    3.55   0.732  a         
#>  3  0.542  1.50   -0.890 -0.815  a         
#>  4 -0.617  0.0143 -1.23   0.291  a         
#>  5 -1.39   2.26   -0.497 -0.166  b         
#>  6 -0.396 -0.818  -0.720  0.0744 b         
#>  7  0.916 -0.859   1.35   1.37   b         
#>  8  0.438  1.33   -1.18   1.19   c         
#>  9 -0.748  0.436   1.39   0.398  c         
#> 10  0.512 -1.16    0.687 -0.731  c         
#> # ℹ 77 more rows
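Because the cells are chosen at random, repeated calls return different rows. The sketch below shows one way to make a downsample reproducible. It assumes the sampling draws from R's global random number generator, so that `set.seed()` controls it. The seed value is arbitrary.

```r
library(tidytof)

sim_data <-
    dplyr::tibble(
        cd45 = rnorm(n = 1000),
        cd38 = rnorm(n = 1000),
        cluster_id = sample(letters, size = 1000, replace = TRUE)
    )

# fix the seed immediately before each call (assumes the global RNG is used)
set.seed(123)
first_draw <-
    tof_downsample_prop(
        tof_tibble = sim_data,
        group_cols = cluster_id,
        prop_cells = 0.1
    )

set.seed(123)
second_draw <-
    tof_downsample_prop(
        tof_tibble = sim_data,
        group_cols = cluster_id,
        prop_cells = 0.1
    )

# under the assumption above, both draws contain the same cells
identical(first_draw, second_draw)
```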