Downsample high-dimensional cytometry data by randomly selecting a constant number of cells per group.
Source: R/downsampling.R
This function downsamples the number of cells in a `tof_tbl` by randomly selecting `num_cells` cells from each unique combination of values in `group_cols`.
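For intuition, the per-group sampling described above can be approximated with plain dplyr. This is a minimal sketch that assumes dplyr 1.0 or later (for `slice_sample()`). It is not tidytof's actual source, and the helper name `downsample_constant_sketch()` is hypothetical.

```r
library(dplyr)

# Hypothetical helper illustrating constant per-group downsampling;
# this is not the tidytof implementation.
downsample_constant_sketch <- function(data, group_cols = NULL, num_cells) {
  data %>%
    # one group per unique combination of the selected columns
    # (NULL selects no columns, so the whole table is one group)
    group_by(across({{ group_cols }})) %>%
    # draw up to num_cells rows per group without replacement;
    # slice_sample() silently truncates n for groups smaller than num_cells
    slice_sample(n = num_cells) %>%
    ungroup()
}
```

Because `slice_sample()` samples without replacement by default and truncates `n` to the group size, a group smaller than `num_cells` is returned whole. This matches the behavior stated under Value below.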
Arguments
- tof_tibble
A `tof_tbl` or a `tibble`.
- group_cols
Unquoted names of the columns in `tof_tibble` that define the groups from which `num_cells` cells will be sampled. Supports tidyselect helpers. Defaults to `NULL` (no grouping), in which case `num_cells` cells are sampled from the entire `tof_tibble`.
- num_cells
An integer number of cells that should be sampled from each group defined by `group_cols`.
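Because `group_cols` accepts tidyselect syntax, several grouping columns can be combined with `c()` or picked out with a helper such as `ends_with()`. The sketch below assumes tidytof and dplyr are installed. It also assumes a hypothetical `sample_id` column alongside `cluster_id`. Both calls should group by every sample-cluster combination.

```r
library(tidytof)
library(dplyr)

# Hypothetical data: 2 samples x 3 clusters, 50 cells per combination
sim_multi <- tibble(
  cd45 = rnorm(n = 300),
  sample_id = rep(c("s1", "s2"), each = 150),
  cluster_id = rep(rep(c("a", "b", "c"), each = 50), times = 2)
)

# Name the grouping columns explicitly...
by_name <- tof_downsample_constant(
  tof_tibble = sim_multi,
  group_cols = c(sample_id, cluster_id),
  num_cells = 10L
)

# ...or select them with a tidyselect helper
by_helper <- tof_downsample_constant(
  tof_tibble = sim_multi,
  group_cols = ends_with("_id"),
  num_cells = 10L
)

# 6 sample-cluster combinations x 10 cells each
nrow(by_name)
nrow(by_helper)
```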
Value
A `tof_tbl` with the same number of columns as the input `tof_tibble`, but usually fewer rows. Each unique combination of the values in `group_cols` contributes `num_cells` rows, so the result has at most `num_cells` multiplied by the number of such combinations. If a group has fewer than `num_cells` cells, all cells from that group are kept.
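The row count described above is the sum, over groups, of the smaller of each group's size and `num_cells`. A small base-R sketch of that arithmetic follows. `expected_rows()` is a hypothetical helper, and the group sizes are made up for illustration.

```r
# Hypothetical helper: expected output size of constant downsampling,
# given the number of cells in each group
expected_rows <- function(group_sizes, num_cells) {
  # each group contributes num_cells rows, or all of its rows if it is smaller
  sum(pmin(group_sizes, num_cells))
}

# Three groups with 5, 30, and 100 cells, sampling 20 per group:
# 5 (kept whole) + 20 + 20
expected_rows(c(5, 30, 100), num_cells = 20)
#> [1] 45

# 26 groups that all have at least 20 cells: 26 x 20
expected_rows(rep(50, 26), num_cells = 20)
#> [1] 520
```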
See also
Other downsampling functions: tof_downsample(), tof_downsample_density(), tof_downsample_prop()
Examples
library(tidytof) # provides tof_downsample_constant()

sim_data <-
  dplyr::tibble(
    cd45 = rnorm(n = 1000),
    cd38 = rnorm(n = 1000),
    cd34 = rnorm(n = 1000),
    cd19 = rnorm(n = 1000),
    cluster_id = sample(letters, size = 1000, replace = TRUE)
  )
# sample 500 cells from the input data
tof_downsample_constant(
  tof_tibble = sim_data,
  num_cells = 500L
)
#> # A tibble: 500 × 5
#> cd45 cd38 cd34 cd19 cluster_id
#> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 -0.100 -0.0492 0.282 0.865 l
#> 2 0.199 -1.22 -0.670 0.685 s
#> 3 -1.46 0.855 1.02 -0.0620 e
#> 4 1.18 -0.316 -2.67 -0.113 q
#> 5 0.837 0.345 0.303 -1.44 u
#> 6 0.566 -0.0197 -0.764 0.255 g
#> 7 -0.460 0.287 -1.11 1.09 d
#> 8 0.105 1.79 2.01 -0.486 e
#> 9 1.90 -0.971 0.278 -0.0794 h
#> 10 1.38 0.302 -0.186 0.296 g
#> # ℹ 490 more rows
# sample 20 cells per cluster from the input data
tof_downsample_constant(
  tof_tibble = sim_data,
  group_cols = cluster_id,
  num_cells = 20L
)
#> # A tibble: 520 × 5
#> cd45 cd38 cd34 cd19 cluster_id
#> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 -0.0611 0.244 1.33 -0.653 y
#> 2 -0.328 1.14 0.231 -0.620 p
#> 3 0.578 -0.318 -0.954 0.135 x
#> 4 -1.46 0.855 1.02 -0.0620 e
#> 5 0.566 -0.0197 -0.764 0.255 g
#> 6 -0.153 0.909 -0.456 0.267 r
#> 7 -0.0409 -0.556 2.49 0.912 f
#> 8 -0.817 -1.64 -2.27 -0.981 z
#> 9 -1.29 -0.407 -1.33 -1.12 o
#> 10 1.25 2.18 -1.20 1.01 k
#> # ℹ 510 more rows
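To confirm that the grouped call keeps `num_cells` cells per cluster (or every cell of a smaller cluster), you can count cells per cluster in its result. This is a sketch that assumes tidytof and dplyr are installed. `set.seed()` is added only to make the check reproducible, so the sampled rows will differ from the printed output above.

```r
library(tidytof)
library(dplyr)

set.seed(2023)
sim_data <- tibble(
  cd45 = rnorm(n = 1000),
  cluster_id = sample(letters, size = 1000, replace = TRUE)
)

downsampled <- tof_downsample_constant(
  tof_tibble = sim_data,
  group_cols = cluster_id,
  num_cells = 20L
)

# One row per cluster; n is 20 unless the cluster had fewer than 20 cells
cluster_counts <- count(downsampled, cluster_id)
orig_counts <- count(sim_data, cluster_id)
cluster_counts
```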