This function reduces the number of cells in a `tof_tbl` by downsampling with one of three methods: randomly sampling a constant number of cells, randomly sampling a proportion of cells, or performing density-dependent downsampling per the algorithm in Qiu et al. (2011).
Usage
tof_downsample(
  tof_tibble,
  group_cols = NULL,
  ...,
  method = c("constant", "prop", "density")
)
Arguments
- tof_tibble
A `tof_tbl` or a `tibble`.
- group_cols
Unquoted names of the columns in `tof_tibble` that should be used to define groups within which the downsampling will be performed. Supports tidyselect helpers. Defaults to `NULL` (no grouping).
- ...
Additional arguments to pass to the `tof_downsample_*` function family member corresponding to the chosen method.
- method
A string indicating which downsampling method to use: "constant" (the default), "prop", or "density".
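To illustrate how `...` is forwarded, the sketch below calls the wrapper and then the corresponding family member directly. It assumes that tidytof and dplyr are installed and that `tof_downsample_constant()` accepts `num_cells` as its own argument; the seed and the reduced set of simulated columns are illustrative choices rather than part of the package documentation.

```r
library(tidytof)

set.seed(2023)  # arbitrary seed, only so the sketch is reproducible

# Simulated data, similar in shape to the data used in the Examples section
sim_data <-
  dplyr::tibble(
    cd45 = rnorm(n = 1000),
    cd38 = rnorm(n = 1000),
    cluster_id = sample(letters, size = 1000, replace = TRUE)
  )

# Call through the wrapper: num_cells travels through `...`
via_wrapper <-
  tof_downsample(
    tof_tibble = sim_data,
    num_cells = 200L,
    method = "constant"
  )

# Call the family member directly
# (assumption: it takes num_cells as a named argument)
direct <-
  tof_downsample_constant(
    tof_tibble = sim_data,
    num_cells = 200L
  )

# Both results should contain the requested number of cells
nrow(via_wrapper)
nrow(direct)
```

Because the sampling is random, the two results will generally contain different cells even though they have the same size.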
Value
A downsampled `tof_tbl` with the same number of columns as the input `tof_tibble`, but fewer rows. The number of rows in the result will depend on the chosen downsampling method.
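A quick way to check the shape of the returned value is to compare its dimensions with those of the input. The sketch below assumes tidytof and dplyr are installed; the variable names and the seed are illustrative.

```r
library(tidytof)

set.seed(2023)  # arbitrary seed, only so the sketch is reproducible

sim_data <-
  dplyr::tibble(
    cd45 = rnorm(n = 1000),
    cd38 = rnorm(n = 1000),
    cluster_id = sample(letters, size = 1000, replace = TRUE)
  )

# Keep 10% of the cells
downsampled <-
  tof_downsample(
    tof_tibble = sim_data,
    prop_cells = 0.1,
    method = "prop"
  )

# Same columns as the input, fewer rows
same_columns <- identical(colnames(downsampled), colnames(sim_data))
fewer_rows <- nrow(downsampled) < nrow(sim_data)

same_columns
fewer_rows
```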
See also
Other downsampling functions: tof_downsample_constant(), tof_downsample_density(), tof_downsample_prop()
Examples
sim_data <-
  dplyr::tibble(
    cd45 = rnorm(n = 1000),
    cd38 = rnorm(n = 1000),
    cd34 = rnorm(n = 1000),
    cd19 = rnorm(n = 1000),
    cluster_id = sample(letters, size = 1000, replace = TRUE)
  )
# sample 200 cells from the input data
tof_downsample(
  tof_tibble = sim_data,
  num_cells = 200L,
  method = "constant"
)
#> # A tibble: 200 × 5
#> cd45 cd38 cd34 cd19 cluster_id
#> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 -1.48 -1.24 -0.907 -2.16 q
#> 2 1.58 -0.388 1.21 -1.31 q
#> 3 0.858 -0.394 0.204 -0.432 l
#> 4 1.04 -1.07 0.359 0.169 g
#> 5 -0.592 -0.976 2.04 -0.581 a
#> 6 1.33 -0.271 0.0306 -0.183 o
#> 7 -0.462 0.953 0.650 1.71 h
#> 8 1.49 -0.326 2.00 0.851 s
#> 9 1.15 -0.681 -1.82 -0.720 t
#> 10 -0.618 -1.49 -0.954 -1.90 c
#> # ℹ 190 more rows
# sample 10% of all cells from the input data
tof_downsample(
  tof_tibble = sim_data,
  prop_cells = 0.1,
  method = "prop"
)
#> # A tibble: 100 × 5
#> cd45 cd38 cd34 cd19 cluster_id
#> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 -2.57 -1.17 1.37 -0.400 v
#> 2 0.564 -1.23 -0.198 -0.832 k
#> 3 -0.410 0.913 0.979 0.144 o
#> 4 -0.586 0.963 1.37 1.01 k
#> 5 -0.727 -1.26 0.907 -1.75 m
#> 6 1.31 -0.416 -0.575 0.102 j
#> 7 -0.572 -0.672 -0.336 -0.910 r
#> 8 -0.0373 0.361 -0.425 -1.15 k
#> 9 -1.43 -0.923 0.913 -1.21 h
#> 10 -0.537 1.09 0.460 -0.937 x
#> # ℹ 90 more rows
# sample ~10% of cells from the input data using density dependence
tof_downsample(
  tof_tibble = sim_data,
  target_prop_cells = 0.1,
  method = "density"
)
#> # A tibble: 101 × 5
#> cd45 cd38 cd34 cd19 cluster_id
#> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 -2.68 -0.976 -1.02 -0.607 j
#> 2 -0.238 0.567 0.270 -1.33 h
#> 3 1.44 -0.963 -0.707 -2.41 d
#> 4 -0.0581 0.462 -0.916 -1.18 h
#> 5 0.307 0.591 -1.48 -0.549 l
#> 6 -0.523 -1.33 -1.19 -0.456 t
#> 7 0.693 0.901 0.424 1.18 m
#> 8 -0.106 1.35 -0.159 0.315 n
#> 9 0.204 1.09 0.189 -0.810 q
#> 10 0.0685 -0.0127 -1.31 0.187 w
#> # ℹ 91 more rows
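The examples above all sample from the data as a whole. As a further sketch, `group_cols` can be used to downsample within each cluster separately. This assumes tidytof and dplyr are installed; the choice of 10 cells per cluster and the seed are illustrative, and the per-group behavior is inferred from the description of the `group_cols` argument rather than shown in the original examples.

```r
library(tidytof)

set.seed(2023)  # arbitrary seed, only so the sketch is reproducible

sim_data <-
  dplyr::tibble(
    cd45 = rnorm(n = 1000),
    cd38 = rnorm(n = 1000),
    cluster_id = sample(letters, size = 1000, replace = TRUE)
  )

# sample a constant number of cells within each cluster
grouped_sample <-
  tof_downsample(
    tof_tibble = sim_data,
    group_cols = cluster_id,
    num_cells = 10L,
    method = "constant"
  )

# count the cells retained per cluster
cells_per_cluster <- dplyr::count(grouped_sample, cluster_id)
cells_per_cluster
```

Grouping in this way keeps small clusters represented in the result, whereas sampling from the ungrouped data favors the most abundant clusters.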