This function downsamples the cells in a `tof_tbl` using one of three methods: randomly sampling a constant number of cells, randomly sampling a proportion of cells, or performing density-dependent downsampling per the algorithm in Qiu et al. (2011).
Usage
tof_downsample(
tof_tibble,
group_cols = NULL,
...,
method = c("constant", "prop", "density")
)
Arguments
- tof_tibble
A `tof_tbl` or a `tibble`.
- group_cols
Unquoted names of the columns in `tof_tibble` that should be used to define groups within which the downsampling will be performed. Supports tidyselect helpers. Defaults to `NULL` (no grouping).
- ...
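Additional arguments passed to the `tof_downsample_*` function that corresponds to the chosen method (for example, `num_cells` for "constant", `prop_cells` for "prop", or `target_prop_cells` for "density", as in the examples below).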
- method
A string indicating which downsampling method to use: "constant" (the default), "prop", or "density".
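The character-vector default for `method` matches a common R idiom in which `match.arg()` picks the first choice when the caller leaves the argument unspecified. Whether `tof_downsample()` uses `match.arg()` internally is an assumption; the sketch below only illustrates the idiom with a hypothetical `pick_method()` function that is not part of tidytof.

```r
# Hypothetical illustration of the match.arg() idiom (not tidytof source code)
pick_method <- function(method = c("constant", "prop", "density")) {
  # With no argument supplied, match.arg() returns the first choice;
  # an unrecognized value raises an error
  method <- match.arg(method)
  switch(
    method,
    constant = "sample a fixed number of cells",
    prop = "sample a proportion of cells",
    density = "density-dependent downsampling"
  )
}

pick_method()           # "sample a fixed number of cells"
pick_method("density")  # "density-dependent downsampling"
```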
Value
A downsampled `tof_tbl` with the same number of columns as the input `tof_tibble`, but fewer rows. The number of rows in the result will depend on the chosen downsampling method.
See also
Other downsampling functions:
tof_downsample_constant(), tof_downsample_density(), tof_downsample_prop()
Examples
sim_data <-
dplyr::tibble(
cd45 = rnorm(n = 1000),
cd38 = rnorm(n = 1000),
cd34 = rnorm(n = 1000),
cd19 = rnorm(n = 1000),
cluster_id = sample(letters, size = 1000, replace = TRUE)
)
# sample 200 cells from the input data
tof_downsample(
tof_tibble = sim_data,
num_cells = 200L,
method = "constant"
)
#> # A tibble: 200 × 5
#> cd45 cd38 cd34 cd19 cluster_id
#> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 -0.429 -0.127 -0.242 0.0885 s
#> 2 0.249 -2.16 -0.400 1.07 p
#> 3 1.13 0.676 -1.34 0.464 y
#> 4 1.66 0.631 0.231 0.447 s
#> 5 -0.974 -0.219 1.31 -0.466 d
#> 6 0.815 1.55 -0.558 -1.07 p
#> 7 -0.225 1.28 -0.976 1.28 e
#> 8 0.730 0.710 0.571 -0.0914 u
#> 9 -1.59 -0.337 0.0350 -1.61 a
#> 10 -0.938 0.593 -0.669 0.0187 p
#> # ℹ 190 more rows
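The example above samples 200 cells without grouping. When `group_cols` is supplied, sampling happens within each group instead. The base-R sketch below illustrates that idea with a hypothetical helper, `downsample_constant_by_group()`, which is not part of tidytof; keeping groups that are smaller than `num_cells` whole is an assumption of the sketch.

```r
# Hypothetical helper (not part of tidytof): draw up to `num_cells` rows
# from each group defined by the column named in `group_col`.
downsample_constant_by_group <- function(df, group_col, num_cells) {
  # Row indices belonging to each group
  idx_by_group <- split(seq_len(nrow(df)), df[[group_col]])
  # Sample within each group; groups smaller than num_cells are kept whole
  kept <- unlist(lapply(idx_by_group, function(idx) {
    if (length(idx) <= num_cells) idx else idx[sample.int(length(idx), num_cells)]
  }), use.names = FALSE)
  df[sort(kept), , drop = FALSE]
}

set.seed(1)
df <- data.frame(
  marker = rnorm(300),
  group = rep(c("a", "b", "c"), times = c(50, 200, 50))
)
out <- downsample_constant_by_group(df, "group", num_cells = 100)
table(out$group)  # a: 50, b: 100, c: 50
```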
# sample 10% of all cells from the input data
tof_downsample(
tof_tibble = sim_data,
prop_cells = 0.1,
method = "prop"
)
#> # A tibble: 100 × 5
#> cd45 cd38 cd34 cd19 cluster_id
#> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 0.571 -0.121 1.54 0.173 z
#> 2 1.07 0.320 -0.675 0.458 h
#> 3 -0.786 -1.08 -0.439 -1.21 m
#> 4 0.285 1.68 -0.580 -0.189 r
#> 5 0.699 -0.107 -1.21 -3.14 c
#> 6 -0.270 -1.42 1.65 1.80 a
#> 7 -0.252 -0.396 0.339 0.602 p
#> 8 1.31 -1.42 0.234 -0.0494 q
#> 9 -1.84 -1.86 2.28 0.0512 e
#> 10 -0.128 -1.28 1.47 -0.892 q
#> # ℹ 90 more rows
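Sampling a proportion of cells amounts to drawing `prop_cells` × n rows without replacement; with 1,000 cells and `prop_cells = 0.1`, that gives the 100 rows shown above. The helper `downsample_prop_sketch()` below is hypothetical, and its use of `round()` for non-integer counts is an assumption rather than a description of how tidytof handles them.

```r
# Hypothetical helper (not part of tidytof): keep a proportion of rows,
# sampled without replacement.
downsample_prop_sketch <- function(df, prop_cells) {
  stopifnot(prop_cells >= 0, prop_cells <= 1)
  # Number of rows to keep; rounding is an assumption of this sketch
  n_keep <- round(prop_cells * nrow(df))
  df[sort(sample.int(nrow(df), n_keep)), , drop = FALSE]
}

set.seed(42)
df <- data.frame(cd45 = rnorm(1000), cd38 = rnorm(1000))
nrow(downsample_prop_sketch(df, prop_cells = 0.1))  # 100
```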
# sample ~10% of cells from the input data using density dependence
tof_downsample(
tof_tibble = sim_data,
target_prop_cells = 0.1,
method = "density"
)
#> # A tibble: 105 × 5
#> cd45 cd38 cd34 cd19 cluster_id
#> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 -0.974 -0.219 1.31 -0.466 d
#> 2 0.664 1.39 0.898 -0.675 z
#> 3 -1.59 -0.337 0.0350 -1.61 a
#> 4 -0.783 0.203 0.554 1.60 j
#> 5 -1.33 0.434 -0.0858 0.286 e
#> 6 0.717 -0.0751 -0.544 0.0304 p
#> 7 1.00 0.239 1.61 1.72 j
#> 8 -0.316 0.834 -0.108 1.97 w
#> 9 0.177 -1.45 0.202 0.498 o
#> 10 -0.720 -1.03 -0.962 1.49 a
#> # ℹ 95 more rows
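In the density-dependent scheme described by Qiu et al. (2011), each cell's local density is estimated, cells in sparse outlier regions are discarded, and each remaining cell is kept with probability min(1, target density / local density), so dense regions are thinned more than sparse ones. The sketch below illustrates that idea in base R with a hypothetical helper, `downsample_density_sketch()`, that is not tidytof's implementation. The neighborhood radius (5 × the median nearest-neighbor distance), the 1% outlier cutoff, and solving for the target density from a target proportion are all assumptions of this sketch.

```r
# Hypothetical sketch (not tidytof's implementation) of density-dependent
# downsampling in the spirit of Qiu et al. (2011). Returns kept row indices.
downsample_density_sketch <- function(x, target_prop_cells = 0.1,
                                      outlier_percentile = 0.01) {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- as.matrix(dist(x))  # pairwise Euclidean distances
  diag(d) <- Inf           # exclude self when finding nearest neighbors

  # Neighborhood radius: 5 x median nearest-neighbor distance (assumption)
  radius <- 5 * median(apply(d, 1, min))

  # Local density: cells within the radius, counting the cell itself
  local_density <- rowSums(d <= radius) + 1

  # Discard cells whose density falls below the outlier percentile
  outlier_density <- quantile(local_density, outlier_percentile, names = FALSE)
  candidate <- local_density >= outlier_density
  ld <- local_density[candidate]

  # Choose a target density so the expected number of kept cells equals
  # target_prop_cells * n (assumption about how a target proportion is met)
  target_n <- target_prop_cells * n
  stopifnot(target_n <= length(ld))
  excess <- function(td) sum(pmin(1, td / ld)) - target_n
  target_density <- uniroot(excess, c(0, max(ld)))$root

  # Keep probability: 1 at or below the target density, else target / local
  p_keep <- numeric(n)
  p_keep[candidate] <- pmin(1, target_density / ld)
  which(runif(n) < p_keep)
}

set.seed(2011)
sim <- matrix(rnorm(4000), ncol = 4)
kept <- downsample_density_sketch(sim, target_prop_cells = 0.1)
length(kept)  # close to 100, but varies with the random draw
```

Because each cell is kept independently, the number of retained cells fluctuates around the target, which may explain why the density example above returns 105 rather than exactly 100 rows.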