This feature extraction function calculates the proportion of cells in each cluster of a `tof_tibble`, either overall or within subgroups defined by `group_cols`.

Usage

tof_extract_proportion(
  tof_tibble,
  cluster_col,
  group_cols = NULL,
  format = c("wide", "long")
)

Arguments

tof_tibble

A `tof_tbl` or a `tibble`.

cluster_col

An unquoted column name indicating which column in `tof_tibble` stores the id of the cluster to which each cell belongs. Cluster labels can be produced by any method the user chooses, including manual gating or any of the functions in the `tof_cluster_*` function family.

group_cols

Unquoted column names representing which columns in `tof_tibble` should be used to break the rows of `tof_tibble` into subgroups for the feature extraction calculation. Defaults to NULL (i.e. performing the extraction without subgroups).

format

A string indicating if the data should be returned in "wide" format (the default; each cluster proportion is given its own column) or in "long" format (each cluster proportion is provided as its own row).

Value

A tibble.

If format == "wide", the tibble will have one row for each combination of the grouping variables provided in `group_cols`. It will have one column for each grouping variable and one column for the proportion of cells in each cluster. The name of each column containing cluster proportions follows the pattern "prop@{cluster_id}".

If format == "long", the tibble will have 1 row for each combination of the grouping variables in `group_cols` and each cluster id (i.e. level) in `cluster_col`. It will have one column for each grouping variable, one column for the cluster ids, and one column (`prop`) containing the cluster proportions.
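
To make the two output shapes concrete, the following is a minimal sketch of an equivalent calculation written directly with `dplyr` and `tidyr`. It is not the package's implementation. The toy data, the intermediate column `n_cells`, and the choice to fill missing group/cluster combinations with 0 are assumptions made for illustration only.

```r
library(dplyr)
library(tidyr)

set.seed(2020)

# Hypothetical toy data: one row per cell
toy <- tibble(
    cluster_id = sample(c("a", "b", "c"), size = 200, replace = TRUE),
    patient = sample(c("kirby", "mario"), size = 200, replace = TRUE)
)

# "long" shape: one row per patient/cluster combination
long_props <- toy %>%
    # count cells in each patient/cluster combination
    count(patient, cluster_id, name = "n_cells") %>%
    # divide by the total number of cells in each patient
    group_by(patient) %>%
    mutate(prop = n_cells / sum(n_cells)) %>%
    ungroup() %>%
    select(-n_cells)

# "wide" shape: one row per patient, one "prop@{cluster_id}" column per cluster
wide_props <- long_props %>%
    pivot_wider(
        names_from = cluster_id,
        values_from = prop,
        names_prefix = "prop@",
        values_fill = 0 # illustrative choice for absent combinations
    )

long_props
wide_props
```

Within each group, the proportions sum to 1, so the "wide" row for a patient and the "long" rows for that patient carry the same numbers in different layouts.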

Examples

sim_data <-
    dplyr::tibble(
        cd45 = rnorm(n = 1000),
        cd38 = rnorm(n = 1000),
        cd34 = rnorm(n = 1000),
        cd19 = rnorm(n = 1000),
        cluster_id = sample(letters, size = 1000, replace = TRUE),
        patient = sample(c("kirby", "mario"), size = 1000, replace = TRUE),
        stim = sample(c("basal", "stim"), size = 1000, replace = TRUE)
    )

# extract proportion of each cluster in each patient in wide format
tof_extract_proportion(
    tof_tibble = sim_data,
    cluster_col = cluster_id,
    group_cols = patient
)
#> # A tibble: 2 × 27
#>   patient `prop@a` `prop@b` `prop@c` `prop@d` `prop@e` `prop@f` `prop@g`
#>   <chr>      <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
#> 1 kirby     0.0313   0.0438   0.0459   0.0480   0.0292   0.0376   0.0292
#> 2 mario     0.0384   0.0173   0.0441   0.0384   0.0403   0.0441   0.0288
#> # ℹ 19 more variables: `prop@h` <dbl>, `prop@i` <dbl>, `prop@j` <dbl>,
#> #   `prop@k` <dbl>, `prop@l` <dbl>, `prop@m` <dbl>, `prop@n` <dbl>,
#> #   `prop@o` <dbl>, `prop@p` <dbl>, `prop@q` <dbl>, `prop@r` <dbl>,
#> #   `prop@s` <dbl>, `prop@t` <dbl>, `prop@u` <dbl>, `prop@v` <dbl>,
#> #   `prop@w` <dbl>, `prop@x` <dbl>, `prop@y` <dbl>, `prop@z` <dbl>

# extract proportion of each cluster in each patient in long format
tof_extract_proportion(
    tof_tibble = sim_data,
    cluster_col = cluster_id,
    group_cols = patient,
    format = "long"
)
#> # A tibble: 52 × 3
#>    patient cluster_id   prop
#>    <chr>   <chr>       <dbl>
#>  1 kirby   a          0.0313
#>  2 kirby   b          0.0438
#>  3 kirby   c          0.0459
#>  4 kirby   d          0.0480
#>  5 kirby   e          0.0292
#>  6 kirby   f          0.0376
#>  7 kirby   g          0.0292
#>  8 kirby   h          0.0292
#>  9 kirby   i          0.0334
#> 10 kirby   j          0.0271
#> # ℹ 42 more rows