Manually annotate tidytof-computed clusters using user-specified labels
Source: R/clustering.R
tof_annotate_clusters.Rd
This function adds a column to a `tibble` or `tof_tbl` so that users can attach manual cell type labels to clusters identified using unsupervised algorithms.
Arguments
- tof_tibble
`tof_tbl` or `tibble`.
- cluster_col
An unquoted column name indicating which column in `tof_tibble` contains the id of the unsupervised cluster to which each cell belongs. Cluster ids can be produced by any method the user chooses, including manual gating, any of the functions in the `tof_cluster_*` function family, or any other method.
- annotations
A data structure indicating how to annotate each cluster id in `cluster_col`. `annotations` can be provided as a data.frame with two columns: the first should have the same name as `cluster_col` and contain each unique cluster id; the second can have any name and should contain a character vector giving the manual annotation to match with each cluster id in the first column. `annotations` can also be provided as a named character vector; in this case, each value in `annotations` should be a unique cluster id, and the name of each value should be the corresponding manual cluster annotation. See below for examples.
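The two accepted forms of `annotations` carry the same information, so one can be derived from the other. The sketch below uses only base R to illustrate the correspondence; the object names `annotations_vec` and `annotations_df` are hypothetical, and the second column name is chosen here only to mirror the `{cluster_col}_annotation` naming described under Value.

```r
# Named character vector form:
# the names are the manual annotations, the values are the cluster ids.
annotations_vec <- c("macrophage" = "a", "dendritic cell" = "b")

# Equivalent two-column data.frame form:
# first column named like cluster_col, second column holds the annotations.
annotations_df <- data.frame(
  cluster_id = unname(annotations_vec),
  cluster_id_annotation = names(annotations_vec),
  stringsAsFactors = FALSE
)

print(annotations_df)
```

Note the direction of the named vector: the annotation is the name and the cluster id is the value, which is the reverse of how lookup vectors are often written in R.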
Value
A `tof_tbl` with the same number of rows as `tof_tibble` and one additional column containing the manual cluster annotations for each cell (as a character vector). If `annotations` was provided as a data.frame, the new column will have the same name as the column containing the cluster annotations in `annotations`. If `annotations` was provided as a named character vector, the new column will be named `{cluster_col}_annotation`.
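Conceptually, the returned column is a per-cell lookup of each cluster id against `annotations`. The following base R sketch illustrates that idea for the named-character-vector case; it is an illustration under that assumption, not the tidytof implementation, and the toy `cells` data frame is hypothetical.

```r
# Toy data: one row per cell, with a cluster id for each cell.
cells <- data.frame(
  cluster_id = c("a", "a", "b", "a", "b"),
  stringsAsFactors = FALSE
)

# Names are annotations, values are cluster ids.
annotations <- c("macrophage" = "a", "dendritic cell" = "b")

# Find the position of each cell's cluster id among the annotation values,
# then take the name at that position as the cell's annotation.
idx <- match(cells$cluster_id, annotations)
cells$cluster_id_annotation <- names(annotations)[idx]

print(cells)
```

The number of rows is unchanged and exactly one character column is added, matching the shape of the return value described above.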
Examples
sim_data <-
    dplyr::tibble(
        cd45 = rnorm(n = 1000),
        cd38 = c(rnorm(n = 500), rnorm(n = 500, mean = 2)),
        cd34 = c(rnorm(n = 500), rnorm(n = 500, mean = 4)),
        cd19 = rnorm(n = 1000),
        cluster_id = c(rep("a", 500), rep("b", 500))
    )
# using named character vector
sim_data |>
    tof_annotate_clusters(
        cluster_col = cluster_id,
        annotations = c("macrophage" = "a", "dendritic cell" = "b")
    )
#> # A tibble: 1,000 × 6
#>        cd45   cd38   cd34   cd19 cluster_id cluster_id_annotation
#>       <dbl>  <dbl>  <dbl>  <dbl> <chr>      <chr>
#>  1 -1.40    -0.337 -0.166  1.12  a          macrophage
#>  2  0.255   -0.216  0.120  0.400 a          macrophage
#>  3 -2.44     0.621 -0.662 -0.985 a          macrophage
#>  4 -0.00557 -1.28  -0.531 -0.503 a          macrophage
#>  5  0.622   -1.30  -0.301  0.987 a          macrophage
#>  6  1.15    -0.377 -0.602  2.19  a          macrophage
#>  7 -1.82     0.104 -0.318 -0.165 a          macrophage
#>  8 -0.247   -0.704  0.308 -0.686 a          macrophage
#>  9 -0.244    1.50   0.799  0.941 a          macrophage
#> 10 -0.283   -0.303  1.75  -0.164 a          macrophage
#> # ℹ 990 more rows
# using two-column data.frame
annotation_data_frame <-
    data.frame(
        cluster_id = c("a", "b"),
        cluster_annotation = c("macrophage", "dendritic cell")
    )
sim_data |>
    tof_annotate_clusters(
        cluster_col = cluster_id,
        annotations = annotation_data_frame
    )
#> # A tibble: 1,000 × 6
#>        cd45   cd38   cd34   cd19 cluster_id cluster_annotation
#>       <dbl>  <dbl>  <dbl>  <dbl> <chr>      <chr>
#>  1 -1.40    -0.337 -0.166  1.12  a          macrophage
#>  2  0.255   -0.216  0.120  0.400 a          macrophage
#>  3 -2.44     0.621 -0.662 -0.985 a          macrophage
#>  4 -0.00557 -1.28  -0.531 -0.503 a          macrophage
#>  5  0.622   -1.30  -0.301  0.987 a          macrophage
#>  6  1.15    -0.377 -0.602  2.19  a          macrophage
#>  7 -1.82     0.104 -0.318 -0.165 a          macrophage
#>  8 -0.247   -0.704  0.308 -0.686 a          macrophage
#>  9 -0.244    1.50   0.799  0.941 a          macrophage
#> 10 -0.283   -0.303  1.75  -0.164 a          macrophage
#> # ℹ 990 more rows