This function performs k-means clustering on high-dimensional cytometry data using a user-specified selection of input variables (i.e. high-dimensional cytometry measurements). It is mostly a convenience wrapper around `kmeans`.
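To make the "wrapper around kmeans" description concrete, here is a minimal base-R sketch of the same idea: select some numeric columns, run `stats::kmeans()` on them, and return one cluster id per row. This is not the package's actual source; the data frame, the `cluster_cols` vector of column names, and the `iter.max` value are assumptions chosen for illustration.

```r
# Illustrative sketch only (not the package's implementation).
set.seed(2024)  # k-means starts from randomly chosen centers

# Simulated "cells": 1000 rows, 4 marker columns (names are illustrative)
sim_data <- data.frame(
  cd45 = rnorm(1000),
  cd38 = rnorm(1000),
  cd34 = rnorm(1000),
  cd19 = rnorm(1000)
)

cluster_cols <- c("cd45", "cd19")  # hypothetical column selection
num_clusters <- 20

# stats::kmeans() takes a numeric matrix and the number of centers
fit <- stats::kmeans(
  x = as.matrix(sim_data[, cluster_cols]),
  centers = num_clusters,
  iter.max = 100  # assumption: a generous iteration cap for this toy data
)

# One cluster id per input row, stored as character
result <- data.frame(
  .kmeans_cluster = as.character(fit$cluster),
  stringsAsFactors = FALSE
)
```

The wrapper saves you the manual matrix conversion and column bookkeeping, and lets you pick columns with tidyselect syntax instead of a character vector.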

Usage

tof_cluster_kmeans(
  tof_tibble,
  cluster_cols = where(tof_is_numeric),
  num_clusters = 20,
  ...
)

Arguments

tof_tibble

A `tof_tibble`.

cluster_cols

Unquoted column names indicating which columns in `tof_tibble` to use in computing the k-means clusters. Defaults to all numeric columns in `tof_tibble`. Supports tidyselect helpers.

num_clusters

An integer indicating the number of clusters that should be returned (the k in k-means). Defaults to 20.

...

Optional additional arguments to pass to `kmeans` (for example `iter.max` or `nstart`).
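As a rough illustration of what extra arguments can do, the sketch below calls `stats::kmeans()` directly with two of its documented arguments, `nstart` (number of random starts) and `iter.max` (iteration cap). The matrix `x` and the chosen values are assumptions for the example; the commented-out last line shows how the same arguments would presumably be forwarded through `...`.

```r
set.seed(1)
x <- matrix(rnorm(2000), ncol = 2)  # toy data: 1000 points, 2 dimensions

# A single random start versus the best of 25 random starts
fit_single <- stats::kmeans(x, centers = 5, nstart = 1, iter.max = 50)
fit_multi <- stats::kmeans(x, centers = 5, nstart = 25, iter.max = 50)

# Total within-cluster sum of squares (lower means tighter clusters)
within_ss <- c(single = fit_single$tot.withinss, multi = fit_multi$tot.withinss)
print(within_ss)

# Assumed equivalent call through the wrapper, given a data frame `sim_data`:
# tof_cluster_kmeans(sim_data, num_clusters = 5, nstart = 25, iter.max = 50)
```

More random starts usually cost more time but make the result less sensitive to an unlucky initialization.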

Value

A tibble with one column named `.kmeans_cluster`. This column will contain a character vector of length `nrow(tof_tibble)` indicating the id of the k-means cluster to which each cell (i.e. each row) in `tof_tibble` was assigned.
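Because the result has one row per input row, it can be column-bound back onto the input to annotate each cell. The sketch below uses base R and a stand-in `clusters` data frame built with `stats::kmeans()`; both the stand-in and the variable names are assumptions for illustration, not output from the function itself.

```r
set.seed(42)
sim_data <- data.frame(cd45 = rnorm(100), cd19 = rnorm(100))

# Stand-in for the returned table: one character cluster id per row,
# mirroring the `<chr>` column type shown in the example output
clusters <- data.frame(
  .kmeans_cluster = as.character(
    stats::kmeans(as.matrix(sim_data), centers = 3)$cluster
  ),
  stringsAsFactors = FALSE
)

# Row i of `clusters` describes row i of `sim_data`, so binding columns
# attaches each label to its cell
annotated <- cbind(sim_data, clusters)

# Number of cells assigned to each cluster
cluster_sizes <- table(annotated$.kmeans_cluster)
print(cluster_sizes)
```

With dplyr loaded, `dplyr::bind_cols()` does the same job while keeping the result a tibble.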

Examples

sim_data <-
    dplyr::tibble(
        cd45 = rnorm(n = 1000),
        cd38 = rnorm(n = 1000),
        cd34 = rnorm(n = 1000),
        cd19 = rnorm(n = 1000)
    )
tof_cluster_kmeans(tof_tibble = sim_data)
#> # A tibble: 1,000 × 1
#>    .kmeans_cluster
#>    <chr>          
#>  1 8              
#>  2 14             
#>  3 9              
#>  4 5              
#>  5 4              
#>  6 11             
#>  7 5              
#>  8 15             
#>  9 7              
#> 10 5              
#> # ℹ 990 more rows
tof_cluster_kmeans(tof_tibble = sim_data, cluster_cols = c(cd45, cd19))
#> # A tibble: 1,000 × 1
#>    .kmeans_cluster
#>    <chr>          
#>  1 18             
#>  2 5              
#>  3 4              
#>  4 5              
#>  5 2              
#>  6 12             
#>  7 11             
#>  8 6              
#>  9 16             
#> 10 13             
#> # ℹ 990 more rows
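The cluster ids printed above will differ from run to run, because both `rnorm()` and the random initial centers used by `kmeans` depend on the state of R's random number generator. A small base-R sketch of making the assignments repeatable with `set.seed()` (the seed value and the toy matrix are arbitrary assumptions):

```r
x <- matrix(rnorm(400), ncol = 4)  # toy data: 100 points, 4 dimensions

# Resetting the seed before each call gives both runs the same initial centers
set.seed(123)
first_run <- stats::kmeans(x, centers = 5)$cluster

set.seed(123)
second_run <- stats::kmeans(x, centers = 5)$cluster

# Same data and same RNG state, so the assignments match exactly
same_assignments <- identical(first_run, second_run)
print(same_assignments)
```

Even with a fixed seed, the numeric labels themselves are arbitrary: cluster "3" in one analysis has no inherent relationship to cluster "3" in another.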