Metacluster clustered CyTOF data using consensus clustering
Source: R/metaclustering.R
This function performs consensus metaclustering on a `tof_tbl` containing CyTOF data
using a user-specified selection of input variables/CyTOF measurements and
the number of desired metaclusters.
See `ConsensusClusterPlus` for additional details.
Usage
tof_metacluster_consensus(
  tof_tibble,
  cluster_col,
  metacluster_cols = where(tof_is_numeric),
  central_tendency_function = stats::median,
  num_metaclusters = 10L,
  proportion_clusters = 0.9,
  proportion_features = 1,
  num_reps = 20L,
  clustering_algorithm = c("hierarchical", "pam", "kmeans"),
  distance_function = c("euclidean", "minkowski", "pearson", "spearman", "maximum",
    "binary", "canberra"),
  ...
)
Arguments
- tof_tibble
A `tof_tbl` or `tibble`.
- cluster_col
An unquoted column name indicating which column in `tof_tibble` stores the cluster ids for the cluster to which each cell belongs. Cluster labels can be produced via any method the user chooses - including manual gating, any of the functions in the `tof_cluster_*` function family, or any other method.
- metacluster_cols
Unquoted column names indicating which columns in `tof_tibble` to use in computing the metaclusters. Defaults to all numeric columns in `tof_tibble`. Supports tidyselect helpers.
- central_tendency_function
The function that should be used to calculate the measurement of central tendency for each cluster before metaclustering. This function will be used to compute a summary statistic for each input cluster in `cluster_col` across all columns specified by `metacluster_cols`, and the resulting vectors (one for each cluster) will be used as the input for metaclustering. Defaults to `median`.
- num_metaclusters
An integer indicating the number of metaclusters that should be returned. Defaults to 10.
- proportion_clusters
A numeric value between 0 and 1 indicating the proportion of clusters to subsample (from the total number of clusters in `cluster_col`) during each iteration of the consensus clustering. Defaults to 0.9.
- proportion_features
A numeric value between 0 and 1 indicating the proportion of features (i.e. the proportion of columns specified by `metacluster_cols`) to subsample during each iteration of the consensus clustering. Defaults to 1 (all features are included).
- num_reps
An integer indicating how many subsampled replicates to run during consensus clustering. Defaults to 20.
- clustering_algorithm
A string indicating which clustering algorithm `ConsensusClusterPlus` should use to metacluster the subsampled clusters during each resampling. Options are "hierarchical" (the default), "pam" (partitioning around medoids), and "kmeans".
- distance_function
A string indicating which distance function should be used to compute the distances between clusters during consensus clustering. Options are "euclidean" (the default), "manhattan", "minkowski", "pearson", "spearman", "maximum", "binary", and "canberra". See `ConsensusClusterPlus`.
- ...
Optional additional arguments to pass to `ConsensusClusterPlus`.
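To make the roles of `central_tendency_function` and `proportion_clusters` more concrete, the following base R sketch computes one median vector per input cluster and then draws a 90% subsample of those clusters. It only illustrates the idea described in the argument list; it is not the package's internal code, the variable names are hypothetical, and the exact rounding used when subsampling is an assumption.

```r
set.seed(1)

# Simulated single-cell data with 26 input clusters (letters a-z)
sim_data <- data.frame(
  cd45 = rnorm(1000),
  cd38 = rnorm(1000),
  cluster_id = sample(letters, size = 1000, replace = TRUE)
)

# One row per input cluster, one median per marker:
# this per-cluster summary is what would be metaclustered
cluster_medians <- stats::aggregate(
  cbind(cd45, cd38) ~ cluster_id,
  data = sim_data,
  FUN = stats::median
)

# With proportion_clusters = 0.9, each replicate would use
# roughly 90% of the input clusters (rounding here is an assumption)
num_clusters <- nrow(cluster_medians)
subsample_size <- floor(0.9 * num_clusters)
subsample <- cluster_medians[sample(num_clusters, subsample_size), ]
```

Consensus clustering repeats a draw like this `num_reps` times and records how often each pair of clusters ends up in the same metacluster.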
Value
A tibble with a single column (`.consensus_metacluster`) and the same number of rows as the input `tof_tibble`. Each entry in the column indicates the metacluster label assigned to the same row in `tof_tibble`.
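Because the result has one row per row of `tof_tibble`, in the same order, it can be column-bound back onto the input. The sketch below assumes that `tidytof` and `dplyr` are installed; `annotated` is a hypothetical variable name.

```r
library(dplyr)

sim_data <-
  tibble(
    cd45 = rnorm(n = 1000),
    cd38 = rnorm(n = 1000),
    cluster_id = sample(letters, size = 1000, replace = TRUE)
  )

# One-column tibble of metacluster labels, one row per cell
metaclusters <-
  tidytof::tof_metacluster_consensus(
    tof_tibble = sim_data,
    cluster_col = cluster_id
  )

# Attach the labels to the original measurements
annotated <- bind_cols(sim_data, metaclusters)

# Number of cells assigned to each metacluster
count(annotated, .consensus_metacluster)
```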
See also
Other metaclustering functions: `tof_metacluster()`, `tof_metacluster_flowsom()`, `tof_metacluster_hierarchical()`, `tof_metacluster_kmeans()`, `tof_metacluster_phenograph()`
Examples
# simulate 1,000 cells with four markers and a cluster label (a-z) per cell
sim_data <-
  dplyr::tibble(
    cd45 = rnorm(n = 1000),
    cd38 = rnorm(n = 1000),
    cd34 = rnorm(n = 1000),
    cd19 = rnorm(n = 1000),
    cluster_id = sample(letters, size = 1000, replace = TRUE)
  )

# metacluster the input clusters using all defaults
tof_metacluster_consensus(tof_tibble = sim_data, cluster_col = cluster_id)
#> # A tibble: 1,000 × 1
#> .consensus_metacluster
#> <chr>
#> 1 5
#> 2 5
#> 3 1
#> 4 7
#> 5 4
#> 6 7
#> 7 5
#> 8 4
#> 9 2
#> 10 5
#> # ℹ 990 more rows
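The call above relies entirely on defaults. As a further sketch, the arguments documented on this page can be set explicitly; the values below are arbitrary illustrations rather than recommendations, and the example assumes `tidytof` and `dplyr` are installed. No output is shown because the labels depend on the simulated data.

```r
sim_data <-
  dplyr::tibble(
    cd45 = rnorm(n = 1000),
    cd38 = rnorm(n = 1000),
    cd34 = rnorm(n = 1000),
    cd19 = rnorm(n = 1000),
    cluster_id = sample(letters, size = 1000, replace = TRUE)
  )

custom_result <-
  tidytof::tof_metacluster_consensus(
    tof_tibble = sim_data,
    cluster_col = cluster_id,
    # tidyselect helper: use only the columns whose names start with "cd"
    metacluster_cols = dplyr::starts_with("cd"),
    # summarize each input cluster by its mean instead of its median
    central_tendency_function = mean,
    num_metaclusters = 5L,
    num_reps = 10L,
    clustering_algorithm = "pam"
  )
```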