This function performs differential expression analysis on the cell clusters contained within a `tof_tbl` using simple t-tests. Specifically, marker expression in each cluster is first summarized within each sample using a user-specified summary function (e.g. mean or median), and either an unpaired or a paired t-test then compares these per-sample summaries between two conditions. One t-test is conducted per cluster/marker pair, and significant differences between sample types are detected after multiple-hypothesis correction.

Usage

tof_analyze_expression_ttest(
  tof_tibble,
  cluster_col,
  marker_cols = where(tof_is_numeric),
  effect_col,
  group_cols,
  test_type = c("unpaired", "paired"),
  summary_function = mean,
  min_cells = 3,
  min_samples = 5,
  alpha = 0.05,
  quiet = FALSE
)

Arguments

tof_tibble

A `tof_tbl` or a `tibble`.

cluster_col

An unquoted column name indicating which column in `tof_tibble` stores the cluster ids of the cluster to which each cell belongs. Cluster labels can be produced via any method the user chooses - including manual gating, any of the functions in the `tof_cluster_*` function family, or any other method.

marker_cols

Unquoted column names representing which columns in `tof_tibble` (i.e. which high-dimensional cytometry protein measurements) should be tested for differential expression between levels of the `effect_col`. Defaults to all numeric (integer or double) columns. Supports tidyselect helpers.

effect_col

Unquoted column name representing which column in `tof_tibble` should be used to break samples into groups for the t-test. Should only have 2 unique values.

group_cols

Unquoted names of the columns other than `effect_col` that should be used to group cells into independent observations. Fills a similar role to `sample_col` in the `tof_analyze_abundance_*` functions. For example, if an experiment involves analyzing samples taken from multiple patients at two timepoints (with `effect_col = timepoint`), then `group_cols` should be the name of the column representing patient IDs.

test_type

A string indicating whether the t-test should be "unpaired" (the default) or "paired".

summary_function

The function that should be used to summarize the distribution of each marker in each cluster (within each sample, as grouped by `group_cols`). It should accept a numeric vector and return a single value. Defaults to `mean`.

min_cells

An integer value used to filter clusters out of the differential expression analysis. Clusters are not included in the differential expression testing if they do not have at least `min_cells` cells in at least `min_samples` samples. Defaults to 3.

min_samples

An integer value used to filter clusters out of the differential expression analysis. Clusters are not included in the differential expression testing if they do not have at least `min_cells` cells in at least `min_samples` samples. Defaults to 5.

alpha

A numeric value between 0 and 1 indicating which significance level should be applied to multiple-comparison adjusted p-values during the differential expression analysis. Defaults to 0.05.

quiet

A boolean value indicating whether warnings should be suppressed (`TRUE`) or printed (`FALSE`). Defaults to `FALSE`.
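The interplay between `group_cols`, `summary_function`, `min_cells` and `min_samples` can be sketched in base R. This is only an illustration of the logic described above, not the package's implementation; the toy data and every name in it (`cells`, `cd45`, `patient`, `timepoint`, `kept_clusters`, `sample_summaries`) are hypothetical.

```r
# Toy single-cell data: 6 patients x 2 timepoints, two clusters, one marker.
set.seed(1)
cells <- expand.grid(
  cell = 1:10,
  patient = paste0("p", 1:6),
  timepoint = c("pre", "post"),
  cluster = c("a", "b"),
  stringsAsFactors = FALSE
)
cells$cd45 <- rnorm(nrow(cells), mean = 5)

# Make cluster "b" rare: only 2 cells per sample, which is below min_cells = 3.
cells <- cells[!(cells$cluster == "b" & cells$cell > 2), ]

min_cells <- 3
min_samples <- 5

# Count cells per cluster in each sample (here, sample = patient x timepoint).
counts <- aggregate(
  cell ~ cluster + patient + timepoint,
  data = cells,
  FUN = length
)
names(counts)[names(counts) == "cell"] <- "n_cells"

# Keep a cluster only if at least min_samples samples hold >= min_cells cells.
samples_ok <- tapply(counts$n_cells >= min_cells, counts$cluster, sum)
kept_clusters <- names(samples_ok)[samples_ok >= min_samples]

# Summarize the marker per cluster and sample with the chosen summary function.
summary_function <- median
kept <- cells[cells$cluster %in% kept_clusters, ]
sample_summaries <- aggregate(
  cd45 ~ cluster + patient + timepoint,
  data = kept,
  FUN = summary_function
)
```

In this sketch only cluster "a" survives the filter, and `sample_summaries` holds one summarized value per sample, which is the unit of observation that the t-test operates on.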

Value

A tibble with 9 columns:

{cluster_col}

The name/ID of the cluster in the cluster/marker pair being tested. Each entry in this column will match a unique value in the input {cluster_col}.

marker

The name of the marker in the cluster/marker pair being tested.

t

The t-statistic computed for each cluster/marker pair.

df

The degrees of freedom used for the t-test for each cluster/marker pair.

p_val

The (unadjusted) p-value for the t-test for each cluster/marker pair.

p_adj

The p-value for the t-test for each cluster/marker pair after multiple-comparison adjustment with `p.adjust`.

significant

A character vector that will be "*" for cluster/marker pairs for which `p_adj` < `alpha` and "" otherwise.

mean_diff

For an unpaired t-test, the difference between the average summarized marker expression values of each cluster/marker pair in the two levels of `effect_col`. For a paired t-test, the average difference between the summarized marker expression values of each cluster/marker pair in the two levels of `effect_col` within a given pairing unit (e.g. a patient).

mean_fc

For an unpaired t-test, the ratio between the average summarized marker expression values of each cluster/marker pair in the two levels of `effect_col`. For a paired t-test, the average ratio between the summarized marker expression values of each cluster/marker pair in the two levels of `effect_col` within a given pairing unit (e.g. a patient). 0.001 is added to the denominator of the ratio to avoid divide-by-zero errors.

The "levels" attribute of the result indicates the order in which the different levels of the `effect_col` were considered. The `mean_diff` value for each row of the output is computed by subtracting the second level from the first level, and the `mean_fc` value for each row is computed by dividing the first level by the second level.
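As a rough illustration of how the output columns relate to one another, the following base R sketch runs one paired t-test per cluster/marker pair on hypothetical per-sample summaries and then derives `p_adj`, `significant`, `mean_diff` and `mean_fc`. It is not the package's code: the data are simulated, the `"fdr"` adjustment method is an assumption (the documentation above only states that `p.adjust` is used), and the order of `effect_levels` is chosen arbitrarily.

```r
set.seed(2)
# Hypothetical per-sample summaries: one row per cluster/marker/patient/timepoint.
summaries <- expand.grid(
  patient = paste0("p", 1:6),
  timepoint = c("pre", "post"),
  cluster = c("a", "b"),
  marker = c("cd45", "cd34"),
  stringsAsFactors = FALSE
)
summaries$value <- rnorm(nrow(summaries), mean = 5)

# Add a large shift to one cluster/marker pair so there is something to detect.
shifted <- summaries$cluster == "a" & summaries$marker == "cd45" &
  summaries$timepoint == "post"
summaries$value[shifted] <- summaries$value[shifted] + 10

effect_levels <- c("post", "pre")  # first level minus (or divided by) second
alpha <- 0.05

# One paired t-test per cluster/marker pair.
pairs <- split(summaries, list(summaries$cluster, summaries$marker), drop = TRUE)
results <- do.call(rbind, lapply(pairs, function(d) {
  d <- d[order(d$patient), ]  # align the two timepoints by patient
  x <- d$value[d$timepoint == effect_levels[1]]
  y <- d$value[d$timepoint == effect_levels[2]]
  tt <- t.test(x, y, paired = TRUE)
  data.frame(
    cluster = d$cluster[1],
    marker = d$marker[1],
    t = unname(tt$statistic),
    df = unname(tt$parameter),
    p_val = tt$p.value,
    mean_diff = mean(x - y),          # average within-patient difference
    mean_fc = mean(x / (y + 0.001))   # average within-patient ratio, offset denominator
  )
}))

# Multiple-comparison adjustment across all pairs (method is an assumption).
results$p_adj <- p.adjust(results$p_val, method = "fdr")
results$significant <- ifelse(results$p_adj < alpha, "*", "")
attr(results, "levels") <- effect_levels
```

Reading `attr(results, "levels")` back tells you that a positive `mean_diff` here means higher expression at "post" than at "pre".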

See also

Other differential expression analysis functions: tof_analyze_expression(), tof_analyze_expression_diffcyt(), tof_analyze_expression_lmm()

Examples

# For differential discovery examples, please see the package vignettes
NULL
#> NULL
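A minimal call on simulated data might look like the sketch below. The data frame and its column names (`patient`, `timepoint`, `cluster`, `cd45`, `cd34`) are hypothetical, the call is guarded so that it only runs when `tidytof` and `tibble` are installed, and the argument names follow the Usage section above. Treat it as a starting point rather than a verified example; the package vignettes remain the authoritative reference.

```r
set.seed(3)
# Simulated cells: 6 patients x 2 timepoints x 2 clusters x 20 cells each.
sim_cells <- expand.grid(
  cell = 1:20,
  patient = paste0("p", 1:6),
  timepoint = c("pre", "post"),
  cluster = c("c1", "c2"),
  stringsAsFactors = FALSE
)
sim_cells$cd45 <- rnorm(nrow(sim_cells), mean = 5)
sim_cells$cd34 <- rnorm(nrow(sim_cells), mean = 2)
sim_cells$cell <- NULL  # drop the helper index so only markers are numeric

# Raise cd45 in cluster c1 at the "post" timepoint.
up <- sim_cells$cluster == "c1" & sim_cells$timepoint == "post"
sim_cells$cd45[up] <- sim_cells$cd45[up] + 3

ttest_result <- NULL
if (requireNamespace("tidytof", quietly = TRUE) &&
    requireNamespace("tibble", quietly = TRUE)) {
  ttest_result <- tidytof::tof_analyze_expression_ttest(
    tof_tibble = tibble::as_tibble(sim_cells),
    cluster_col = cluster,
    marker_cols = c(cd45, cd34),
    effect_col = timepoint,
    group_cols = patient,
    test_type = "paired",
    summary_function = mean
  )
  # Which level was treated as "first" for mean_diff and mean_fc.
  print(attr(ttest_result, "levels"))
}
```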