Differential Expression Analysis (DEA) with linear mixed-models (LMMs)
Source: R/differential_discovery.R
tof_analyze_expression_lmm.Rd
This function performs differential expression analysis on the cell clusters
contained within a `tof_tbl` using linear mixed-models. Users
specify which columns represent sample, cluster, marker, fixed effect, and random effect
information, and a (mixed) linear regression model is fit using either
`lmer` or `glm`.
Usage
tof_analyze_expression_lmm(
  tof_tibble,
  sample_col,
  cluster_col,
  marker_cols = where(tof_is_numeric),
  fixed_effect_cols,
  random_effect_cols,
  central_tendency_function = median,
  min_cells = 3,
  min_samples = 5,
  alpha = 0.05
)
Arguments
- tof_tibble
A `tof_tbl` or a `tibble`.
- sample_col
An unquoted column name indicating which column in `tof_tibble` represents the id of the sample from which each cell was collected. `sample_col` should serve as a unique identifier for each sample collected during data acquisition - all cells with the same value for `sample_col` will be treated as a part of the same observational unit.
- cluster_col
An unquoted column name indicating which column in `tof_tibble` stores the cluster ids of the cluster to which each cell belongs. Cluster labels can be produced via any method the user chooses - including manual gating, any of the functions in the `tof_cluster_*` function family, or any other method.
- marker_cols
Unquoted column names representing which columns in `tof_tibble` (i.e. which high-dimensional cytometry protein measurements) should be included in the differential discovery analysis. Defaults to all numeric (integer or double) columns. Supports tidyselection.
- fixed_effect_cols
Unquoted column names representing which columns in `tof_tibble` should be used to model fixed effects during the differential expression analysis. Supports tidyselection.
Generally speaking, fixed effects should represent the comparisons of biological interest (often the variables manipulated during experiments), such as treated vs. non-treated, before-treatment vs. after-treatment, or healthy vs. non-healthy.
- random_effect_cols
Optional. Unquoted column names representing which columns in `tof_tibble` should be used to model random effects during the differential expression analysis. Supports tidyselection.
Generally speaking, random effects should represent variables that a researcher wants to control/account for, but that are not necessarily of biological interest. Example random effect variables might include batch id, patient id (in a paired design), or patient age. Most analyses will not include random effects.
- central_tendency_function
The function that will be used to calculate the measurement of central tendency for each cluster/marker pair (to be used as the dependent variable in the linear model). Defaults to `median`.
- min_cells
An integer value used to filter clusters out of the differential expression analysis. Clusters are not included in the differential expression testing if they do not have at least `min_cells` in at least `min_samples` samples. Defaults to 3.
- min_samples
An integer value used to filter clusters out of the differential expression analysis. Clusters are not included in the differential expression testing if they do not have at least `min_cells` in at least `min_samples` samples. Defaults to 5.
- alpha
A numeric value between 0 and 1 indicating which significance level should be applied to multiple-comparison adjusted p-values during the differential expression analysis. Defaults to 0.05.
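The filtering rule shared by `min_cells` and `min_samples` can be illustrated with a short base R sketch. This is not the package's internal code; the `cells` data frame, its column names, and the threshold values are hypothetical and chosen only to make the rule easy to follow.

```r
# Hypothetical toy data: one row per cell, with sample and cluster labels.
cells <- data.frame(
  sample  = rep(c("s1", "s2", "s3"), times = c(6, 6, 6)),
  cluster = c(
    "A", "A", "A", "B", "B", "B",   # s1: 3 cells in A, 3 in B
    "A", "A", "A", "A", "B", "B",   # s2: 4 cells in A, 2 in B
    "A", "A", "A", "B", "B", "B"    # s3: 3 cells in A, 3 in B
  )
)

# Illustrative thresholds (the function's defaults are 3 and 5).
min_cells <- 3
min_samples <- 3

# Count the cells in each cluster/sample combination.
counts <- table(cells$cluster, cells$sample)

# For each cluster, count the samples that contribute at least min_cells cells.
samples_passing <- rowSums(counts >= min_cells)

# Keep only clusters that reach min_cells in at least min_samples samples.
kept_clusters <- names(samples_passing)[samples_passing >= min_samples]
kept_clusters
```

In this toy data, cluster "B" has only two cells in sample s2, so it reaches `min_cells` in just two samples and is dropped, while cluster "A" is kept.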
Value
A nested tibble with two columns: `tested_effect` and `dea_results`.
The first column, `tested_effect`, is a character vector indicating which term in the differential expression model was used for significance testing. The values in this column are obtained by pasting together the column name of each fixed effect variable and each of its values. For example, a fixed effect column named fixed_effect with levels "a", "b", and "c" will have two terms in `tested_effect`: "fixed_effectb" and "fixed_effectc" (note that level "a" of fixed_effect is set as the reference level during dummy coding). These values correspond to the terms in the differential expression model that represent the difference in cluster median expression values of each marker between samples with fixed_effect = "b" and fixed_effect = "a" and between samples with fixed_effect = "c" and fixed_effect = "a", respectively.
In addition, note that the first row of the output will always represent the "omnibus" test, i.e. the test of whether there are significant differences between any levels of any fixed effect variable in the model.
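The term names described for `tested_effect` follow R's standard treatment (dummy) coding, which can be reproduced with base R's `model.matrix()`. The sketch below uses a hypothetical `samples` data frame and assumes R's default contrast settings; it shows only how the names arise, not how the function builds its output.

```r
# Hypothetical sample-level data with a single fixed effect column.
samples <- data.frame(
  fixed_effect = factor(c("a", "b", "c", "a", "b", "c"))
)

# Treatment (dummy) coding: the first factor level, "a", is the reference,
# so it gets no column of its own in the design matrix.
design <- model.matrix(~ fixed_effect, data = samples)

# Drop the intercept to leave one term per non-reference level.
term_names <- setdiff(colnames(design), "(Intercept)")
term_names
# "fixed_effectb" "fixed_effectc"
```

To compare against a different baseline, the reference level can be changed before modeling, for example with `relevel(samples$fixed_effect, ref = "b")`.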
The second column, `dea_results`, is a list of tibbles in which each entry gives
the differential expression results for each tested_effect. Within each entry
of `dea_results`, you will find `p_val`, the p-value associated with each
tested effect in each input cluster/marker pair; `p_adj`, the multiple-comparison
adjusted p-value (using the `p.adjust` function); and
other values associated with the underlying method used to perform the
differential expression analysis (such as the log-fold change of clusters' median
marker expression values between the levels being compared).
Details
Specifically, one linear model is fit for each cluster/marker pair. For each cluster/marker
pair, a user-supplied measurement of central tendency (`central_tendency_function`), such
as mean or median, is calculated across all cells in the cluster on a sample-by-sample
basis. Then, this central tendency value is used as the dependent variable in a
linear model with `fixed_effect_cols` as fixed effects predictors and `random_effect_cols`
as random effects predictors. Once all models (one per cluster/marker pair) are fit,
the p-values for each coefficient in each model are adjusted for multiple comparisons using the
`p.adjust` function.
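The steps above can be sketched in base R for the fixed-effects-only case. This is a simplified illustration, not the package's implementation: the `cells` data frame, the marker names `cd45` and `cd3`, and the `condition` column are hypothetical, `lm()` stands in for the model-fitting step, and the `"fdr"` adjustment method is an assumption, since this page does not state which `p.adjust` method is used.

```r
set.seed(1)

# Hypothetical toy data: 6 samples (3 per condition), 20 cells each,
# a single cluster, and two markers.
cells <- data.frame(
  sample    = rep(paste0("s", 1:6), each = 20),
  condition = rep(c("control", "treated"), each = 60),
  cluster   = "cluster_1",
  cd45      = rnorm(120, mean = 5),
  cd3       = rnorm(120, mean = 2)
)
markers <- c("cd45", "cd3")

# Step 1: compute the central tendency (here, the median) of each marker
# within each cluster, separately for each sample.
per_sample <- aggregate(
  cells[markers],
  by = cells[c("sample", "condition", "cluster")],
  FUN = median
)

# Step 2: fit one linear model per cluster/marker pair, using the
# per-sample medians as the dependent variable.
fits <- expand.grid(
  cluster = unique(per_sample$cluster),
  marker = markers,
  stringsAsFactors = FALSE
)
fits$p_val <- unname(mapply(
  function(cl, mk) {
    d <- per_sample[per_sample$cluster == cl, ]
    fit <- lm(reformulate("condition", response = mk), data = d)
    # p-value of the "treated vs. control" coefficient
    coef(summary(fit))["conditiontreated", "Pr(>|t|)"]
  },
  fits$cluster,
  fits$marker
))

# Step 3: adjust the p-values across all fitted models.
# The "fdr" method is an assumption made for this sketch.
fits$p_adj <- p.adjust(fits$p_val, method = "fdr")
fits
```

When random effects are supplied, the analogous step would presumably replace `lm()` with a mixed model such as `lme4::lmer()`, using a formula along the lines of `cd45 ~ condition + (1 | patient)`; the random-intercept structure and the `patient` column are assumptions of this sketch.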
See also
Other differential expression analysis functions: `tof_analyze_expression()`, `tof_analyze_expression_diffcyt()`, `tof_analyze_expression_ttest()`