This function calculates principal components using single-cell data from a `tof_tibble`.

Usage

tof_reduce_pca(
  tof_tibble,
  pca_cols = where(tof_is_numeric),
  num_comp = 5,
  threshold = NA,
  center = TRUE,
  scale = TRUE,
  return_recipe = FALSE
)

Arguments

tof_tibble

A `tof_tbl` or `tibble`.

pca_cols

Unquoted column names indicating which columns in `tof_tibble` to use for computing the principal components. Defaults to all numeric columns. Supports tidyselect helpers.

num_comp

The number of principal components to calculate. Defaults to 5. See `recipes::step_pca()`.
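A PCA of `p` input columns cannot produce more than `p` components. This is why the `recipes::bake()` output in the examples below has only four columns (PC1 to PC4), even though `num_comp` defaults to 5. The sketch below illustrates this cap with base R's `stats::prcomp()` as a stand-in. It is not tidytof's implementation, which builds on `recipes::step_pca()`, and the simulated matrix is illustrative only.

```r
# Base R stand-in (not tidytof's code): a PCA on 4 columns yields at most 4 components
set.seed(1)
x <- matrix(rnorm(200 * 4), ncol = 4,
            dimnames = list(NULL, c("cd45", "cd38", "cd34", "cd19")))

num_comp <- 5                                  # requested, as in the default
fit <- stats::prcomp(x, center = TRUE, scale. = TRUE)

n_available <- ncol(fit$x)                     # components actually computed (4 here)
n_kept <- min(num_comp, n_available)           # cannot keep more than exist
scores <- fit$x[, seq_len(n_kept), drop = FALSE]
dim(scores)
```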

threshold

A double between 0 and 1 representing the fraction of total variance that the returned components should cover. Defaults to NA, in which case `num_comp` determines the number of components. See `recipes::step_pca()`.
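One way to read `threshold`: keep the smallest number of components whose cumulative share of the total variance reaches the target fraction. The sketch below computes that number with base R's `stats::prcomp()`. It illustrates the idea only and is not how tidytof or `recipes::step_pca()` are implemented internally. The simulated data and the value 0.75 are arbitrary assumptions.

```r
# Base R sketch (not tidytof's code): pick the number of components that covers
# a target fraction of the total variance.
set.seed(1)
x <- matrix(rnorm(200 * 4), ncol = 4)
x[, 2] <- x[, 1] + rnorm(200, sd = 0.3)        # add correlation so variance is uneven

threshold <- 0.75
fit <- stats::prcomp(x, center = TRUE, scale. = TRUE)

# proportion of total variance explained by each component, then cumulated
var_explained <- fit$sdev^2 / sum(fit$sdev^2)
cum_var <- cumsum(var_explained)

# smallest number of components whose cumulative variance reaches the threshold
k <- which(cum_var >= threshold)[1]
k
```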

center

A boolean value indicating whether each column should be centered to mean 0 before running PCA. Defaults to TRUE.

scale

A boolean value indicating whether each column should be scaled to standard deviation 1 before running PCA. Defaults to TRUE.
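Scaling matters when markers have very different numeric ranges. Without scaling, the marker with the largest variance dominates the first component. The base R sketch below uses `stats::prcomp()` as a stand-in for the recipes-based PCA that tidytof uses. It also checks that scaling inside `prcomp()` matches running PCA on the output of `scale()`. The factor of 1000 applied to `cd45` is an illustrative assumption.

```r
# Base R sketch (not tidytof's code): effect of scale = TRUE/FALSE on PCA
set.seed(1)
x <- cbind(
  cd45 = rnorm(200) * 1000,   # one marker on a much larger numeric scale
  cd38 = rnorm(200),
  cd34 = rnorm(200),
  cd19 = rnorm(200)
)

unscaled <- stats::prcomp(x, center = TRUE, scale. = FALSE)
scaled   <- stats::prcomp(x, center = TRUE, scale. = TRUE)

# Without scaling, PC1 loads almost entirely on the high-variance column
round(abs(unscaled$rotation[, "PC1"]), 3)

# With scaling, each column is standardized first, which matches PCA on scale(x)
manual <- stats::prcomp(scale(x), center = FALSE, scale. = FALSE)
all.equal(abs(scaled$x), abs(manual$x))
```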

return_recipe

A boolean value indicating whether to return a prepped recipe object containing the PCA embedding instead of the PCA result. Set this option to TRUE if you want to create the PCA embedding from one dataset and later project new observations onto the same embedding space (for example, with `recipes::bake()`). Defaults to FALSE.
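Projecting new observations onto an existing embedding means reusing the means, standard deviations, and rotation learned from the training data, not recomputing them from the new data. The sketch below shows this "fit once, project later" pattern with base R's `stats::prcomp()` and `predict()`. It does not use recipes. The prepped recipe returned here is assumed to play the analogous role for `recipes::bake()`.

```r
# Base R sketch of fitting a PCA once and projecting new cells later
# (stats::prcomp()/predict(), not the recipes objects tidytof returns)
set.seed(1)
markers <- c("cd45", "cd38", "cd34", "cd19")
train <- matrix(rnorm(200 * 4), ncol = 4, dimnames = list(NULL, markers))
new   <- matrix(rnorm(50 * 4),  ncol = 4, dimnames = list(NULL, markers))

fit <- stats::prcomp(train, center = TRUE, scale. = TRUE)

# project new cells onto the embedding learned from the training cells
projected <- predict(fit, newdata = new)

# same result by hand: reuse the training means/SDs, then apply the rotation
by_hand <- scale(new, center = fit$center, scale = fit$scale) %*% fit$rotation
all.equal(unname(projected), unname(by_hand))
```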

Value

A tibble with the same number of rows as `tof_tibble`, with each row representing a single cell. Each of the `num_comp` columns (named `.pc1`, `.pc2`, and so on) gives each cell's coordinate along one principal component. If `return_recipe = TRUE`, a prepped recipe is returned instead.
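To make the output shape concrete, the hypothetical helper `pca_embed_sketch()` below mimics it in base R: one row per cell and one column per component, named `.pc1`, `.pc2`, and so on, as in the example output. The helper is not part of tidytof. It uses `stats::prcomp()` instead of recipes and returns a plain `data.frame` rather than a tibble.

```r
# Hypothetical helper (not part of tidytof): mimic the documented output shape
pca_embed_sketch <- function(data, num_comp = 5, center = TRUE, scale = TRUE) {
  fit <- stats::prcomp(as.matrix(data), center = center, scale. = scale)
  k <- min(num_comp, ncol(fit$x))                 # cannot exceed available components
  out <- as.data.frame(fit$x[, seq_len(k), drop = FALSE])
  names(out) <- paste0(".pc", seq_len(k))         # .pc1, .pc2, ... as in the example output
  out
}

set.seed(1)
sim_data <- data.frame(
  cd45 = rnorm(200), cd38 = rnorm(200), cd34 = rnorm(200), cd19 = rnorm(200)
)
embedding <- pca_embed_sketch(sim_data, num_comp = 2)
dim(embedding)
names(embedding)
```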

See also

Other dimensionality reduction functions: tof_reduce_dimensions(), tof_reduce_tsne(), tof_reduce_umap()

Examples

# simulate single-cell data
sim_data <-
    dplyr::tibble(
        cd45 = rnorm(n = 200),
        cd38 = rnorm(n = 200),
        cd34 = rnorm(n = 200),
        cd19 = rnorm(n = 200)
    )
new_data <-
    dplyr::tibble(
        cd45 = rnorm(n = 50),
        cd38 = rnorm(n = 50),
        cd34 = rnorm(n = 50),
        cd19 = rnorm(n = 50)
    )

# calculate pca
tof_reduce_pca(tof_tibble = sim_data, num_comp = 2)
#> # A tibble: 200 × 2
#>       .pc1    .pc2
#>      <dbl>   <dbl>
#>  1 -1.18    2.40  
#>  2  0.432   0.529 
#>  3  0.0764  0.761 
#>  4  3.09    0.0319
#>  5 -0.351  -1.84  
#>  6  0.619  -0.0466
#>  7  0.561   0.211 
#>  8  1.15    2.01  
#>  9  0.807   0.217 
#> 10 -0.647  -0.278 
#> # ℹ 190 more rows

# return recipe instead of embeddings
pca_recipe <- tof_reduce_pca(tof_tibble = sim_data, return_recipe = TRUE)

# apply recipe to new data
recipes::bake(pca_recipe, new_data = new_data)
#> # A tibble: 50 × 4
#>         PC1    PC2    PC3     PC4
#>       <dbl>  <dbl>  <dbl>   <dbl>
#>  1 -0.260   -1.04  -0.118 -0.598 
#>  2  0.00176 -0.537 -0.146  0.125 
#>  3 -0.119   -0.173 -0.500  1.84  
#>  4 -0.717    1.42   1.63  -1.67  
#>  5 -0.793   -1.08   0.397 -0.773 
#>  6  0.980    0.560 -0.946 -1.37  
#>  7  1.02    -0.372  0.720 -2.35  
#>  8 -0.740   -0.226  1.21   0.0807
#>  9 -1.32     1.68   0.376 -0.573 
#> 10 -0.0621  -0.574  0.941  0.179 
#> # ℹ 40 more rows