
This function applies a user-provided function to a `tof_tbl` of raw ion counts, reads, or fluorescence intensity units measured directly on a cytometer. It can be used to perform standard pre-processing steps (e.g. arcsinh transformation) before cytometry data analysis.

Usage

tof_preprocess(
  tof_tibble = NULL,
  channel_cols = where(tof_is_numeric),
  undo_noise = FALSE,
  transform_fun = function(x) asinh(x/5)
)

Arguments

tof_tibble

A `tof_tbl` or a `tibble`.

channel_cols

Unquoted column names representing columns that contain single-cell protein measurements. Supports tidyselect helpers. If nothing is specified, the default is to transform all numeric columns.

undo_noise

A boolean value indicating whether to remove the uniform noise that Fluidigm software adds to CyTOF measurements for aesthetic and visualization purposes. See this paper. Defaults to FALSE.

transform_fun

A vectorized function to apply to each protein value for variance stabilization. Defaults to the arcsinh transformation with a cofactor of 5, i.e. `function(x) asinh(x/5)`.
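As a rough illustration of what a vectorized `transform_fun` looks like, the base-R sketch below builds arcsinh transforms with different cofactors. The helper name `make_asinh` is hypothetical and is not part of tidytof; only `asinh()` and the `transform_fun` argument come from the documentation above.

```r
# Hypothetical helper (not part of tidytof): returns a vectorized
# arcsinh transform that divides by the chosen cofactor first.
make_asinh <- function(cofactor = 5) {
  function(x) asinh(x / cofactor)
}

# With cofactor = 5 this matches the documented default, asinh(x/5).
asinh_5 <- make_asinh(5)

# Illustrative raw values (made up, not from a real .fcs file).
raw_counts <- c(0, 5, 50, 500)

# The function is applied elementwise, so the output has the same length.
transformed <- asinh_5(raw_counts)
```

Assuming this helper, a different cofactor could be passed as `tof_preprocess(tof_tibble, transform_fun = make_asinh(150))`. The value 150 is only an example; an appropriate cofactor depends on the instrument and data.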

Value

A `tof_tbl` with identical dimensions to the input `tof_tibble`, with all columns specified in `channel_cols` transformed using `transform_fun` (with noise removed or not removed depending on `undo_noise`).

See also

`tof_postprocess()`

Examples


# read in an example .fcs file from tidytof's internal datasets
input_file <- dir(tidytof_example_data("aml"), full.names = TRUE)[[1]]
tof_tibble <- tof_read_data(input_file)

# preprocess all numeric columns with default behavior
# arcsinh transformation with a cofactor of 5
tof_preprocess(tof_tibble)
#> # A tibble: 100 × 59
#>     Time Event_length `CD45|Y89` `empty|Pd102` `empty|Pd104` `empty|Pd105`
#>    <dbl>        <dbl>      <dbl>         <dbl>         <dbl>         <dbl>
#>  1 15.3          1.88       5.33         0.263         1.70           5.85
#>  2 14.9          2.05       5.83         0.731         1.67           5.71
#>  3 15.2          1.88       5.70         1.13          0.861          5.54
#>  4 13.7          1.88       5.45         0.129         1.06           5.46
#>  5 15.2          1.99       5.73         0.721         1.41           5.55
#>  6 14.4          2.05       5.27         0.760         0.708          5.52
#>  7 13.9          1.88       5.31         0.645         0.771          5.42
#>  8 14.2          1.99       5.42         1.09          1.58           5.64
#>  9 15.6          2.05       6.03         0.586         1.37           5.83
#> 10  9.75         1.88       5.38         0.177         1.73           5.78
#> # ℹ 90 more rows
#> # ℹ 53 more variables: `empty|Pd106` <dbl>, `empty|Pd108` <dbl>,
#> #   `empty|Pd110` <dbl>, `CD61|In113` <dbl>, `CD99|In115` <dbl>,
#> #   `empty|I127` <dbl>, `CD45RA|La139` <dbl>, `CD93|Ce140` <dbl>,
#> #   `CD3_CD19|Pr141` <dbl>, `CCR2|Nd142` <dbl>, `CD117|Nd143` <dbl>,
#> #   `CD123|Nd144` <dbl>, `CD64|Nd145` <dbl>, `CD90|Nd146` <dbl>,
#> #   `CD38|Sm147` <dbl>, `CD34|Nd148` <dbl>, `CEBPa|Sm149` <dbl>, …

# preprocess all numeric columns using the log base 10 transformation
tof_preprocess(tof_tibble, transform_fun = log10)
#> # A tibble: 100 × 59
#>     Time Event_length `CD45|Y89` `empty|Pd102` `empty|Pd104` `empty|Pd105`
#>    <dbl>        <dbl>      <dbl>         <dbl>         <dbl>         <dbl>
#>  1  7.04         1.20       2.71        0.125          1.12           2.94
#>  2  6.85         1.28       2.93        0.601          1.11           2.88
#>  3  6.99         1.20       2.87        0.843          0.686          2.81
#>  4  6.36         1.20       2.77       -0.189          0.800          2.77
#>  5  6.98         1.26       2.89        0.594          0.984          2.81
#>  6  6.65         1.28       2.69        0.621          0.584          2.80
#>  7  6.44         1.20       2.70        0.539          0.628          2.75
#>  8  6.57         1.26       2.75        0.821          1.07           2.85
#>  9  7.18         1.28       3.02        0.491          0.964          2.93
#> 10  4.63         1.20       2.74       -0.0515         1.13           2.91
#> # ℹ 90 more rows
#> # ℹ 53 more variables: `empty|Pd106` <dbl>, `empty|Pd108` <dbl>,
#> #   `empty|Pd110` <dbl>, `CD61|In113` <dbl>, `CD99|In115` <dbl>,
#> #   `empty|I127` <dbl>, `CD45RA|La139` <dbl>, `CD93|Ce140` <dbl>,
#> #   `CD3_CD19|Pr141` <dbl>, `CCR2|Nd142` <dbl>, `CD117|Nd143` <dbl>,
#> #   `CD123|Nd144` <dbl>, `CD64|Nd145` <dbl>, `CD90|Nd146` <dbl>,
#> #   `CD38|Sm147` <dbl>, `CD34|Nd148` <dbl>, `CEBPa|Sm149` <dbl>, …
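
One thing to keep in mind with the log base 10 example: unlike `asinh()`, `log10()` is not finite at zero, so any channel containing zero values would come out as `-Inf`. The base-R sketch below shows one possible workaround using a pseudocount. The wrapper `log10_pseudo` and the choice of 1 as the pseudocount are assumptions for illustration, not part of tidytof.

```r
# Hypothetical wrapper (not part of tidytof): shift values by a
# pseudocount before taking log10 so that zeros stay finite.
log10_pseudo <- function(x, pseudocount = 1) {
  log10(x + pseudocount)
}

# Illustrative values (made up), including a zero.
counts <- c(0, 9, 99)

plain <- log10(counts)           # first element is -Inf
shifted <- log10_pseudo(counts)  # 0, 1, 2
```

Such a wrapper could then be supplied as `transform_fun = log10_pseudo` in the call shown above.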