This function applies a user-provided function to a `tof_tbl` of raw ion counts, reads, or fluorescence intensity units as measured directly on a cytometer. It can be used to perform standard pre-processing steps (e.g. arcsinh transformation) before cytometry data analysis.
Arguments
- tof_tibble
A `tof_tbl` or a `tibble`.
- channel_cols
Unquoted column names representing columns that contain single-cell protein measurements. Supports tidyselect helpers. If nothing is specified, the default is to transform all numeric columns.
- undo_noise
A boolean value indicating whether to remove the uniform noise that Fluidigm software adds to CyTOF measurements for aesthetic and visualization purposes. See this paper. Defaults to FALSE.
- transform_fun
A vectorized function to apply to each protein value for variance stabilization. Defaults to `asinh` transformation (with a co-factor of 5).
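As a rough illustration of what an arcsinh transformation with a cofactor looks like, the sketch below builds such a function in base R. The helper name `make_asinh` is hypothetical and not part of tidytof; it assumes the cofactor divides the raw value before `asinh` is applied, which is consistent with the example output on this page.

```r
# Hypothetical helper (not part of tidytof): returns a vectorized
# arcsinh transformation that divides by `cofactor` before applying asinh.
make_asinh <- function(cofactor = 5) {
  function(x) asinh(x / cofactor)
}

# An arcsinh transformation with a cofactor of 5
asinh_5 <- make_asinh(5)

# Made-up raw values, for illustration only
raw_counts <- c(0, 5, 50, 500)
transformed <- asinh_5(raw_counts)
```

A function built this way could presumably be supplied as `transform_fun` when a different cofactor is wanted, for example `transform_fun = make_asinh(150)`.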
Value
A `tof_tbl` with identical dimensions to the input `tof_tibble`, with all columns specified in `channel_cols` transformed using `transform_fun` (with noise removed or not removed depending on `undo_noise`).
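To make the "identical dimensions" point concrete, here is a minimal base R sketch of transforming every numeric column of a table while leaving the other columns untouched. It is not tidytof's implementation: the toy data frame, its column names, and the helper `transform_numeric` are all assumptions made for illustration.

```r
# Toy data frame standing in for a tof_tbl (column names are made up).
toy <- data.frame(
  cell_id = c("a", "b", "c"),
  cd45 = c(0, 5, 50),
  cd34 = c(10, 100, 1000)
)

# Hypothetical helper: apply `fun` to every numeric column and
# leave non-numeric columns as they are.
transform_numeric <- function(df, fun = function(x) asinh(x / 5)) {
  is_num <- vapply(df, is.numeric, logical(1))
  df[is_num] <- lapply(df[is_num], fun)
  df
}

result <- transform_numeric(toy)

# The output has the same number of rows and columns as the input.
dim(result)
```

Because the transformation is applied value by value, no rows or columns are added or dropped; only the contents of the selected columns change.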
Examples
# read in an example .fcs file from tidytof's internal datasets
input_file <- dir(tidytof_example_data("aml"), full.names = TRUE)[[1]]
tof_tibble <- tof_read_data(input_file)
# preprocess all numeric columns with default behavior
# arcsinh transformation with a cofactor of 5
tof_preprocess(tof_tibble)
#> # A tibble: 100 × 59
#> Time Event_length `CD45|Y89` `empty|Pd102` `empty|Pd104` `empty|Pd105`
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 15.3 1.88 5.33 0.263 1.70 5.85
#> 2 14.9 2.05 5.83 0.731 1.67 5.71
#> 3 15.2 1.88 5.70 1.13 0.861 5.54
#> 4 13.7 1.88 5.45 0.129 1.06 5.46
#> 5 15.2 1.99 5.73 0.721 1.41 5.55
#> 6 14.4 2.05 5.27 0.760 0.708 5.52
#> 7 13.9 1.88 5.31 0.645 0.771 5.42
#> 8 14.2 1.99 5.42 1.09 1.58 5.64
#> 9 15.6 2.05 6.03 0.586 1.37 5.83
#> 10 9.75 1.88 5.38 0.177 1.73 5.78
#> # ℹ 90 more rows
#> # ℹ 53 more variables: `empty|Pd106` <dbl>, `empty|Pd108` <dbl>,
#> # `empty|Pd110` <dbl>, `CD61|In113` <dbl>, `CD99|In115` <dbl>,
#> # `empty|I127` <dbl>, `CD45RA|La139` <dbl>, `CD93|Ce140` <dbl>,
#> # `CD3_CD19|Pr141` <dbl>, `CCR2|Nd142` <dbl>, `CD117|Nd143` <dbl>,
#> # `CD123|Nd144` <dbl>, `CD64|Nd145` <dbl>, `CD90|Nd146` <dbl>,
#> # `CD38|Sm147` <dbl>, `CD34|Nd148` <dbl>, `CEBPa|Sm149` <dbl>, …
# preprocess all numeric columns using the log base 10 transformation
tof_preprocess(tof_tibble, transform_fun = log10)
#> # A tibble: 100 × 59
#> Time Event_length `CD45|Y89` `empty|Pd102` `empty|Pd104` `empty|Pd105`
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 7.04 1.20 2.71 0.125 1.12 2.94
#> 2 6.85 1.28 2.93 0.601 1.11 2.88
#> 3 6.99 1.20 2.87 0.843 0.686 2.81
#> 4 6.36 1.20 2.77 -0.189 0.800 2.77
#> 5 6.98 1.26 2.89 0.594 0.984 2.81
#> 6 6.65 1.28 2.69 0.621 0.584 2.80
#> 7 6.44 1.20 2.70 0.539 0.628 2.75
#> 8 6.57 1.26 2.75 0.821 1.07 2.85
#> 9 7.18 1.28 3.02 0.491 0.964 2.93
#> 10 4.63 1.20 2.74 -0.0515 1.13 2.91
#> # ℹ 90 more rows
#> # ℹ 53 more variables: `empty|Pd106` <dbl>, `empty|Pd108` <dbl>,
#> # `empty|Pd110` <dbl>, `CD61|In113` <dbl>, `CD99|In115` <dbl>,
#> # `empty|I127` <dbl>, `CD45RA|La139` <dbl>, `CD93|Ce140` <dbl>,
#> # `CD3_CD19|Pr141` <dbl>, `CCR2|Nd142` <dbl>, `CD117|Nd143` <dbl>,
#> # `CD123|Nd144` <dbl>, `CD64|Nd145` <dbl>, `CD90|Nd146` <dbl>,
#> # `CD38|Sm147` <dbl>, `CD34|Nd148` <dbl>, `CEBPa|Sm149` <dbl>, …