{tidytof}: Predicting Patient Outcomes from Single-cell Data using Tidy Data Principles
Date:
This talk about {tidytof}, an R package for analyzing single-cell data using a tidy interface, was originally given at R/Medicine 2021 and is available here.
Abstract: In recent years, the concept of “tidiness” (a data analysis framework in which data frames are structured such that each variable is a column and each observation is a row) has become a staple of the data science community. This is largely because working with “tidy data” makes it easy to map the meaning of a dataset to a specific and consistent structure, which in turn allows for the development of easy-to-use data analysis tools for a variety of applications. Despite these strengths, relatively few tools have been developed for analyzing clinical and biomedical data using tidy data principles.
Here, we present {tidytof}, a novel R package for tidy analysis of mass cytometry (CyTOF) single-cell data, as a case study for using tidy data principles to simplify the manipulation, visualization, and modeling of biomedical data. Specifically, {tidytof} leverages tidy data analysis to implement a reproducible and human-readable pipeline for common single-cell analysis tasks, including reading and writing data, clustering, dimensionality reduction, feature engineering, and patient-outcomes modeling. Our presentation will highlight both the process of applying tidy principles and the benefits of doing so when building simple and intuitive biomedical data analysis workflows.