3:30 PM
4:30 PM
Abstract
We examine the consistency and overfitting of existing unsupervised multi-omics (multi-modal) methods applied to high-throughput bulk tissue assays, e.g., RNA-seq, methylation, and proteomics, in which covariation across samples is analyzed. We propose a cross-validation framework to determine whether the projections identified by unsupervised methods, which are intended to maximize shared variation across data modalities, in fact generalize to out-of-fold samples. We further discuss applying the cross-validation framework to multi-omics single-cell datasets, using newly proposed methods designed specifically for single-cell assays. Finally, we consider using the multi-modal framework for data-driven identification of low-quality samples in large omics cohorts.
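As a rough illustration of the cross-validation idea, the sketch below fits a two-modality projection on training folds and then checks whether the correlation between the projected modalities holds up on out-of-fold samples. It is not the method presented in the talk: the abstract does not name a specific algorithm, so ridge-regularized canonical correlation analysis (CCA) is used here as a stand-in. The function names (`simulate`, `fit_cca_first_pair`, `cv_correlation`), the `reg` parameter, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def simulate(n=200, p=20, q=20, shared=True, seed=1):
    """Toy two-modality data (hypothetical). With shared=True, one latent
    factor drives every feature in both modalities; otherwise X and Y are
    independent noise."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))
    Y = rng.normal(size=(n, q))
    if shared:
        z = rng.normal(size=(n, 1))
        X = X + z  # loading of 1 on every feature (an assumption)
        Y = Y + z
    return X, Y


def fit_cca_first_pair(X, Y, reg=1e-2):
    """First pair of canonical weight vectors from ridge-regularized CCA.
    `reg` is an illustrative ridge term, not a recommended value."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    # Whiten each block with its Cholesky factor: K = Lx^{-1} Sxy Ly^{-T}
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    K = np.linalg.solve(Lx, Sxy)
    K = np.linalg.solve(Ly, K.T).T
    U, _, Vt = np.linalg.svd(K)
    # Map the leading singular vectors back to the original feature space
    a = np.linalg.solve(Lx.T, U[:, 0])
    b = np.linalg.solve(Ly.T, Vt[0])
    return a, b, mx, my


def cv_correlation(X, Y, n_folds=5, seed=0):
    """Mean in-fold and out-of-fold correlation of the first projection pair."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    train_r, test_r = [], []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        # Weights and centering come from the training samples only
        a, b, mx, my = fit_cca_first_pair(X[train_idx], Y[train_idx])
        for idx, out in ((train_idx, train_r), (test_idx, test_r)):
            zx = (X[idx] - mx) @ a
            zy = (Y[idx] - my) @ b
            out.append(np.corrcoef(zx, zy)[0, 1])
    return float(np.mean(train_r)), float(np.mean(test_r))


# Compare a dataset with real shared variation against pure noise
for label, shared in (("shared signal", True), ("noise only", False)):
    r_train, r_test = cv_correlation(*simulate(shared=shared))
    print(f"{label}: train r = {r_train:.2f}, out-of-fold r = {r_test:.2f}")
```

In this toy setting, a large gap between the in-fold and out-of-fold correlation suggests that the projection is fitting noise rather than variation that is truly shared across modalities.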
Link to personal website: https://mikelove.github.io