TY - JOUR
T1 - Robust and accurate estimation of cellular fraction from tissue omics data via ensemble deconvolution
AU - Cai, Manqi
AU - Yue, Molin
AU - Chen, Tianmeng
AU - Liu, Jinling
AU - Forno, Erick
AU - Lu, Xinghua
AU - Billiar, Timothy
AU - Celedón, Juan
AU - McKennan, Chris
AU - Chen, Wei
AU - Wang, Jiebiao
N1 - Publisher Copyright:
© 2022 The Author(s). Published by Oxford University Press. All rights reserved.
PY - 2022/6/1
Y1 - 2022/6/1
N2 - Motivation: Tissue-level omics data such as transcriptomics and epigenomics are an average across diverse cell types. To extract cell-type-specific (CTS) signals, dozens of cellular deconvolution methods have been proposed to infer cell-type fractions from tissue-level data. However, these methods produce vastly different results under various real data settings. Simulation-based benchmarking studies showed no universally best deconvolution approach. There have been attempts at ensemble methods, but they only aggregate multiple single-cell references or reference-free deconvolution methods. Results: To achieve a robust estimation of cellular fractions, we proposed EnsDeconv (Ensemble Deconvolution), which adopts CTS robust regression to synthesize the results from 11 single deconvolution methods, 10 reference datasets, 5 marker gene selection procedures, 5 data normalizations and 2 transformations. Unlike most benchmarking studies based on simulations, we compiled four large real datasets of 4937 tissue samples in total with measured cellular fractions and bulk gene expression from different tissues. Comprehensive evaluations demonstrated that EnsDeconv yields more stable, robust and accurate fractions than existing methods. We illustrated that EnsDeconv-estimated cellular fractions enable various CTS downstream analyses such as differential fractions associated with clinical variables. We further extended EnsDeconv to analyze bulk DNA methylation data.
UR - http://www.scopus.com/inward/record.url?scp=85132845558&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btac279
DO - 10.1093/bioinformatics/btac279
M3 - Article
C2 - 35438146
AN - SCOPUS:85132845558
SN - 1367-4803
VL - 38
SP - 3004
EP - 3010
JO - Bioinformatics
JF - Bioinformatics
IS - 11
ER -