TY - JOUR
T1 - Expanding the TNM for cancers of the colon and rectum using machine learning
T2 - A demonstration
AU - Hueman, Matthew
AU - Wang, Huan
AU - Henson, Donald
AU - Chen, Dechang
N1 - Publisher Copyright: © Author(s) (or their employer(s)) 2019. Re-use permitted under CC BY-NC. No commercial re-use. Published by BMJ on behalf of the European Society for Medical Oncology.
PY - 2019/6/1
Y1 - 2019/6/1
N2 - Objective: The American Joint Committee on Cancer (AJCC) system for staging cancers of the colon and rectum includes depth of tumour penetration, number of positive lymph nodes and presence or absence of metastasis. Using machine learning, we demonstrate that these factors can be integrated with age, carcinoembryonic antigen (CEA) interpretation and tumour location to form prognostic systems that expand the tumour, lymph node, metastasis (TNM) staging system. Methods: Two datasets on colon and rectal cancers were extracted from the Surveillance, Epidemiology and End Results Programme of the National Cancer Institute. Dataset 1 included three factors (tumour, lymph nodes and metastasis). Dataset 2 contained six factors (tumour, lymph nodes, metastasis, age, CEA interpretation and tumour location). The Ensemble Algorithm for Clustering Cancer Data (EACCD) and the C-index were applied to generate prognostic groups. Results: The EACCD prognostic system based on dataset 1 stratified patients into 10 risk groups, analogous to the 10 stages of the AJCC staging system. There was a strong inter-system association between EACCD grouping and AJCC staging (Spearman's rank correlation=0.9046, p value=1.6×10⁻¹⁷). However, the EACCD system had a significantly higher survival prediction accuracy than the AJCC system (C-index=0.7802 for the EACCD system and 0.7695 for the AJCC system, p value=4.9×10⁻⁹¹). Adding age, CEA interpretation or tumour location improved the prediction accuracy of the prognostic system involving tumour, lymph nodes and metastasis. The EACCD prognostic system based on dataset 2 and all six factors stratified patients into 10 groups with the highest survival prediction accuracy (C-index=0.7914). Conclusions: The EACCD can integrate multiple factors to stratify patients with colon or rectal cancer into risk groups that predict survival with high accuracy.
AB - Objective: The American Joint Committee on Cancer (AJCC) system for staging cancers of the colon and rectum includes depth of tumour penetration, number of positive lymph nodes and presence or absence of metastasis. Using machine learning, we demonstrate that these factors can be integrated with age, carcinoembryonic antigen (CEA) interpretation and tumour location to form prognostic systems that expand the tumour, lymph node, metastasis (TNM) staging system. Methods: Two datasets on colon and rectal cancers were extracted from the Surveillance, Epidemiology and End Results Programme of the National Cancer Institute. Dataset 1 included three factors (tumour, lymph nodes and metastasis). Dataset 2 contained six factors (tumour, lymph nodes, metastasis, age, CEA interpretation and tumour location). The Ensemble Algorithm for Clustering Cancer Data (EACCD) and the C-index were applied to generate prognostic groups. Results: The EACCD prognostic system based on dataset 1 stratified patients into 10 risk groups, analogous to the 10 stages of the AJCC staging system. There was a strong inter-system association between EACCD grouping and AJCC staging (Spearman's rank correlation=0.9046, p value=1.6×10⁻¹⁷). However, the EACCD system had a significantly higher survival prediction accuracy than the AJCC system (C-index=0.7802 for the EACCD system and 0.7695 for the AJCC system, p value=4.9×10⁻⁹¹). Adding age, CEA interpretation or tumour location improved the prediction accuracy of the prognostic system involving tumour, lymph nodes and metastasis. The EACCD prognostic system based on dataset 2 and all six factors stratified patients into 10 groups with the highest survival prediction accuracy (C-index=0.7914). Conclusions: The EACCD can integrate multiple factors to stratify patients with colon or rectal cancer into risk groups that predict survival with high accuracy.
KW - C-index
KW - colorectal cancer
KW - dendrogram
KW - machine learning
KW - staging
UR - http://www.scopus.com/inward/record.url?scp=85067289338&partnerID=8YFLogxK
U2 - 10.1136/esmoopen-2019-000518
DO - 10.1136/esmoopen-2019-000518
M3 - Article
AN - SCOPUS:85067289338
SN - 2059-7029
VL - 4
JO - ESMO Open
JF - ESMO Open
IS - 3
M1 - e000518
ER -