Expanding the TNM for cancers of the colon and rectum using machine learning: A demonstration

Matthew Hueman, Huan Wang, Donald Henson, Dechang Chen*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

14 Scopus citations


Objective: The American Joint Committee on Cancer (AJCC) system for staging cancers of the colon and rectum includes depth of tumour penetration, number of positive lymph nodes and presence or absence of metastasis. Using machine learning, we demonstrate that these factors can be integrated with age, carcinoembryonic antigen (CEA) interpretation and tumour location to form prognostic systems that expand the tumour, lymph node, metastasis (TNM) staging system.

Methods: Two datasets on colon and rectal cancers were extracted from the Surveillance, Epidemiology and End Results Programme of the National Cancer Institute. Dataset 1 included three factors (tumour, lymph nodes and metastasis). Dataset 2 contained six factors (tumour, lymph nodes, metastasis, age, CEA interpretation and tumour location). The Ensemble Algorithm for Clustering Cancer Data (EACCD) and the C-index were applied to generate prognostic groups.

Results: The EACCD prognostic system based on dataset 1 stratified patients into 10 risk groups, analogous to the 10 stages of the AJCC staging system. There was a strong inter-system association between EACCD grouping and AJCC staging (Spearman's rank correlation=0.9046, p value=1.6×10⁻¹⁷). However, the EACCD system had a significantly higher survival prediction accuracy than the AJCC system (C-index=0.7802 for the EACCD system vs 0.7695 for the AJCC system, p value=4.9×10⁻⁹¹). Adding age, CEA interpretation or tumour location improved the prediction accuracy of the prognostic system involving tumour, lymph nodes and metastasis. The EACCD prognostic system based on dataset 2 and all six factors stratified patients into 10 groups with the highest survival prediction accuracy (C-index=0.7914).

Conclusions: The EACCD can integrate multiple factors to stratify patients with colon or rectal cancer into risk groups that predict survival with a high accuracy.
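To make the C-index values quoted in the abstract easier to interpret, the following is a minimal sketch of a concordance index for right-censored survival data, assuming Harrell's formulation. The function name `concordance_index`, the tie handling and the toy data are illustrative assumptions; the abstract does not say which variant or implementation the authors used. A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell-style C-index (illustrative sketch, not the authors' code).

    times       -- observed follow-up time per patient
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- higher value means a worse predicted prognosis
                   (e.g. an ordinal risk-group number)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs with tied times are skipped
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk died earlier: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied risk (same group) counts as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: five patients assigned to three risk groups.
toy_times = [5, 10, 12, 20, 30]
toy_events = [1, 1, 0, 1, 0]
toy_groups = [3, 3, 2, 1, 1]
toy_c = concordance_index(toy_times, toy_events, toy_groups)
```

Because a staging system assigns many patients to the same group, tied risk scores are common; counting them as half-concordant is one conventional choice, and other choices would shift the resulting value.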

Original language: English
Article number: e000518
Journal: ESMO Open
Issue number: 3
State: Published - 1 Jun 2019
Externally published: Yes


  • C-index
  • colorectal cancer
  • dendrogram
  • machine learning
  • staging

