Using Machine Learning to Expand the Ann Arbor Staging System for Hodgkin and Non-Hodgkin Lymphoma

Huan Wang, Zhenqiu Liu, Julie Yang, Li Sheng, Dechang Chen*

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review


The Ann Arbor system is disadvantaged in utilizing information from additional prognostic factors. In this study, we applied the Ensemble Algorithm for Clustering Cancer Data (EACCD) to create a prognostic system for lymphoma that integrates additional prognostic factors. Hodgkin and non-Hodgkin lymphoma survival data were extracted from the Surveillance, Epidemiology, and End Results Program of the National Cancer Institute and divided into the training set (131,725 cases) and the validation set (15,683 cases). Five prognostic factors were studied: Ann Arbor stage, type, site, age, and sex. EACCD was applied to the training set to produce a prognostic system, called an EACCD system, for convenience. The EACCD system stratified patients into eight prognostic groups with well-separated survival curves. These eight prognostic groups had significantly higher accuracies in survival prediction than the 24 Ann Arbor substages. A higher-risk group in the EACCD system roughly corresponds to a higher Ann Arbor substage. The proposed system shows a good performance in risk stratification and survival prediction on both the training and the validation sets. The EACCD system expands the traditional Ann Arbor staging system by leveraging additional prognostic information and is expected to advance treatment management for lymphoma patients.

Original languageEnglish
Pages (from-to)514-525
Number of pages12
Issue number3
StatePublished - Sep 2023


  • C-index
  • cancer staging
  • dendrogram
  • lymphoma
  • machine learning


Dive into the research topics of 'Using Machine Learning to Expand the Ann Arbor Staging System for Hodgkin and Non-Hodgkin Lymphoma'. Together they form a unique fingerprint.

Cite this