Using Machine Learning to Create Prognostic Systems for Primary Prostate Cancer

Kevin Guan, Andy Guan, Anwar E. Ahmed, Andrew J. Waters, Shyh Han Tan*, Dechang Chen*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Background: Cancer staging, guided by anatomical and clinicopathologic factors, is essential for determining treatment strategies and patient prognosis. The current gold standard for prostate cancer is the American Joint Committee on Cancer (AJCC) Tumor, Lymph Node, and Metastasis (TNM) Staging System 9th Version (2024). This system incorporates five prognostic variables: tumor (T), spread to lymph nodes (N), metastasis (M), prostate-specific antigen (PSA) levels (P), and Grade Group/Gleason score (G). While effective, further refinement of prognostic systems may improve prediction of patient outcomes and support more individualized treatment.

Methods: We applied the Ensemble Algorithm for Clustering Cancer Data (EACCD), an unsupervised machine learning approach. EACCD involves three steps: calculating initial dissimilarities, performing ensemble learning, and conducting hierarchical clustering. We first developed an EACCD model using the five AJCC variables (T, N, M, P, G). The model was then expanded to include two additional factors, age (A) and race (R). Prostate cancer patient data were obtained from the Surveillance, Epidemiology, and End Results (SEER) program from the National Cancer Institute.

Results: The EACCD algorithm effectively stratified patients into distinct prognostic groups, each with well-separated survival curves. The five-variable model achieved a concordance index (C-index) of 0.8293 (95% CI: 0.8245–0.8341), while the seven-variable model, including age and race, improved performance to 0.8504 (95% CI: 0.8461–0.8547). Both outperformed the AJCC TNM system, which had a C-index of 0.7676 (95% CI: 0.7622–0.7731).

Conclusions: EACCD provides a refined prognostic framework for primary localized prostate cancer, demonstrating superior accuracy over the AJCC staging system. With further validation in independent cohorts, EACCD could enhance risk stratification and support precision oncology.
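The C-index values reported above measure how well a grouping orders patients by risk. As a rough illustration only, the sketch below computes a Harrell-style concordance index in plain Python. The abstract does not say which estimator or software the authors used, so the function name `concordance_index`, its arguments, and the tie-handling rules are assumptions made for this example, and the toy data are invented.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell-style C-index (illustrative sketch, not the authors' code).

    times       -- observed follow-up time for each patient
    events      -- 1 if the event (death) was observed, 0 if censored
    risk_scores -- higher value means worse predicted prognosis,
                   e.g. the ordinal rank of a prognostic group
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs with tied times are skipped
        # 'a' is the patient with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk failed first: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # same group: counted as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the third is censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
groups = [4, 3, 2, 1]  # hypothetical prognostic group ranks, 4 = worst
print(concordance_index(times, events, groups))
```

Because a staging system assigns the same group to many patients, tied risk scores are common; counting each such pair as one half is one widely used convention, and other conventions would give slightly different values.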

Original language: English
Article number: 2462
Journal: Diagnostics
Volume: 15
Issue number: 19
State: Published - Oct 2025

Keywords

  • C-index
  • EACCD
  • cancer staging
  • dendrogram
  • machine learning
  • prostate cancer
  • survival curves
