TY - JOUR
T1 - Using Machine Learning to Create Prognostic Systems for Primary Prostate Cancer
AU - Guan, Kevin
AU - Guan, Andy
AU - Ahmed, Anwar E.
AU - Waters, Andrew J.
AU - Tan, Shyh Han
AU - Chen, Dechang
N1 - Publisher Copyright: © 2025 by the authors.
PY - 2025/10
Y1 - 2025/10
N2 - Background: Cancer staging, guided by anatomical and clinicopathologic factors, is essential for determining treatment strategies and patient prognosis. The current gold standard for prostate cancer is the American Joint Committee on Cancer (AJCC) Tumor, Lymph Node, and Metastasis (TNM) Staging System 9th Version (2024). This system incorporates five prognostic variables: tumor (T), spread to lymph nodes (N), metastasis (M), prostate-specific antigen (PSA) levels (P), and Grade Group/Gleason score (G). While effective, further refinement of prognostic systems may improve prediction of patient outcomes and support more individualized treatment. Methods: We applied the Ensemble Algorithm for Clustering Cancer Data (EACCD), an unsupervised machine learning approach. EACCD involves three steps: calculating initial dissimilarities, performing ensemble learning, and conducting hierarchical clustering. We first developed an EACCD model using the five AJCC variables (T, N, M, P, G). The model was then expanded to include two additional factors, age (A) and race (R). Prostate cancer patient data were obtained from the Surveillance, Epidemiology, and End Results (SEER) program from the National Cancer Institute. Results: The EACCD algorithm effectively stratified patients into distinct prognostic groups, each with well-separated survival curves. The five-variable model achieved a concordance index (C-index) of 0.8293 (95% CI: 0.8245–0.8341), while the seven-variable model, including age and race, improved performance to 0.8504 (95% CI: 0.8461–0.8547). Both outperformed the AJCC TNM system, which had a C-index of 0.7676 (95% CI: 0.7622–0.7731). Conclusions: EACCD provides a refined prognostic framework for primary localized prostate cancer, demonstrating superior accuracy over the AJCC staging system. With further validation in independent cohorts, EACCD could enhance risk stratification and support precision oncology.
KW - C-index
KW - EACCD
KW - cancer staging
KW - dendrogram
KW - machine learning
KW - prostate cancer
KW - survival curves
UR - http://www.scopus.com/inward/record.url?scp=105019203649&partnerID=8YFLogxK
U2 - 10.3390/diagnostics15192462
DO - 10.3390/diagnostics15192462
M3 - Article
AN - SCOPUS:105019203649
SN - 2075-4418
VL - 15
JO - Diagnostics
JF - Diagnostics
IS - 19
M1 - 2462
ER -