TY - JOUR
T1 - A selective CutMix approach improves generalizability of deep learning-based grading and risk assessment of prostate cancer
AU - Patkar, Sushant
AU - Harmon, Stephanie
AU - Sesterhenn, Isabell
AU - Lis, Rosina
AU - Merino, Maria
AU - Young, Denise
AU - Brown, G. Thomas
AU - Greenfield, Kimberly M.
AU - McGeeney, John D.
AU - Elsamanoudi, Sally
AU - Tan, Shyh Han
AU - Schafer, Cara
AU - Jiang, Jiji
AU - Petrovics, Gyorgy
AU - Dobi, Albert
AU - Rentas, Francisco J.
AU - Pinto, Peter A.
AU - Chesnut, Gregory T.
AU - Choyke, Peter
AU - Turkbey, Baris
AU - Moncur, Joel T.
N1 - Publisher Copyright: © 2024
PY - 2024/12
Y1 - 2024/12
N2 - The Gleason score is an important predictor of prognosis in prostate cancer. However, its subjective nature can result in over- or under-grading. Our objective was to train an artificial intelligence (AI)-based algorithm to grade prostate cancer in specimens from patients who underwent radical prostatectomy (RP) and to assess the correlation of AI-estimated proportions of different Gleason patterns with biochemical recurrence-free survival (RFS), metastasis-free survival (MFS), and overall survival (OS). Training and validation of algorithms for cancer detection and grading were completed with three large datasets containing a total of 580 whole-mount prostate slides from 191 RP patients at two centers and 6218 annotated needle biopsy slides from the publicly available Prostate Cancer Grading Assessment dataset. A cancer detection model was trained using MobileNetV3 on 0.5 mm × 0.5 mm cancer areas (tiles) captured at 10× magnification. For cancer grading, a Gleason pattern detector was trained on tiles using a ResNet50 convolutional neural network and a selective CutMix training strategy involving a mixture of real and artificial examples. This strategy resulted in improved model generalizability in the test set compared with three different control experiments when evaluated on both needle biopsy slides and whole-mount prostate slides from different centers. In an additional test cohort of RP patients who were clinically followed over 30 years, quantitative Gleason pattern AI estimates achieved concordance indexes of 0.69, 0.72, and 0.64 for predicting RFS, MFS, and OS times, outperforming the control experiments and International Society of Urological Pathology system (ISUP) grading by pathologists. Finally, unsupervised clustering of test RP patient specimens into low-, medium-, and high-risk groups based on AI-estimated proportions of each Gleason pattern resulted in significantly improved RFS and MFS stratification compared with ISUP grading. In summary, deep learning-based quantitative Gleason scoring using a selective CutMix training strategy may improve prognostication after prostate cancer surgery.
KW - Artificial intelligence
KW - Digital pathology
KW - Gleason grading
KW - Prostate cancer
UR - http://www.scopus.com/inward/record.url?scp=85195088900&partnerID=8YFLogxK
U2 - 10.1016/j.jpi.2024.100381
DO - 10.1016/j.jpi.2024.100381
M3 - Article
AN - SCOPUS:85195088900
SN - 2229-5089
VL - 15
JO - Journal of Pathology Informatics
JF - Journal of Pathology Informatics
M1 - 100381
ER -