A machine learning prediction model for total shoulder arthroplasty procedure duration: an evaluation of surgeon, patient, and shoulder-specific factors

Jay M. Levin*, Hamed Zaribafzadeh, Tom R. Doyle, Kwabena Adu-Kwarteng, Kiera Lunn, Joshua K. Helmkamp, Wendy Webster, Eoghan T. Hurley, Jonathan F. Dickens, Alison Toth, Oke Anakwenze, Christopher S. Klifto

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Background: Operating room efficiency is of paramount importance for scheduling, cost efficiency, and the high operating volume required to meet the growing demand for arthroplasty. The purpose of this study was to develop a machine learning predictive model for total shoulder arthroplasty (TSA) procedure duration and to identify factors that predict a prolonged procedure.
Methods: A retrospective review was undertaken of all TSAs performed between 2013 and 2021 at a large academic institution. Patient, surgeon, anesthetic, and shoulder-specific factors were assessed. The duration of time in the operating room was recorded and compared with the procedure durations predicted by the human scheduler and by the electronic health record. Two gradient-boosted decision tree regression models were built using training and validation datasets. The mean squared logarithmic error was chosen as the loss function. The first model (M1) considered patient, surgeon, and anesthetic factors, while the second model (M2) additionally considered shoulder anatomy- and pathology-specific factors.
Results: Human schedulers predicted 64.1% of cases accurately, with 26.7% underpredicted and 9.2% overpredicted. M1 accurately predicted 79.7% of cases, with 6.9% underpredicted and 13.4% overpredicted. M2 accurately predicted 82.5% of cases, with 8.8% underpredicted and 8.8% overpredicted. M2 was significantly more accurate for anatomic TSA than for reverse TSA (rTSA) (90.6% vs. 78.1%, P < .001). The feature with the greatest impact on the shoulder-specific model's prediction was the historical median procedure duration, followed by the electronic health record prediction, surgeon prediction, patient age, and a traumatic indication. Factors associated with underprediction of procedure duration included younger age, traumatic indication, male sex, greater body mass index, and a B2 glenoid.
Conclusion: Machine learning predictive models outperformed traditional scheduling, with a model incorporating general and shoulder-specific data providing the most accurate prediction of TSA procedure duration. Integration of modeling has the potential to optimize theater utilization and improve efficiency.
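
The Methods name the mean squared logarithmic error (MSLE) as the loss function but do not spell it out. The sketch below uses the standard textbook definition, the mean of (log(1 + predicted) - log(1 + actual))^2, and is not taken from the authors' code. The function name `msle` and the example durations are assumptions made for illustration only.

```python
import math


def msle(actual_minutes, predicted_minutes):
    """Standard mean squared logarithmic error over paired durations.

    Assumption: this is the textbook MSLE definition; the abstract does
    not describe the authors' exact implementation.
    """
    if len(actual_minutes) != len(predicted_minutes) or not actual_minutes:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for actual, predicted in zip(actual_minutes, predicted_minutes):
        # log1p(x) computes log(1 + x), which is defined at x = 0.
        total += (math.log1p(predicted) - math.log1p(actual)) ** 2
    return total / len(actual_minutes)


# Made-up durations in minutes, not data from the study.
actual = [120.0]
under = msle(actual, [90.0])   # prediction 30 minutes too short
over = msle(actual, [150.0])   # prediction 30 minutes too long
print(under > over)
```

Because the error is measured on a logarithmic scale, an underprediction is penalized more heavily than an overprediction of the same size in minutes. This asymmetry may be relevant to a scheduling setting, although the abstract does not state why the loss was chosen.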

Original language: English
Journal: Journal of Shoulder and Elbow Surgery
State: Accepted/In press - 2025
Externally published: Yes

Keywords

  • Basic Science Study
  • Computer Modeling Using AI/Machine Learning
  • Shoulder arthroplasty
  • artificial intelligence
  • clinical effectiveness
  • health economics
  • machine learning
  • reverse shoulder arthroplasty
