Real-World Pitfalls of Analyzing Real-World Data: A Cautionary Note and Path Forward

John D. Cooper, Karen Shou, Kevin Sunderland, Kevin Pham, Jennifer A. Thornton, Christin B. DeStefano

Research output: Contribution to journal (Article, peer-reviewed)

Scopus citations: 1


PURPOSE: Real-world data (RWD) are pervasive in oncology research and offer insights into clinical trends and patient outcomes. However, RWD have shortcomings, such as missing data and classification errors, that make them prone to pitfalls during survival analyses. The American Society of Clinical Oncology (ASCO) CancerLinQ Discovery (CLQD) multiple myeloma (MM) data set was used to demonstrate three common pitfalls when analyzing survival from RWD: using incorrect surrogate markers for missing data and/or classification errors, ignoring deaths at time zero, and failing to account for guarantee-time bias.
METHODS: The ASCO CLQD MM data set (July 19, 2021, release) was used to compare overall survival (OS) in patients with a known versus presumed date of MM diagnosis; in patients with secondary acute myeloid leukemia (sAML), with early deaths (ie, 0 months) included versus dropped; and in patients with second primary malignancies (SPMs), unmatched versus matched to control for time-related confounding (ie, guarantee-time bias). Analyses were conducted using Stata version 17.0 (StataCorp, College Station, TX).
RESULTS: In the CLQD MM data set, 28% of patients were missing a diagnosis date. Using the presumed diagnosis date (ie, first bortezomib or lenalidomide administration) as a surrogate for missing diagnosis dates was unsuccessful: median OS differed significantly between patients with a recorded versus presumed diagnosis date (107 v 40 months; hazard ratio [HR], 2.5; 95% CI, 2.39 to 2.64; P < .001). Dropping deaths within 1 month of sAML diagnosis resulted in an exaggerated median OS (46 v 39 months with early deaths included). OS in patients with MM with SPMs differed substantially before and after incorporating matching methods to account for guarantee-time bias (before matching: HR, 0.73; 95% CI, 0.67 to 0.78; P < .001; after matching: HR, 1.30; 95% CI, 1.18 to 1.43; P < .001).
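Why dropping early deaths inflates median OS can be seen even in a tiny cohort. The sketch below assumes a plain Kaplan-Meier estimator written from scratch in Python (the study itself used Stata); the cohort, the helper names `km_curve` and `km_median`, and the "first time survival falls to 0.5 or below" median convention are illustrative assumptions, not the authors' code or data.

```python
from fractions import Fraction

def km_curve(times, events):
    """Kaplan-Meier steps as (time, survival) pairs at each observed death time.

    times  -- follow-up in months; 0 means the event was recorded at diagnosis
    events -- 1 if death observed, 0 if censored
    Fractions keep the arithmetic exact so the 0.5 threshold is not
    blurred by floating-point rounding.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = Fraction(1)
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = n_at_t = 0
        # Pool every subject whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            n_at_t += 1
            i += 1
        if deaths:
            surv *= 1 - Fraction(deaths, at_risk)
            curve.append((t, surv))
        # Everyone ending at t (dead or censored) leaves the risk set afterward.
        at_risk -= n_at_t
    return curve

def km_median(times, events):
    """Median survival: first time the KM estimate falls to 0.5 or below."""
    for t, s in km_curve(times, events):
        if s <= Fraction(1, 2):
            return t
    return None  # median not reached

# Hypothetical toy cohort (invented, NOT CLQD data): two deaths recorded at month 0.
times = [0, 0, 5, 10, 15, 20, 25, 30, 35, 40]
events = [1] * 10

median_all = km_median(times, events)
kept = [(t, e) for t, e in zip(times, events) if t > 0]  # drop time-zero deaths
median_dropped = km_median([t for t, _ in kept], [e for _, e in kept])

print(median_all, median_dropped)  # 15 20
```

With the two month-0 deaths included, the curve starts at 0.8 and reaches 0.5 at 15 months; removing them restarts the curve at 1.0 and pushes the median to 20 months. This is the same direction of bias the abstract reports for sAML.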
CONCLUSION: To maximize the benefits of RWD in oncology research, clinicians must be aware of analytic methods that can overcome these pitfalls in survival analyses.
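Guarantee-time bias arises because a patient must survive until an SPM is diagnosed in order to be counted in the SPM group. The authors corrected for it with matching. The sketch below uses a different standard remedy instead: it splits each patient's person-time at the SPM date, treating SPM as a time-varying exposure, and it compares crude death-rate ratios rather than Cox hazard ratios. The cohort, the function names, and the assumption that every SPM precedes death or censoring are all hypothetical.

```python
from fractions import Fraction

# Hypothetical toy cohort (invented, NOT CLQD data). Each row:
# (months of follow-up from MM diagnosis, died?, month of SPM diagnosis or None).
# Assumes any SPM is diagnosed before death or censoring.
COHORT = [
    (60, True, 40),
    (50, True, 30),
    (10, True, None),
    (20, True, None),
    (60, False, None),
    (30, True, None),
]

def rate_ratio(deaths, person_time):
    """Crude death-rate ratio, SPM-exposed vs unexposed (not a Cox HR)."""
    exposed = Fraction(deaths[True], person_time[True])
    unexposed = Fraction(deaths[False], person_time[False])
    return exposed / unexposed

def naive_ratio(cohort):
    """'Ever-SPM' grouping: all follow-up of an SPM patient counts as exposed,
    including the pre-SPM months the patient was guaranteed to survive."""
    deaths = {True: 0, False: 0}
    pt = {True: 0, False: 0}
    for fu, died, spm in cohort:
        exposed = spm is not None
        pt[exposed] += fu
        deaths[exposed] += int(died)
    return rate_ratio(deaths, pt)

def time_split_ratio(cohort):
    """SPM as a time-varying exposure: months before the SPM count as
    unexposed, months after it as exposed; a death is credited to the
    status the patient held when it occurred."""
    deaths = {True: 0, False: 0}
    pt = {True: 0, False: 0}
    for fu, died, spm in cohort:
        if spm is None:
            pt[False] += fu
            deaths[False] += int(died)
        else:
            pt[False] += spm        # pre-SPM (guarantee) time
            pt[True] += fu - spm    # post-SPM time
            deaths[True] += int(died)
    return rate_ratio(deaths, pt)

naive_rr = naive_ratio(COHORT)
split_rr = time_split_ratio(COHORT)
print(round(float(naive_rr), 2), round(float(split_rr), 2))  # 0.73 3.17
```

On this invented cohort the naive grouping makes SPMs look protective (ratio 8/11), while splitting person-time reverses the direction (ratio 19/6). The magnitudes are artifacts of the toy data; only the reversal mirrors the before/after-matching change the abstract reports.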

Original language: English
Pages (from-to): e2300097
Journal: JCO Clinical Cancer Informatics
State: Published, 1 Sep 2023