TY - JOUR
T1 - Identification of germline population variants misclassified as cancer-associated somatic variants
AU - Pollard, Rebecca D.
AU - Wilkerson, Matthew D.
AU - Rajagopal, Padma Sheila
N1 - Publisher Copyright: Copyright © 2024 Pollard, Wilkerson and Rajagopal.
PY - 2024
Y1 - 2024
AB - Introduction: Databases used for clinical interpretation in oncology rely on genetic data derived primarily from patients of European ancestry, leading to biases in cancer genetics research and clinical practice. One practical issue that arises in this context is the potential misclassification of multi-ancestral population variants as tumor-associated because they are not represented in reference genomes against which tumor sequencing data is aligned. Methods: To systematically find misclassified variants, we compared somatic variants in census genes from the Catalogue of Somatic Mutations in Cancer (COSMIC) V99 with multi-ancestral population variants from the Genome Aggregation Databases’ Linkage Disequilibrium (GnomAD). By comparing genomic coordinates, reference, and alternate alleles, we could identify misclassified variants in genes associated with cancer. Results: We found 192 of 208 genes in COSMIC’s cancer-associated census genes (92.31%) to be associated with variant misclassifications. Among the 1,906,732 variants in COSMIC, 6,957 variants (0.36%) aligned with normal population variants in GnomAD, concerning for misclassification. The African / African American ancestral population included the greatest number of misclassified variants and also had the greatest number of unique misclassified variants. Conclusion: The direct, systematic comparison of variants from COSMIC for co-occurrence in GnomAD supports a more accurate interpretation of tumor sequencing data and reduces bias related to genomic ancestry.
KW - germline
KW - health disparities
KW - misclassification
KW - somatic
KW - variant classification
UR - http://www.scopus.com/inward/record.url?scp=85189642881&partnerID=8YFLogxK
U2 - 10.3389/fmed.2024.1361317
DO - 10.3389/fmed.2024.1361317
M3 - Article
AN - SCOPUS:85189642881
SN - 2296-858X
VL - 11
JO - Frontiers in Medicine
JF - Frontiers in Medicine
M1 - 1361317
ER -