TY - JOUR
T1 - An evaluation of image-based and statistical techniques for harmonizing brain volume measurements
AU - Lu, Yuan Chiao
AU - Zuo, Lianrui
AU - Chou, Yi Yu
AU - Dewey, Blake E.
AU - Remedios, Samuel
AU - Shinohara, Russell T.
AU - Steele, Sonya U.
AU - Nair, Govind
AU - Reich, Daniel S.
AU - Prince, Jerry L.
AU - Pham, Dzung L.
N1 - Publisher Copyright: © 2025 The Authors. Published under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
PY - 2025/7/14
Y1 - 2025/7/14
AB - Volumetric analysis of magnetic resonance brain images is often complicated by variations in scanner hardware, software, and acquisition settings. Over the past several years, the use of retrospective harmonization techniques to address these variations has increased. This research evaluates three image harmonization methods: neuroCombat (a statistical batch correction tool), DeepHarmony (a supervised deep learning method based on image-to-image translation), and HACA3 (an unsupervised deep learning image translation approach). The study focuses on their effectiveness in achieving consistent brain volume measurements across differing T1-weighted acquisitions (GRE and MPRAGE) and on their ability to detect simulated atrophy in the same acquisitions. While all three methods notably improve the consistency of regional brain volumes compared with unharmonized images, HACA3 showed the lowest measurement variation in terms of absolute volume difference percentage (AVDP) across all brain regions (<3%). It also showed the highest agreement between the coefficient of variation (CV) measurements of GRE and MPRAGE images, evidenced by the smallest mean difference (0.12) and the narrowest 95% confidence interval ([-1.04, 1.28]), and it achieved the highest intraclass correlation (ICC) values across all regions (ICC >0.9). In the atrophy simulation experiments, HACA3 achieved the smallest AVDPs across most unchanged brain regions, DeepHarmony showed significant improvements in several regions, and neuroCombat exhibited higher variability. Additionally, neuroCombat effectively detected hippocampal atrophy when training data were used; without training data, it could not differentiate between images with and without atrophy, highlighting a potential limitation in detecting subtle brain volume changes when training data are unavailable. On most metrics, HACA3 was the most effective method for harmonizing MRI data, followed by DeepHarmony, while neuroCombat showed more measurement variability but still improved on unharmonized data.
KW - ComBat
KW - brain volumes
KW - deep learning
KW - image harmonization
KW - magnetic resonance imaging
KW - segmentation
UR - http://www.scopus.com/inward/record.url?scp=105011642109&partnerID=8YFLogxK
U2 - 10.1162/IMAG.a.73
DO - 10.1162/IMAG.a.73
M3 - Article
AN - SCOPUS:105011642109
SN - 2837-6056
VL - 3
JO - Imaging Neuroscience
JF - Imaging Neuroscience
M1 - IMAG.a.73
ER -