An evaluation of image-based and statistical techniques for harmonizing brain volume measurements

Yuan Chiao Lu*, Lianrui Zuo, Yi Yu Chou, Blake E. Dewey, Samuel Remedios, Russell T. Shinohara, Sonya U. Steele, Govind Nair, Daniel S. Reich, Jerry L. Prince, Dzung L. Pham

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Volumetric analysis of magnetic resonance brain images is often complicated by variations in scanner hardware, software, and acquisition settings. Over the past several years, retrospective harmonization techniques have been increasingly used to address these variations. This research evaluates three harmonization methods: neuroCombat (a statistical batch correction tool), DeepHarmony (a supervised deep learning method based on image-to-image translation), and HACA3 (an unsupervised deep learning image translation approach). The study focuses on their effectiveness in achieving consistent brain volume measurements across differing T1-weighted acquisitions (GRE and MPRAGE) and their ability to detect simulated atrophy changes in the same acquisitions. While all three methods notably improved the consistency of regional brain volumes compared with unharmonized images, HACA3 demonstrated the lowest measurement variation in terms of absolute volume difference percentage (AVDP) across all brain regions (<3%). It also demonstrated the highest agreement between the coefficient of variation (CV) measurements of GRE and MPRAGE images, evidenced by the smallest mean difference (0.12) and the narrowest 95% confidence interval ([-1.04, 1.28]), and it achieved the highest intra-class correlation (ICC) values across all regions (ICC >0.9). In the atrophy simulation experiments, HACA3 consistently achieved the smallest AVDPs across most unchanged brain regions, while DeepHarmony showed significant improvements in several regions, and neuroCombat exhibited higher variability. Additionally, neuroCombat with training data effectively detected hippocampal atrophy, whereas without training data it could not differentiate between images with and without atrophy, highlighting a potential limitation in its ability to detect subtle brain volume changes when training data are unavailable. In most metrics, HACA3 was found to be the most effective for harmonizing MRI data, followed by DeepHarmony, with neuroCombat showing more measurement variability but still offering improvements over unharmonized data.
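
To make two of the metrics named in the abstract concrete, the following is a minimal sketch of how AVDP and CV might be computed for a single brain region. The abstract does not give the formulas, so the definitions here are assumptions: AVDP is taken as the absolute difference between two volume measurements divided by their mean, and CV as the sample standard deviation divided by the mean, both expressed as percentages. The function names and the example volumes are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def avdp(vol_a: float, vol_b: float) -> float:
    """Absolute volume difference percentage between two measurements.

    ASSUMPTION: the difference is normalised by the mean of the two
    volumes; the paper may normalise differently.
    """
    return abs(vol_a - vol_b) / ((vol_a + vol_b) / 2.0) * 100.0


def cv_percent(volumes: list[float]) -> float:
    """Coefficient of variation (sample SD / mean), as a percentage.

    ASSUMPTION: the sample (n - 1) standard deviation is used.
    """
    return stdev(volumes) / mean(volumes) * 100.0


if __name__ == "__main__":
    # Made-up volumes (mm^3) for one region, one subject, for illustration only.
    gre_volume = 4000.0
    mprage_volume = 4100.0
    print(f"AVDP: {avdp(gre_volume, mprage_volume):.2f}%")

    # Made-up repeated measurements of the same region.
    repeated = [3950.0, 4000.0, 4050.0]
    print(f"CV:   {cv_percent(repeated):.2f}%")
```

Under these assumed definitions, a lower AVDP between the GRE-derived and MPRAGE-derived volumes of the same subject indicates better agreement after harmonization.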

Original language: English
Article number: IMAG.a.73
Journal: Imaging Neuroscience
Volume: 3
State: Published - 14 Jul 2025

Keywords

  • ComBat
  • brain volumes
  • deep learning
  • image harmonization
  • magnetic resonance imaging
  • segmentation
