Abstract
Detection thresholds for spoken sentences in steady-state noise are reduced by 1-3 dB when synchronized video images of movements of the lips and other surface features of the face are provided. An earlier study [K. W. Grant and P. F. Seitz, J. Acoust. Soc. Am. 108, 1197-1208 (2000)] showed that the amount of masked threshold reduction, or bimodal coherence masking protection (BCMP), was related to the degree of correlation between the rms amplitude envelope of the target sentence and the area of lip opening, especially in the mid-to-high frequencies typically associated with the second (F2) and third (F3) speech formants. In the present study, these results are extended by manipulating the cross-modality correlation through bandpass filtering. Two filter conditions were tested, corresponding roughly to the first and second speech formants: F1 (100-800 Hz) and F2 (800-2200 Hz). Results for F2-filtered target sentences were comparable to those of unfiltered speech, yielding a BCMP of roughly 2-3 dB. Results for F1-filtered target sentences showed a significantly smaller BCMP of approximately 0.7 dB. These results suggest that the magnitude of the BCMP depends on both the spectral and temporal properties of the target speech signal.
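The cross-modality correlation described above can be illustrated with a minimal sketch: bandpass the audio into the F1 or F2 band, compute an rms amplitude envelope at the video frame rate, and correlate it with a lip-area trace. Everything here other than the two band edges is an assumption made for illustration: the function names, the idealized FFT brick-wall filter, the non-overlapping rms frames, and the synthetic signals (deliberately built so that only the F2-band carrier follows the lip trace). It is not the procedure or the data of the study.

```python
import numpy as np

# Band edges in Hz, taken from the abstract.
F1_BAND = (100.0, 800.0)
F2_BAND = (800.0, 2200.0)


def bandpass_fft(signal, fs, lo, hi):
    """Idealized brick-wall bandpass: zero every FFT bin outside [lo, hi] Hz."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))


def rms_envelope(signal, frame_len):
    """RMS amplitude over consecutive, non-overlapping frames."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.sqrt(np.mean(frames ** 2, axis=1))


def envelope_lip_correlation(audio, fs, lip_area, band):
    """Pearson correlation between a band's rms envelope and the lip-area trace."""
    frame_len = len(audio) // len(lip_area)  # audio samples per video frame
    envelope = rms_envelope(bandpass_fft(audio, fs, *band), frame_len)
    n = min(len(envelope), len(lip_area))
    return float(np.corrcoef(envelope[:n], lip_area[:n])[0, 1])


# --- Synthetic demonstration (hypothetical signals, not measured data) ---
fs, video_rate, duration = 16000, 30, 2.0
t_audio = np.arange(int(fs * duration)) / fs
t_video = np.arange(int(video_rate * duration)) / video_rate


def lip_trace(t):
    """Hypothetical lip-opening area, oscillating at 2 Hz."""
    return 1.0 + 0.8 * np.sin(2 * np.pi * 2.0 * t)


def unrelated_trace(t):
    """Hypothetical 3 Hz modulation that does not follow the lips."""
    return 1.0 + 0.8 * np.sin(2 * np.pi * 3.0 * t)


# A 1500 Hz carrier (inside the F2 band) follows the lip trace;
# a 400 Hz carrier (inside the F1 band) follows the unrelated modulation.
audio = (lip_trace(t_audio) * np.sin(2 * np.pi * 1500.0 * t_audio)
         + unrelated_trace(t_audio) * np.sin(2 * np.pi * 400.0 * t_audio))
lip_area = lip_trace(t_video)

r_f1 = envelope_lip_correlation(audio, fs, lip_area, F1_BAND)
r_f2 = envelope_lip_correlation(audio, fs, lip_area, F2_BAND)
print("F1-band envelope vs. lip area:", r_f1)
print("F2-band envelope vs. lip area:", r_f2)
```

With these constructed signals the F2-band correlation is high and the F1-band correlation is near zero by design, which only demonstrates the measurement; real speech would require recorded audio and lip-area tracking.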
Original language | English |
---|---|
Pages (from-to) | 2272-2275 |
Number of pages | 4 |
Journal | Journal of the Acoustical Society of America |
Volume | 109 |
Issue number | 5 I |
State | Published - 2001 |
Externally published | Yes |