The current study examines the temporal parameters associated with cross-modal integration of auditory-visual information for sentential material. The speech signal was filtered into 1/3-octave channels, all of which were discarded except for a low-frequency (298-375 Hz) and a high-frequency (4762-6000 Hz) band. The intelligibility of this audio-only signal ranged between 9% and 31% for nine normal-hearing subjects. Intelligibility for visual-alone presentation of the same material ranged between 1% and 22%. When the audio and video signals were combined and presented in synchrony, intelligibility climbed to an average of 63%. When the audio signal led the video, intelligibility declined appreciably for even the shortest asynchrony of 40 ms. Additional increases in video delay resulted in a progressive decline in intelligibility, reaching a level comparable to that of the audio-alone condition for an asynchrony of 400 ms. In contrast, when the video signal led the audio, intelligibility remained relatively stable for onset asynchronies up to 160-200 ms. Hence, there is a marked asymmetry in the integration of audio and visual information that has important implications for sensory-based models of auditory-visual speech processing.
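As a rough check on the filtering described in the abstract, the sketch below computes the upper edge of a band one third of an octave wide from its lower edge. It assumes the base-2 definition of a 1/3-octave band (upper edge = lower edge × 2^(1/3)); the abstract does not say which convention was used, and the function name `third_octave_upper_edge` is hypothetical.

```python
def third_octave_upper_edge(lower_hz: float) -> float:
    """Upper edge of a band one third of an octave wide, starting at lower_hz.

    Assumes the base-2 convention: each octave doubles the frequency,
    so a third of an octave multiplies it by 2 ** (1 / 3).
    """
    return lower_hz * 2 ** (1 / 3)


# Lower band edges quoted in the abstract, in Hz.
for lower_hz in (298.0, 4762.0):
    upper_hz = third_octave_upper_edge(lower_hz)
    print(f"{lower_hz:.0f} Hz -> {upper_hz:.1f} Hz")
```

Under this assumption, both computed upper edges fall within about half a hertz of the quoted values (375 Hz and 6000 Hz), which suggests each retained band is a single 1/3-octave channel.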
|Number of pages||6|
|State||Published - 2001|
|Event||2001 International Conference on Auditory-Visual Speech Processing, AVSP 2001 - Aalborg, Denmark (7 Sep 2001 → 9 Sep 2001)|