Evaluating the articulation index for auditory-visual input

Ken W. Grant, Louis D. Braida

Research output: Contribution to journal › Article › peer-review

108 Scopus citations


An investigation of the auditory-visual (AV) articulation index (AI) correction procedure outlined in the ANSI standard [ANSI S3.5-1969 (R1986)] was made by evaluating auditory (A), visual (V), and auditory-visual sentence identification for both wideband speech degraded by additive noise and a variety of bandpass-filtered speech conditions presented in quiet and in noise. When the data for each of the different listening conditions were averaged across talkers and subjects, the procedure outlined in the standard was fairly well supported, although deviations from the predicted AV score were noted for individual subjects as well as individual talkers. For filtered speech signals with AI_A < 0.25, there was a tendency for the standard to underpredict AV scores. Conversely, for signals with AI_A > 0.25, the standard consistently overpredicted AV scores. Additionally, synergistic effects, where the AI_A obtained from the combination of different bandpass-filtered conditions was greater than the sum of the individual AI_A values, were observed for all nonadjacent filter-band combinations (e.g., the addition of a low-pass band with a 630-Hz cutoff and a high-pass band with a 3150-Hz cutoff). These latter deviations from the standard violate the basic assumption of additivity stated by Articulation Theory, but are consistent with earlier reports by Pollack [I. Pollack, J. Acoust. Soc. Am. 20, 259–266 (1948)], Licklider [J. C. R. Licklider, Psychology: A Study of a Science, Vol. 1, edited by S. Koch (McGraw-Hill, New York, 1959), pp. 41–144], and Kryter [K. D. Kryter, J. Acoust. Soc. Am. 32, 547–556 (1960)].
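The additivity check described in the abstract can be illustrated with a minimal sketch. It assumes only what the abstract states: under Articulation Theory, the auditory articulation index of a combination of bands should equal the sum of the indices of the individual bands, and a combined value above that sum is a synergistic effect. The function names, the tolerance, and the numbers are hypothetical and are not taken from the paper.

```python
def additivity_deviation(ai_combined, ai_bands):
    """Return the combined index minus the sum of the individual band indices.

    A positive value means the combination exceeds the additive prediction.
    """
    return ai_combined - sum(ai_bands)


def classify(ai_combined, ai_bands, tol=1e-9):
    """Label a band combination relative to the additivity assumption.

    tol is a hypothetical tolerance that absorbs floating-point rounding.
    """
    deviation = additivity_deviation(ai_combined, ai_bands)
    if deviation > tol:
        return "synergistic"
    if deviation < -tol:
        return "sub-additive"
    return "additive"


if __name__ == "__main__":
    # Hypothetical index values for illustration only, NOT data from the paper:
    # one low-pass band, one high-pass band, and their combination.
    low_band, high_band = 0.10, 0.08
    combined = 0.25
    print(classify(combined, [low_band, high_band]))
```

With these made-up values the combined index exceeds the sum of the two band indices, so the combination is labeled synergistic, which is the pattern the abstract reports for nonadjacent filter bands.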

Original language: English
Pages (from-to): 2952-2960
Number of pages: 9
Journal: Journal of the Acoustical Society of America
Issue number: 6
State: Published - Jun 1991
Externally published: Yes

