Re-evaluating the role of visual cues in speech perception

Seeing a speaker’s lips move can help us understand speech, but perhaps not as much as once thought.
Speech perception is essential to effective communication. The auditory system plays the major role in understanding spoken language, but visual cues such as lip movements provide additional support, especially in noisy or otherwise difficult listening environments. Researchers from Lancaster University and the University of Manchester have examined how people process speech when both auditory and visual components are available.
Speech perception involves distinguishing small linguistic units known as phonemes. These can be difficult to identify when background noise is present, making listeners rely more on lip-reading and other visual cues.
While previous studies have shown that visual cues significantly improve speech comprehension in noisy environments, this study looked at the influence of visemes. Visemes are groups of visually equivalent phonemes, characterised by the position of the face and mouth as a sound is produced. The study investigated whether speech understanding improves when these visually distinct speech sounds are taken into account.
Sixty participants took part in the online study and were presented with speech sounds under different conditions: some with only auditory input, others with both auditory and visual input. The stimuli included three syllables beginning with distinct phonemes, ‘Ba’, ‘Fa’ and ‘Ka’, played either in a clear condition or with background noise simulating a real-world environment. The visual and auditory signals were also offset from one another by varying degrees of delay, known as stimulus onset asynchrony (SOA), to test whether synchronisation affects comprehension.
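The article does not describe how the asynchrony was implemented, but as a rough illustration only, the sketch below shows one way an SOA offset and background noise could be applied to an audio signal in Python. The sample rate, SOA value, signal-to-noise ratio and the placeholder tone are all assumptions made for the example, not details taken from the study.

```python
# Illustrative sketch (not the authors' code): applying a stimulus onset
# asynchrony (SOA) and background noise to an audio track, with the video
# onset treated as time zero and a positive SOA meaning the audio lags.
import numpy as np

SAMPLE_RATE = 44_100  # audio samples per second (assumed for this example)

def apply_soa(audio: np.ndarray, soa_ms: float) -> np.ndarray:
    """Shift the audio relative to the video onset by soa_ms milliseconds.

    Positive soa_ms delays the audio (the visual cue leads); negative
    soa_ms advances it by trimming samples from the start.
    """
    shift = int(round(abs(soa_ms) / 1000 * SAMPLE_RATE))
    if soa_ms >= 0:
        # Pad silence before the audio so it starts after the video onset.
        return np.concatenate([np.zeros(shift, dtype=audio.dtype), audio])
    # Negative SOA: the audio leads, so drop its first `shift` samples.
    return audio[shift:]

def add_noise(audio: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix in white noise at (approximately) a target SNR in decibels."""
    signal_power = np.mean(audio ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=audio.shape)
    return audio + noise

# Example: a 300 ms placeholder tone standing in for a recorded syllable,
# delayed by 200 ms relative to the video and embedded in noise at 0 dB SNR.
t = np.linspace(0, 0.3, int(0.3 * SAMPLE_RATE), endpoint=False)
syllable = 0.5 * np.sin(2 * np.pi * 220 * t)
stimulus = add_noise(apply_soa(syllable, soa_ms=200), snr_db=0)
```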
The results confirmed earlier findings that speech perception is more difficult in noisy conditions, but the benefit of visual information was weaker than previously reported. The study also revealed that different visemes affect perception differently: participants found it harder to recognise ‘Ba’ than ‘Ka’ in noisy conditions, suggesting some phonemes are more resistant to noise-related confusion than others.
When examining SOA effects, reaction times increased as the visual cues were delayed relative to the audio, but overall accuracy remained unaffected. This suggests that audiovisual integration aids speech processing without always translating into better comprehension.
Senior Research Associate Brandon O’Hanlon, lead author of the study, said:
“It is important to reconsider our approach to understanding how we combine our sight and hearing when trying to process speech in difficult listening conditions. If we could understand why and how some units of speech are more resilient to the deficits of noise than others, we may be able to develop more efficient ways to enhance speech perception with assistive auditory technology.”
These findings challenge the idea that visual cues automatically enhance speech perception. Instead, they highlight the complexity of audiovisual integration and suggest its benefits may depend on specific phoneme characteristics and real-world listening conditions.
The results may prove relevant for improving language learning and developing assistive communication technologies, ultimately producing better designs for hearing aids and speech recognition software. Future research may explore a broader range of phonemes and investigate audiovisual integration in more naturalistic settings, such as full sentences instead of isolated syllables.
The full paper can be found here.