Abstract

The integration of visual and auditory cues is crucial for successful speech processing, especially under adverse conditions. Recent reports have shown that when participants watch muted videos of speakers, the visual cortex tracks the phonological information about the acoustic speech envelope. However, the speech signal also carries much richer acoustic detail, e.g. about the fundamental frequency and the resonant frequencies, whose visuo-phonological transformation could aid speech processing. Here, we investigated the neural basis of the visuo-phonological transformation of these more fine-grained acoustic details and assessed how it changes with ageing. We recorded whole-head magnetoencephalography (MEG) data while participants watched silent intelligible and unintelligible videos of a speaker. We found that the visual cortex is able to track the unheard intelligible modulations of resonant frequencies (formants) and the pitch linked to lip movements. Importantly, only the processing of intelligible unheard formants decreases significantly with age, in both the visual and the cingulate cortex. This is not the case for the processing of the unheard speech envelope, the fundamental frequency, or the purely visual information carried by lip movements. These results show that unheard spectral fine details (along with the unheard acoustic envelope) are transformed from a mere visual into a phonological representation. Ageing especially affects the ability to derive spectral dynamics at formant frequencies. Since listening in noisy environments should capitalize on the ability to track spectral fine details, our results provide a novel focus on compensatory processes in such challenging situations.

Significance statement

The multisensory integration of speech cues from visual and auditory modalities is crucial for optimal speech perception in noisy environments or for elderly individuals with progressive hearing loss. It has already been shown that the visual cortex is able to extract global acoustic information such as amplitude modulations from silent visual speech, but whether this extends to fine-detailed spectral acoustic information remains unclear. Here, we demonstrate that the visual cortex is indeed able to extract fine-detailed phonological cues just from watching silent lip movements. Furthermore, this tracking of acoustic fine details deteriorates with age. These results suggest that the human brain is able to transform visual information into useful phonological information, and that this process may be crucially affected in ageing individuals.
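For readers unfamiliar with what "tracking" of acoustic features means computationally, the following minimal Python sketch illustrates the kinds of quantities involved. It is not the authors' analysis pipeline; the toy signal, filter settings, pitch-search range, and the use of spectral coherence as a tracking measure are all illustrative assumptions. The sketch extracts the amplitude envelope via the Hilbert transform, estimates a crude fundamental frequency by autocorrelation, and quantifies tracking as coherence between the envelope and a simulated cortical signal.

    # Minimal sketch, not the authors' method. All parameter choices are
    # illustrative assumptions.
    import numpy as np
    from scipy.signal import hilbert, butter, filtfilt, coherence

    # Toy "speech": a 150 Hz carrier amplitude-modulated at 4 Hz
    # (roughly the syllable rate of natural speech).
    fs = 16000
    t = np.arange(0, 10, 1 / fs)
    audio = np.sin(2 * np.pi * 150 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))

    # 1) Amplitude envelope: magnitude of the analytic signal, low-pass
    #    filtered to the slow modulation range used in tracking studies.
    env = np.abs(hilbert(audio))
    b, a = butter(4, 10 / (fs / 2), btype="low")
    env = filtfilt(b, a, env)

    # 2) Crude fundamental-frequency estimate on one windowed frame via
    #    autocorrelation (real pipelines use dedicated pitch trackers).
    frame = audio[:1024] * np.hanning(1024)
    ac = np.correlate(frame, frame, mode="full")[1023:]
    lag_lo, lag_hi = fs // 400, fs // 75          # search 75-400 Hz
    f0 = fs / (lag_lo + np.argmax(ac[lag_lo:lag_hi]))
    print(f"estimated F0: {f0:.1f} Hz")

    # 3) "Tracking" quantified as coherence between the stimulus envelope
    #    and a neural signal; here the neural signal is simulated as the
    #    envelope plus noise.
    neural = env + np.random.randn(env.size) * env.std()
    f, coh = coherence(env, neural, fs=fs, nperseg=fs * 2)
    print(f"coherence at ~4 Hz: {coh[np.argmin(np.abs(f - 4))]:.2f}")

In an actual MEG study the simulated signal would be replaced by source-reconstructed visual-cortex activity, and formant tracks would be obtained from a spectral analysis of the speech recordings rather than from a single-frame autocorrelation.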