Lip reading by artificial intelligence


Machines can already recognize spoken words without hearing them. How did scientists manage to achieve this?

Google's AI can now lip read better than humans after watching thousands of hours of TV - The Verge

https://www.theverge.com/2016/11/24/13740798/google-deepmind-ai-lip-reading-tv

Lip reading is not an easy skill to learn. It becomes easier when you know the context of the speech and the speaker’s pronunciation style. The LipNet artificial intelligence system was developed by a team of specialists from the University of Oxford. It was trained on a dataset (GRID) of 3-second clips showing people reading different sequences of words (the faces are well lit and facing the camera, and the pronunciation is correct).

How does the system translate its “knowledge” into practice? 

The researchers used the dataset to train the artificial intelligence to identify variations in mouth shape and link the observed changes to meaning. The AI analyzes each recording as a whole (not excerpts) to capture context. This is important because there are fewer lip arrangements than speech sounds: one lip arrangement may represent several sounds, so the system relies on context to resolve what it cannot tell from the lips alone. In the testing phase, the system identified 93.4 percent of the words. People asked to perform the same task recognized only 52.3 percent of the words on average!
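The ambiguity described above can be illustrated with a toy sketch. It is only an illustration under stated assumptions: the lip-shape table, the vocabulary and the word-pair counts are all invented for this example, and LipNet itself learns the mapping from data with a neural network rather than from a hand-written table. The sketch shows that several words can produce the same sequence of lip shapes, and that the preceding word can be used to choose between them.

```python
# Hypothetical, simplified table: sounds that look alike on the lips
# share one lip-shape label. Invented for illustration only.
LIP_SHAPE_OF = {
    "p": "P", "b": "P", "m": "P",   # lips pressed together
    "f": "F", "v": "F",             # lower lip against upper teeth
    "t": "T", "d": "T", "n": "T",   # tongue behind the teeth
    "a": "A",                       # open mouth
}

# Toy vocabulary, spelled so that each letter stands for one sound.
VOCAB = ["pat", "bat", "mat", "man", "fat", "vat", "fan", "van"]

# Invented counts of how often the second word follows the first.
PAIR_COUNTS = {
    ("the", "man"): 9,
    ("the", "mat"): 4,
    ("ceiling", "fan"): 8,
    ("delivery", "van"): 7,
}


def to_lip_shapes(word):
    """Convert a toy word into the sequence of lip shapes a camera would see."""
    return tuple(LIP_SHAPE_OF[sound] for sound in word)


def candidates(lip_shapes):
    """Return every vocabulary word that matches the observed lip shapes."""
    return [w for w in VOCAB if to_lip_shapes(w) == tuple(lip_shapes)]


def pick_with_context(previous_word, lip_shapes):
    """Choose the candidate that most often follows the previous word."""
    options = candidates(lip_shapes)
    if not options:
        return None
    return max(options, key=lambda w: PAIR_COUNTS.get((previous_word, w), 0))


if __name__ == "__main__":
    seen = to_lip_shapes("fan")
    print(candidates(seen))                     # four look-alike words
    print(pick_with_context("ceiling", seen))   # context favors "fan"
    print(pick_with_context("delivery", seen))  # context favors "van"
```

Without context, the four words are indistinguishable from the lips alone; the preceding word is what separates "fan" from "van" in this toy setting.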

Another team from the Department of Engineering at the University of Oxford, in collaboration with Google DeepMind, took on an even more difficult task. Instead of a “rigid” dataset such as GRID, they used 100,000 video clips from BBC television as their training material.

At the testing stage, the system developed by the Oxford and Google DeepMind team recognized 46.8 percent of the words, even though the recordings were less well lit and the face was not always in the center of the frame. By comparison, people identified only 12.4 percent of the words.

Deaf people will certainly benefit from the new technology. The system will also be useful in everyday situations, for example to read speech from video recorded without sound, or to fill in the gaps when the voice of the person we are talking to on Skype starts to disappear in the noise.

https://www.ox.ac.uk/news/science-blog/could-oxford-developed-software-help-solve-tricky-problem-lip-reading

https://abilitynet.org.uk/lipreading-google-deepmind-future-disabled
