Google researchers develop an artificial intelligence tool that can pick out an individual’s voice in a crowd
A new technology has arrived: an artificial intelligence system that can pick out the voice of a single person. Just as a smartphone camera can focus on one object when many are present, this technology focuses on one voice among many.
It is now possible to isolate an individual’s voice by suppressing all other sounds in the surroundings. Previously only humans could do this, but now a machine can too. The ability to mentally mute all sounds except one comes naturally to humans and is known as the “cocktail party effect”.
Automatic Speech Separation
According to Inbar Mosseri and Oran Lang, software engineers at Google Research, automatic speech separation remains a major challenge for computers. In automatic speech separation, a computer splits an audio signal into its individual speech sources.
The researchers have published a paper describing a deep learning audio-visual model that isolates a single speech signal from a mixture of voices and background noise.
“In this work, we are able to computationally produce videos in which speech of specific people is enhanced while all other sounds are suppressed,” Mosseri and Lang said.
The method works on ordinary videos that contain a single audio track. The user only has to select the face of the person in the video whom they want to hear.
This capability has a wide range of applications, from speech enhancement and recognition in videos, through video conferencing, to improved hearing aids. It is especially useful in situations where several people are speaking at once.
“A unique aspect of our technique is in combining both the auditory and visual signals of an input video to separate the speech,” the researchers said.
“Intuitively, movements of a person’s mouth, for example, should correlate with the sounds produced as that person is speaking, which in turn can help identify which parts of the audio correspond to that person,” they explained.
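The intuition in that quote can be illustrated with a toy sketch. It is not the researchers’ deep learning model. It only assumes that, for each video frame, we already have a loudness value for the audio and a mouth-opening value for each visible face, and it picks the face whose mouth movement correlates best with the sound. All names and numbers here (`likely_speaker`, `person_a`, `person_b`, the per-frame values) are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant signal carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def likely_speaker(audio_loudness, mouth_openings):
    """Return the face whose mouth movement best tracks the audio loudness.

    audio_loudness: one loudness value per video frame (hypothetical input).
    mouth_openings: maps a face name to one mouth-opening value per frame.
    """
    scores = {
        name: pearson(audio_loudness, track)
        for name, track in mouth_openings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Made-up per-frame measurements, for illustration only.
audio_loudness = [0.1, 0.8, 0.9, 0.2, 0.7, 0.1, 0.6, 0.9]
mouth_openings = {
    "person_a": [0.0, 0.7, 0.8, 0.1, 0.6, 0.0, 0.5, 0.9],  # moves with the sound
    "person_b": [0.5, 0.1, 0.2, 0.6, 0.1, 0.7, 0.2, 0.1],  # moves against it
}

speaker, scores = likely_speaker(audio_loudness, mouth_openings)
print(speaker, scores)
```

In this made-up data the mouth of `person_a` opens when the audio is loud, so that face gets the higher score. A real system has to go much further and decide which parts of the audio belong to each speaker, which is what the researchers’ model is trained to do.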