Scientists train AI to generate digital face images inspired entirely by the voice of the speaker
Marie Donlon | June 11, 2019

A team of scientists from the Massachusetts Institute of Technology (MIT) Computer Science and Artificial Intelligence Laboratory (CSAIL) has developed artificial intelligence (AI) capable of generating digital images of a face based entirely on brief audio clips of a person’s voice.
Called Speech2Face, the system is a neural network, a series of algorithms designed to recognize patterns in a way loosely modeled on the human brain. The research team trained the algorithm on millions of online educational videos featuring roughly 100,000 different speakers. From that dataset, Speech2Face learned to associate vocal cues with specific facial features. During testing, the algorithm generated photorealistic digital faces to match speakers' voices from short audio clips.
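In the underlying paper, the system pairs a voice encoder, which maps a spectrogram of a short clip to a face feature vector, with a separately trained face decoder that renders that vector as a neutral, front-facing image. The sketch below is a minimal, toy-sized illustration of that two-stage idea, not the authors' code; the class names (`VoiceEncoder`, `FaceDecoder`), the dimensions, and the untrained random weights are all assumptions for demonstration.

```python
import numpy as np

# Illustrative, toy-sized sketch of a Speech2Face-style pipeline:
# a voice encoder maps a spectrogram of a short clip to a face feature
# vector, and a separately trained face decoder renders that vector as a
# neutral, front-facing image. Shapes, class names, and the random weights
# are placeholders, not the authors' actual architecture.

rng = np.random.default_rng(0)

class VoiceEncoder:
    """Toy stand-in for the speech encoder (a convolutional net in the real system)."""
    def __init__(self, n_freq=64, n_frames=100, feat_dim=256):
        # One random projection in place of trained convolutional layers.
        self.w = rng.standard_normal((n_freq * n_frames, feat_dim)) * 0.01

    def __call__(self, spectrogram):
        # spectrogram: (n_freq, n_frames) magnitudes from a few seconds of audio
        return np.maximum(spectrogram.reshape(-1) @ self.w, 0.0)  # face feature vector

class FaceDecoder:
    """Toy stand-in for a pretrained decoder that renders a frontal face image."""
    def __init__(self, feat_dim=256, img_size=32):
        self.w = rng.standard_normal((feat_dim, img_size * img_size * 3)) * 0.01
        self.img_size = img_size

    def __call__(self, face_features):
        pixels = face_features @ self.w
        return pixels.reshape(self.img_size, self.img_size, 3)

# A fake short audio clip, represented here by a random spectrogram.
spectrogram = rng.standard_normal((64, 100))

encoder, decoder = VoiceEncoder(), FaceDecoder()
face_features = encoder(spectrogram)   # voice -> face feature vector
face_image = decoder(face_features)    # face feature vector -> image
print(face_features.shape, face_image.shape)  # (256,) (32, 32, 3)
```

In the real system, only the voice encoder is trained on the video dataset; the face decoder is fixed, which is one reason the output is always a generic, forward-facing face rather than a likeness of the specific speaker.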
According to the team, the algorithm cannot yet produce a definitive image of a person from voice alone, but it can reportedly identify markers in speech suggestive of age, ethnicity and gender. Consequently, Speech2Face generates only generic, forward-facing faces with neutral expressions rather than the actual faces of the speakers in the audio clips. The team explained, however, that the AI typically captures the correct age range, gender and ethnicity of the person speaking.
The algorithm's accuracy declined, though, when the language varied. For instance, given audio clips of an Asian man speaking Chinese, it generated an Asian face; when the same individual spoke English in another clip, it produced the face of a white man. Similarly, the AI demonstrated gender bias, pairing low-pitched voices with the faces of males and high-pitched voices with the faces of females.
Although still in its infancy, AI is being applied in a number of unexpected industries, including the arts, food and beverage, and law enforcement, despite many reports that the technology remains flawed.
The team of scientists published its work as a preprint on arXiv.
"Similarly, the AI demonstrated gender bias, pairing low-pitched voices with the faces of males and high-pitched voices with the faces of females."
Is this a defect? Most men's voices are lower than women's. It sounds to me like the neural network figured that out and is working the way it's supposed to.
Just trying to figure out why they would want to do this. Who would use such a product and for what? I can only imagine that call centers would use it to have a fake picture of the person they cold-called. That way they can address you as "Mr."/"Sir" or "Mrs."/"Miss"/"Ms." and tailor their pitch.
Of course, if you are a man with a high-pitched voice or a woman with a low one, you will have the enjoyment of being misgendered by the pests.
Voice is hardly used on the phone anymore, except by people who shouldn't be calling you in the first place; texting is the norm, with a few exceptions such as my old mother... and she doesn't need a fake pic to know it's me calling.