Researchers have used machine learning to build an AI that generates eerie videos of people talking from a single frame, whether a photograph or even a painting. The ‘talking head’ in the videos follows the motions of a source face (a real person), whose facial landmarks are applied to the facial data of the target face. As the authors’ presentation video shows, the target face mimics the facial expressions and verbal cues of the source. This is how the authors brought Einstein, Salvador Dalí, and even the Mona Lisa to life using only a single image of each.
This sort of application of machine learning isn’t new. For some years, researchers have been working on algorithms that generate face-swapped videos. However, this kind of software required a lot of training data in video form (at least a couple of minutes of footage) to generate a realistic moving face. Other efforts rendered 3D faces from a single picture, but could not generate video.
Computer engineers at Samsung’s AI Center in Moscow took it to the next level. Their artificial neural network can generate a face that turns, speaks, and makes expressions starting from only a single image of a person’s face. The researchers call this technique “one-shot learning”. Admittedly, a result produced from a single image looks plainly doctored, but the life-like quality increases dramatically when the algorithm is trained with more images or frames.
The authors also employed a generative adversarial network (GAN), a deep learning architecture made up of two neural networks pitted against each other. One network, the generator, produces synthetic images, while the other, the discriminator, tries to tell those images apart from real ones. Each model tries to outsmart the other, and this competition pushes the generated output toward a higher level of realism.
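To make the adversarial idea concrete, here is a minimal sketch of the textbook GAN objective, in which one network (the generator) produces fakes and the other (the discriminator) scores how likely an image is to be real. It is an illustration under stated assumptions, not the loss used by the Samsung team: the function names `bce`, `discriminator_loss`, and `generator_loss` are hypothetical, and the discriminator scores are plain numbers standing in for the output of a real network.

```python
import math


def bce(p, label):
    """Binary cross-entropy for one predicted probability p and a 0/1 label."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))


def discriminator_loss(d_real, d_fake):
    """Low when the discriminator scores real images near 1 and fakes near 0.

    d_real, d_fake: the discriminator's "probability of being real" for a
    real image and for a generated image (hypothetical inputs).
    """
    return bce(d_real, 1) + bce(d_fake, 0)


def generator_loss(d_fake):
    """Low when the discriminator is fooled into scoring a fake near 1."""
    return bce(d_fake, 1)
```

The two losses pull in opposite directions on the same score: raising the discriminator's score for a fake lowers `generator_loss` but raises `discriminator_loss`. Training alternates between the two networks, which is the competition described above.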
If you pay close attention to the generated faces, you’ll notice that they’re not perfect. There are artifacts and odd glitches that give away the fakery. That being said, this is very impressive work. The next obvious step is making the Mona Lisa move her lower body as well. In the future, she (or at least her weird AI avatar) might dance for the first time in hundreds of years.
The work was published on the preprint server arXiv.