What is a Deepfake Video?

Deepfake is an AI-based technology used to superimpose computer-generated faces and voices over existing video content, thereby creating a new video that represents actions that never occurred.

The term “Deepfake” was coined after a Reddit user known as “deepfakes”, who, in December 2017, used deep learning technology to edit the faces of celebrities onto people in pornographic video clips.

Deepfake pioneer Hao Li appeared on CNBC (CNBC, September 20, 2019) and said, “Manipulated images and videos that appear perfectly real will be accessible to everyday people in 2020. Even today, a deepfake app in China has exploded in popularity, enabling people to replace their faces with celebrity faces in films.”

Creating a deepfake uses the following technologies:

  • Object detection
  • Neural Networks/Machine Learning
  • Video rendering software

Object Detection

Deepfakes start with the same technology that your camera uses to detect faces in photographs.

It’s called the Viola-Jones object detection framework.

This technology is a pattern recognition method that recognizes faces by examining the pixels in a photograph and identifying the following elements:

  • Bridge of nose
  • Eyes
  • Mouth
  • Ears
  • Edge of face

Once the face has been mapped, a computer system can then make changes to each individual video frame based on predefined rules. The simplest examples of this technology are Instagram face filters and face-swapping apps.
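To make this concrete, here is a minimal sketch of the integral-image trick at the heart of the Viola-Jones framework. The function names and the 10×10 example image are illustrative, not part of any real detector; a production system evaluates thousands of these Haar-like features through a trained cascade of classifiers.

```python
import numpy as np

def integral_image(img):
    """Summed-area table: ii[y, x] = sum of all pixels above and left of (y, x)."""
    return img.cumsum(axis=0).cumsum(axis=1)

def region_sum(ii, top, left, h, w):
    """Sum of any rectangle in O(1) using four corner lookups."""
    total = ii[top + h - 1, left + w - 1]
    if top > 0:
        total -= ii[top - 1, left + w - 1]
    if left > 0:
        total -= ii[top + h - 1, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

def haar_two_rect_feature(ii, top, left, h, w):
    """A two-rectangle Haar-like feature: brightness of the right half minus
    the left half, e.g. the contrast across one side of the nose bridge."""
    half = w // 2
    left_sum = region_sum(ii, top, left, h, half)
    right_sum = region_sum(ii, top, left + half, h, half)
    return right_sum - left_sum

# Toy example: an image whose right half is bright produces a strongly
# positive feature value, which a trained classifier would threshold.
img = np.zeros((10, 10))
img[:, 5:] = 255.0
ii = integral_image(img)
feature = haar_two_rect_feature(ii, 0, 0, 10, 10)
```

The integral image is what makes Viola-Jones fast enough to run inside a camera: once the table is built, any rectangle sum costs four lookups regardless of the rectangle's size.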

Neural Networks / Machine Learning

Neural networks recognize patterns in your data. Once the neural network has been trained on samples of your data, it can make predictions by detecting similar patterns in future data. For example:

  • Learning data of person A (loss: 0.01624). Video source: https://www.youtube.com/watch?v=j-pJzQJJiUs
  • Learning data of person B (loss: 0.01948). Video source: https://www.youtube.com/watch?v=I3wSSShwwwo

In this case, the neural network has to be trained to recognize how the appearance of a face changes based on viewing angle, lighting, facial expression, emotional state, and speech mechanics.

The software learns not only how the face changes, but also how to deconstruct and reconstruct the face. The more data the network is exposed to, the more it learns.

The task of the neural network is to encode an image of person A and reconstruct it with similar features so that it resembles person B. The complexity of this task largely depends on how different the two faces are.
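A common way to realise this encode-and-reconstruct step is a shared encoder with one decoder per person. The sketch below uses untrained random weight matrices and made-up dimensions purely to show the data flow and the reconstruction loss; a real system would train these weights on thousands of face crops.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a flattened 64x64 grayscale face crop and a small latent code.
FACE_DIM, LATENT_DIM = 64 * 64, 128

# One shared encoder learns structure common to both faces; each person
# gets a private decoder that reconstructs images in that person's likeness.
W_enc = rng.normal(scale=0.01, size=(FACE_DIM, LATENT_DIM))
W_dec_a = rng.normal(scale=0.01, size=(LATENT_DIM, FACE_DIM))
W_dec_b = rng.normal(scale=0.01, size=(LATENT_DIM, FACE_DIM))

def encode(face):
    # Compress the face into a latent code (expression, pose, lighting).
    return np.tanh(face @ W_enc)

def decode(latent, W_dec):
    # Rebuild a face image from the latent code with one person's decoder.
    return latent @ W_dec

def mse_loss(pred, target):
    # Reconstruction error; the "loss" numbers reported during training
    # are values like this, shrinking as the network improves.
    return float(np.mean((pred - target) ** 2))

# Training minimises mse_loss(decode(encode(face_a), W_dec_a), face_a),
# and likewise for person B. The face swap happens afterwards:
face_a = rng.random(FACE_DIM)           # stand-in for a frame of person A
fake = decode(encode(face_a), W_dec_b)  # A's expression through B's decoder
```

Because the encoder is shared, the latent code captures what the faces have in common (pose, expression), and routing person A's code through person B's decoder is what produces the swapped face.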

The creators of deepfake videos often use an actor with a similar facial shape and skin tone to the target, which makes the newly constructed face easier to create.

Once the network is trained, the model is used to make predictions about new conditions using Video Rendering Software.

Video Rendering Software

In the final stage, each individual video frame is modified using the predictive data generated by the neural network. These generated images are then superimposed, frame by frame, over the existing video to generate a new video.
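The superimposing step can be sketched as a per-frame composite: a soft mask decides, pixel by pixel, where the generated face replaces the original frame. The function and mask below are illustrative assumptions, not the API of any particular rendering tool.

```python
import numpy as np

def composite_face(frame, generated_face, mask, top, left):
    """Superimpose a generated face patch onto one video frame.

    `mask` is a float array in [0, 1]: 1 where the generated face fully
    replaces the frame, 0 where the original frame shows through, with
    soft intermediate values at the edges to hide the seam.
    """
    out = frame.astype(float).copy()
    h, w = generated_face.shape[:2]
    region = out[top:top + h, left:left + w]
    out[top:top + h, left:left + w] = (mask * generated_face
                                       + (1.0 - mask) * region)
    return out

# A whole deepfake video is just this applied frame by frame, e.g.:
# new_video = [composite_face(f, face_for(f), mask_for(f), y, x) for f in frames]
```

The soft mask edges matter: a hard cut-out between the generated face and the original frame is one of the visual artifacts that gives poorly made deepfakes away.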

The comedian Bill Hader, known for his voice impressions, has willingly appeared in deepfake videos made from his TV interviews. For example, there is a deepfake of Bill Hader as Arnold Schwarzenegger and another of Bill Hader as Al Pacino.

RigidVid’s technology can prove the authenticity of your video and protect your integrity before it’s too late.