What is a Deepfake Video?
Deepfake is an AI-based technology used to superimpose computer-generated faces and voices over existing video content, thereby creating a new video that represents actions that never occurred.
The term “deepfake” comes from a Reddit user known as “deepfakes”, who, in December 2017, used deep learning technology to edit the faces of celebrities onto people in pornographic video clips.
Deepfake pioneer Hao Li told CNBC on September 20, 2019, “Manipulated images and videos that appear perfectly real will be accessible to everyday people in 2020. Even today, a deepfake app in China has exploded in popularity, enabling people replace their face over celebrity faces in films.”
Creating a deepfake uses the following technologies:
- Object detection
- Neural Networks/Machine Learning
- Video rendering software
Object detection
Deepfakes start with the same kind of technology that cameras use to detect faces in photographs.
The classic example is the Viola-Jones object detection framework.
This pattern recognition method finds faces by comparing the brightness of adjacent rectangular regions of pixels in a photograph, which lets it identify elements such as:
- Bridge of nose
- Edge of face
Once the face has been mapped, a computer system can change each individual video frame according to predefined rules. The simplest examples of this technology are the face filters in apps such as Snapchat (Lenses) and Instagram, along with face-swapping apps.
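The pixel comparison behind this kind of detector can be sketched in a few lines. The example below is a simplified illustration, not a full Viola-Jones detector: it builds an integral image (a summed-area table) and evaluates one three-rectangle feature that responds to a bright nose bridge between two darker eye regions. The function names, the toy patch, and the weighting of the centre strip are assumptions made for this sketch.

```python
def integral_image(img):
    """Return a summed-area table with one extra row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left (x, y), width w, height h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def nose_bridge_feature(ii, x, y, w, h):
    """Three-rectangle feature: centre strip (counted twice) minus the two side strips."""
    third = w // 3
    left = rect_sum(ii, x, y, third, h)
    centre = rect_sum(ii, x + third, y, third, h)
    right = rect_sum(ii, x + 2 * third, y, third, h)
    return 2 * centre - (left + right)


# Toy 4x6 grayscale patch: dark eye regions on either side of a bright nose bridge.
patch = [
    [40, 40, 200, 200, 40, 40],
    [40, 40, 200, 200, 40, 40],
    [40, 40, 200, 200, 40, 40],
    [40, 40, 200, 200, 40, 40],
]
flat = [[100] * 6 for _ in range(4)]  # uniform patch with no structure

face_like = nose_bridge_feature(integral_image(patch), 0, 0, 6, 4)
uniform = nose_bridge_feature(integral_image(flat), 0, 0, 6, 4)
print(face_like, uniform)  # prints "2560 0"
```

A large response suggests the bright-between-dark pattern is present, while the uniform patch scores zero. A real detector combines many such features at many positions and scales, which is why the integral image matters: every rectangle sum costs only four table lookups.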
Neural Networks / Machine Learning
Neural networks recognize patterns in data. Once a neural network has been trained on samples of your data, it can make predictions by detecting similar patterns in future data.
For a deepfake, the neural network has to be trained to recognize how the appearance of a face changes with viewing angle, lighting, facial expression, emotional state, and speech mechanics.
The software learns not only how the face changes, but also how to deconstruct and reconstruct the face. The more data the network is exposed to, the more it learns.
The task of the neural network is to encode an image of person A and reconstruct it with features that resemble another person, B. The difficulty of this task depends largely on how different the two faces are.
The creators of deepfake videos will often use an actor with a similar facial shape and skin tone to make it easier to create the newly constructed face.
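One way to picture the encode-and-reconstruct task is a shared encoder with one decoder per person, an arrangement commonly described for autoencoder-based face swaps. The sketch below assumes that arrangement (the text above does not name a specific architecture) and uses tiny linear layers on made-up four-value "faces" so that it runs with the standard library alone. All names, sizes, and data are hypothetical.

```python
import random

random.seed(0)
DIM, CODE = 4, 2  # toy "image" size and bottleneck size (both hypothetical)
EXPRESSION = [1.0, -1.0, 0.5, 0.0]  # direction along which a toy face changes


def rand_matrix(rows, cols):
    return [[random.uniform(-0.3, 0.3) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def make_faces(identity, count):
    """Toy data: a fixed identity vector plus a random amount of expression."""
    faces = []
    for _ in range(count):
        amount = random.uniform(-0.2, 0.2)
        faces.append([i + amount * e for i, e in zip(identity, EXPRESSION)])
    return faces


faces_a = make_faces([0.9, 0.1, 0.5, 0.3], 20)
faces_b = make_faces([0.2, 0.8, 0.4, 0.7], 20)

encoder = rand_matrix(CODE, DIM)    # shared by both people
decoder_a = rand_matrix(DIM, CODE)  # reconstructs person A only
decoder_b = rand_matrix(DIM, CODE)  # reconstructs person B only


def reconstruct(decoder, face):
    return matvec(decoder, matvec(encoder, face))


def mean_loss(decoder, faces):
    total = 0.0
    for face in faces:
        total += sum((o - t) ** 2 for o, t in zip(reconstruct(decoder, face), face))
    return total / len(faces)


def train_step(decoder, face, lr=0.02):
    """One gradient-descent step on the squared reconstruction error."""
    code = matvec(encoder, face)
    error = [o - t for o, t in zip(matvec(decoder, code), face)]
    # Gradient reaching the code, taken before the decoder weights change.
    code_grad = [sum(decoder[i][j] * error[i] for i in range(DIM)) for j in range(CODE)]
    for i in range(DIM):
        for j in range(CODE):
            decoder[i][j] -= lr * 2 * error[i] * code[j]
    for j in range(CODE):
        for k in range(DIM):
            encoder[j][k] -= lr * 2 * code_grad[j] * face[k]


loss_before = mean_loss(decoder_a, faces_a) + mean_loss(decoder_b, faces_b)
for _ in range(500):
    for face_a, face_b in zip(faces_a, faces_b):
        train_step(decoder_a, face_a)
        train_step(decoder_b, face_b)
loss_after = mean_loss(decoder_a, faces_a) + mean_loss(decoder_b, faces_b)

# The swap: encode a frame of A, then decode it with B's decoder.
swapped = reconstruct(decoder_b, faces_a[0])
print(f"loss before {loss_before:.3f}, after {loss_after:.3f}")
```

Because both decoders read from the same encoder, the code is pushed to hold what the two faces have in common, and each decoder supplies one person's appearance. A linear toy like this will not produce a convincing swap; real systems use deep convolutional networks and far more data, but the data flow is the same.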
Once the network is trained, the model is applied to new footage, and video rendering software assembles its output into the final video.
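As a rough illustration of that last step, the sketch below blends a generated face patch back into each frame using a soft mask. It assumes frames are small grayscale grids and that the face sits in a fixed box; `generate_face` is a hypothetical stand-in for the trained model, and a real renderer would also track the face, match colour, and encode the video.

```python
def blend_patch(frame, patch, mask, top, left):
    """Alpha-blend patch into a copy of frame at (top, left); mask values lie in [0, 1]."""
    out = [row[:] for row in frame]
    for y, (patch_row, mask_row) in enumerate(zip(patch, mask)):
        for x, (p, m) in enumerate(zip(patch_row, mask_row)):
            background = frame[top + y][left + x]
            out[top + y][left + x] = round(m * p + (1 - m) * background)
    return out


def render(frames, generate_face, box, mask):
    """Run the (hypothetical) model on the face region of every frame and blend it in."""
    top, left, height, width = box
    rendered = []
    for frame in frames:
        crop = [row[left:left + width] for row in frame[top:top + height]]
        rendered.append(blend_patch(frame, generate_face(crop), mask, top, left))
    return rendered


def generate_face(crop):
    """Stand-in for the trained network: returns a constant bright patch."""
    return [[200 for _ in row] for row in crop]


frames = [[[10] * 4 for _ in range(4)] for _ in range(2)]  # two 4x4 dark frames
mask = [[1.0, 0.5], [0.5, 1.0]]  # 1.0 keeps the new face, 0.5 mixes it with the frame
result = render(frames, generate_face, box=(1, 1, 2, 2), mask=mask)
for row in result[0]:
    print(row)
```

The mask is what hides the seam: values below 1.0 mix the generated pixels with the original frame so the edge of the swapped region fades in gradually.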