NVIDIA’s Deepfake Eye Contact Effect: Giving Videos an Extra Touch of Reality

In video conferencing, clear audio and video are essential for streaming use cases such as vlogging, VTubing, webcasting, and remote work. They increase the sense of presence and help participants pick up on both verbal and nonverbal cues.
Eye contact plays a vital role in social interactions and face-to-face conversations. It signifies confidence, connection, and attention. However, maintaining eye contact is not always possible in video conferencing scenarios. It requires users to look directly into the camera instead of the computer screen, which can be challenging when reading off a script or reviewing data on the computer. Additionally, maintaining eye contact can be difficult for various physiological reasons, and many children and adults find it challenging to make and maintain eye contact.
Fortunately, NVIDIA has developed a solution: Maxine Eye Contact, a feature that improves video chats and webcasts by making the user appear to look directly into the camera, even when they are glancing at their notes or out the window. It does this by redirecting the gaze while preserving the natural color and blinking of the eyes. A disconnect feature also provides a smooth transition back to the person’s real eyes if they look away from the camera.

How the Eye Contact Pipeline Works
The eye contact pipeline in NVIDIA Maxine uses the face tracking feature to locate and analyze the region around the eyes, known as the “eye patch.” The pipeline aligns the face, extracts the eye patch, and feeds it into a specialized network with separate encoding and decoding stages, which adjusts the gaze direction so that the face appears to look forward. The corrected patch is then blended back into the original video frame. The output includes the head position, the gaze angles, and an image with the corrected gaze direction. The pipeline can also be used simply to estimate gaze direction without making any adjustments.
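The order of the stages described above can be sketched in a few lines of Python. This is only an illustration of the data flow, not the Maxine SDK: every name here (`run_eye_contact`, `EyeContactResult`, and the four stage callables) is hypothetical, and the stages themselves are supplied by the caller.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Angles = Tuple[float, float]  # (pitch, yaw) in degrees


@dataclass
class EyeContactResult:
    head_pose: Angles        # estimated head position/orientation
    gaze: Angles             # estimated gaze angles
    frame: Optional[object]  # gaze-corrected frame, or None in estimate-only mode


def run_eye_contact(frame, track_face, extract_eye_patch, redirect_gaze, blend,
                    estimate_only=False):
    """Chain the stages in order; each stage is a hypothetical, caller-supplied callable."""
    aligned_face, head_pose = track_face(frame)      # face tracking and alignment
    eye_patch = extract_eye_patch(aligned_face)      # crop the region around the eyes
    gaze, corrected_patch = redirect_gaze(eye_patch, redirect=not estimate_only)
    if estimate_only:
        # Gaze estimation only: report angles, leave the image alone.
        return EyeContactResult(head_pose, gaze, None)
    # Blend the corrected patch back into the original frame.
    return EyeContactResult(head_pose, gaze, blend(frame, corrected_patch))
```

Passing the stages in as arguments keeps the sketch independent of any particular face tracker or network, and the `estimate_only` flag mirrors the option of estimating gaze without making adjustments.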
The Maxine Eye Contact Model from NVIDIA
The NVIDIA Maxine Eye Contact model uses a disentangled encoder-decoder structure to adjust the gaze in an image. The encoder separates the image into different factors such as lighting, face shape, and gaze direction and predicts rotation angles for these factors. The rotations are then applied, and the decoder produces the final redirected eye image.
Maintaining Eye Color During Gaze Redirection
NVIDIA’s eye contact network is trained with multiple loss functions, including a reconstruction loss, a functional loss, and a disentanglement loss, to ensure accurate gaze redirection while preserving eye color. The network is trained on a diverse dataset, including synthetic images, so that it can reproduce a wide range of eye colors in the generated images. The reconstruction loss compares the generated image to the target image. The functional loss prioritizes task-relevant inconsistencies, such as a mismatch in iris positions. The disentanglement loss encourages the separation of environmental and physical factors, so that redirecting the gaze does not alter other factors in the image.
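As a rough illustration of how several loss terms are usually combined, here is a minimal sketch. It rests on assumptions the article does not confirm: the reconstruction loss is taken to be a mean absolute (L1) pixel difference, the three terms are combined as a weighted sum, and the weights are placeholders. The functional and disentanglement terms are passed in as plain numbers because their formulas are not given.

```python
def reconstruction_loss(generated, target):
    """Mean absolute (L1) pixel difference between two equally sized images.

    Images are flat sequences of pixel values; L1 is an assumption, the
    article only says the generated image is compared to the target.
    """
    if len(generated) != len(target):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(g - t) for g, t in zip(generated, target)) / len(generated)


def total_loss(rec, functional, disentangle, w_rec=1.0, w_func=1.0, w_dis=1.0):
    """Weighted sum of the three loss terms; the weights are placeholder values."""
    return w_rec * rec + w_func * functional + w_dis * disentangle
```

Raising `w_func` in such a setup would push training to care more about errors like a misplaced iris than about overall pixel similarity.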
Defining a Working Range
The input to the eye contact network is a scale-normalized eye patch. The network has been found to perform reliable and natural gaze redirection within a cone of 20 degrees in pitch and yaw, which is the recommended working range for the feature.
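One way to check whether an estimated gaze falls inside such a cone is to compute the angle between the gaze direction and the camera axis. The sketch below assumes that 20 degrees is the half-angle of the cone and that pitch and yaw are measured from the camera axis; the article does not spell out either convention, and the function names are illustrative.

```python
import math

WORKING_RANGE_DEG = 20.0  # from the article; treated here as the cone's half-angle


def angle_from_camera(pitch_deg, yaw_deg):
    """Angle in degrees between the gaze direction and the camera axis."""
    pitch, yaw = math.radians(pitch_deg), math.radians(yaw_deg)
    # For a unit gaze vector, the component along the camera axis
    # is cos(pitch) * cos(yaw); clamp to guard against rounding.
    cos_angle = max(-1.0, min(1.0, math.cos(pitch) * math.cos(yaw)))
    return math.degrees(math.acos(cos_angle))


def in_working_range(pitch_deg, yaw_deg, limit_deg=WORKING_RANGE_DEG):
    """True when the gaze lies inside the recommended working cone."""
    return angle_from_camera(pitch_deg, yaw_deg) <= limit_deg
```

Under this reading, a gaze of 10 degrees in both pitch and yaw is inside the cone, while 15 degrees in both is just outside it, because the combined angle exceeds 20 degrees.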

Addressing Transitional Drop-Off in Gaze Redirection
To avoid sudden shifts in the iris during fast eye movements, NVIDIA has implemented a transition region in its gaze redirection feature. Within this region, the redirection is gradually reduced, so that the displayed gaze moves smoothly from the camera angle to the actual, estimated gaze angle. The transition is designed to mimic the typical motion of human eyes, and its speed is set accordingly.
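A transition region of this kind can be sketched as a blend weight that falls from 1 to 0 as the gaze moves away from the camera. The example below is an assumption-laden illustration, not NVIDIA’s implementation: the 20-degree inner bound comes from the working range mentioned in the article, while the 30-degree outer bound and the smoothstep ramp are hypothetical choices. It models only the angular blend, not the timing of the transition.

```python
def redirection_weight(angle_deg, full_until=20.0, off_from=30.0):
    """1.0 = gaze fully redirected to the camera, 0.0 = real gaze shown unchanged."""
    if angle_deg <= full_until:
        return 1.0
    if angle_deg >= off_from:
        return 0.0
    # Position inside the transition region, from 0.0 to 1.0.
    t = (angle_deg - full_until) / (off_from - full_until)
    # Smoothstep ramp: the weight changes gradually, with no sudden jump.
    return 1.0 - t * t * (3.0 - 2.0 * t)


def displayed_gaze(estimated_deg):
    """Gaze angle shown to the viewer: 0 is the camera, estimated_deg is the real gaze."""
    w = redirection_weight(abs(estimated_deg))
    return (1.0 - w) * estimated_deg
```

With these placeholder bounds, a real gaze of 10 degrees is shown as looking at the camera, 25 degrees is shown halfway between the camera and the real gaze, and 40 degrees is left untouched.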
Handling Eyes That Are Not Visible
NVIDIA’s eye contact pipeline can handle instances where a person’s eyes are not visible, such as when they are blinking or obscured by movement or objects. The algorithm detects and preserves eye blinks, and it deactivates the gaze adjustment effect when an occlusion is detected, as indicated by low confidence in the facial landmark estimation.
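The occlusion rule amounts to a confidence gate. A minimal sketch follows, assuming the landmark estimator reports a confidence between 0 and 1; the function name and the 0.5 threshold are hypothetical, and blink preservation is left to the network rather than modeled here.

```python
def select_eye_patch(original, corrected, landmark_confidence, min_confidence=0.5):
    """Fall back to the untouched eye patch when the landmarks are unreliable.

    min_confidence is a placeholder; the article gives no threshold value.
    """
    if landmark_confidence < min_confidence:
        return original   # likely occlusion: gaze adjustment is switched off
    return corrected      # landmarks look reliable: use the redirected patch


# Per-frame usage over a short stream of (original, corrected, confidence) triples.
frames = [("o1", "c1", 0.9), ("o2", "c2", 0.2), ("o3", "c3", 0.7)]
shown = [select_eye_patch(o, c, conf) for o, c, conf in frames]
```

In this stream, only the second frame has low confidence, so it alone keeps its original patch.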
Optimizing Performance
The pipeline uses TensorRT to accelerate inference, allowing it to run in real time on NVIDIA GPUs with minimal latency per frame. It can also handle multiple stream instances simultaneously, making it suitable for data center use cases as well as NVIDIA RTX desktops and laptops.
In conclusion, the NVIDIA deepfake eye contact effect is a powerful technology that can greatly enhance the realism of deepfake videos. By using a neural network to redirect the subject’s gaze toward the camera, it makes the eyes in the processed video appear natural and lifelike. The technology has potential applications in areas such as virtual reality, film, and video conferencing. However, it is important to consider the ethical implications of deepfake technology, particularly its use in creating fake videos for malicious or deceitful purposes. Overall, the NVIDIA deepfake eye contact effect is an exciting advancement in the field, but it must be used responsibly.
Sources:
Improve Human Connection in Video Conferences with NVIDIA Maxine Eye Contact | NVIDIA Technical Blog
This eye-contact filter is for sure a great technology. Nevertheless, for me it has a negligible effect on the reception of video and is rather an addition than a game-changing invention. As you mentioned in the article, it could create way more realistic deepfake videos, which would be unpleasant and overshadow the good sides of this filter.
I have some trouble keeping eye contact during meetings. This could help me as a listener to look more engaged, and as a speaker, I could read my notes without any worry. For me personally, it’s a great invention. Can’t wait to use it.
I think this is very interesting, but doesn’t it look unnatural to look at the camera constantly? Perhaps there is a way to turn it on or off mid-meeting, which would be quite helpful.