Dlib and Mediapipe libraries for tracking face keypoints

Face tracking is used in augmented reality, medicine, marketing and security applications. This article describes two libraries that detect facial key points in real time.

Dlib

Dlib is written primarily in C++, so its Python package has to be compiled during installation. Install CMake and Visual Studio first, then use pip to install the CMake, dlib and OpenCV packages. In addition to Dlib, the script imports OpenCV to read frames from a webcam. Tracking a face requires three objects:

a detector that finds one or more faces in a frame;
a predictor that locates the key points on each detected face. Its parameter is the path to a pre-trained model file, which can be downloaded from the link;
a cv2.VideoCapture object that captures frames from the webcam.
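
The three objects above can be wired together roughly as follows. This is a minimal sketch, not the article's original script: the function names `shape_to_points` and `run_dlib_demo`, the window title and the camera index 0 are illustrative assumptions, and the model path must point to the downloaded predictor file.

```python
def shape_to_points(shape):
    """Convert a dlib landmark result into a list of (x, y) pixel tuples."""
    return [(shape.part(i).x, shape.part(i).y) for i in range(shape.num_parts)]


def run_dlib_demo(model_path, camera_index=0):
    """Draw Dlib key points on webcam frames until 'q' is pressed."""
    # Imported lazily so shape_to_points can be used without dlib or OpenCV.
    import cv2
    import dlib

    detector = dlib.get_frontal_face_detector()   # finds face rectangles
    predictor = dlib.shape_predictor(model_path)  # fits key points in a rectangle
    capture = cv2.VideoCapture(camera_index)      # webcam stream

    try:
        while True:
            ok, frame = capture.read()
            if not ok:  # camera unavailable or stream ended
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            # The second argument is the number of upsampling passes (0 = none).
            for face in detector(gray, 0):
                for x, y in shape_to_points(predictor(gray, face)):
                    cv2.circle(frame, (x, y), 2, (0, 255, 0), -1)
            cv2.imshow("Dlib key points", frame)
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
    finally:
        capture.release()
        cv2.destroyAllWindows()
```

Calling `run_dlib_demo` with the path to the model file starts the loop. Detection runs on a grayscale copy of each frame, while the points are drawn on the original color frame.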

The key point detection model is based on an ensemble of regression trees and was trained on the iBUG 300-W dataset, which contains face images annotated with 68 landmark points. The result of the code working with Dlib is shown in the image above.
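
The 68 points follow a fixed ordering, so individual facial regions can be selected by index. The sketch below assumes the index ranges commonly used with this 68-point annotation scheme. The region names and the helper `split_landmarks` are illustrative, not part of Dlib.

```python
# Index ranges (start inclusive, end exclusive) commonly used for the
# 68-point scheme; "right" and "left" refer to the subject's own side.
LANDMARK_REGIONS = {
    "jaw": (0, 17),
    "right_eyebrow": (17, 22),
    "left_eyebrow": (22, 27),
    "nose": (27, 36),
    "right_eye": (36, 42),
    "left_eye": (42, 48),
    "mouth": (48, 68),
}


def split_landmarks(points):
    """Group a list of 68 (x, y) points into named facial regions."""
    if len(points) != 68:
        raise ValueError("expected 68 landmark points, got %d" % len(points))
    return {name: points[start:end] for name, (start, end) in LANDMARK_REGIONS.items()}
```

Grouping the points this way makes it easy to work with one feature at a time, for example only the eye contours.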

Mediapipe

Mediapipe is developed by Google and provides ready-made solutions for tasks such as face detection, pose estimation, object detection and more. An advantage of this library is that it can also be used in web applications and on smartphones. Faces are detected with the BlazeFace model, which is optimized for low-powered devices.

Unlike Dlib with its 68 points, Mediapipe's Face Mesh solution, which runs on top of BlazeFace detection, estimates 468 key points per face, can process several faces at once, and returns the point coordinates in three-dimensional space. Example of Mediapipe operation:
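
A webcam loop for Mediapipe might look like the sketch below. It assumes the `mp.solutions.face_mesh` Python interface. The names `landmarks_to_pixels` and `run_face_mesh_demo`, the camera index and the limit of two faces are illustrative choices. Landmarks are returned with x and y normalized to the frame size, so they are scaled to pixels before drawing.

```python
def landmarks_to_pixels(landmarks, width, height):
    """Scale normalized landmark coordinates to integer pixel positions."""
    return [(int(lm.x * width), int(lm.y * height)) for lm in landmarks]


def run_face_mesh_demo(camera_index=0, max_faces=2):
    """Draw Face Mesh key points on webcam frames until 'q' is pressed."""
    # Imported lazily so landmarks_to_pixels works without these packages.
    import cv2
    import mediapipe as mp

    capture = cv2.VideoCapture(camera_index)
    try:
        with mp.solutions.face_mesh.FaceMesh(max_num_faces=max_faces) as face_mesh:
            while True:
                ok, frame = capture.read()
                if not ok:  # camera unavailable or stream ended
                    break
                # Mediapipe expects RGB input, OpenCV delivers BGR.
                results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
                height, width = frame.shape[:2]
                # multi_face_landmarks is None when no face is found.
                for face in results.multi_face_landmarks or []:
                    for x, y in landmarks_to_pixels(face.landmark, width, height):
                        cv2.circle(frame, (x, y), 1, (0, 255, 0), -1)
                cv2.imshow("Mediapipe Face Mesh", frame)
                if cv2.waitKey(1) & 0xFF == ord("q"):
                    break
    finally:
        capture.release()
        cv2.destroyAllWindows()
```

Each landmark also carries a `z` value with the relative depth of the point. The sketch ignores it because the points are drawn on a flat image.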