MediaPipe – made by Google


MediaPipe makes machine learning faster, easier, and cheaper. It is a versatile framework developed by Google that understands video and photo input: it can detect faces, track your movements and gestures, find objects, and estimate an object's dimensions. Besides that, it can be used by developers of all kinds, since it supports several programming languages.

Quite interesting is the way MediaPipe is built: it is actually a mix of several algorithms combined into one pipeline. Let us analyse one example. When MediaPipe detects your hand, it projects 21 so-called landmarks onto it. These are key points on your hand – the joints where you bend your fingers or simply rotate the hand. The MediaPipe algorithm is trained on thousands of hand images to detect the right landmarks on both moving and still hands. The landmarks are then placed in three-dimensional space, where they represent the locations of the hand's key points relative to each other. For developers who want to use MediaPipe in their own projects, this makes it a great choice in terms of agility and efficiency: an output that is a list of 21 X, Y, Z coordinates is far cheaper computationally than a list of every pixel in the image.
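To make this concrete, the sketch below shows what such an output looks like in plain Python: a hand as a list of 21 (X, Y, Z) landmarks. The landmark values here are made-up placeholders standing in for real detector output, and the wrist-normalization helper is a hypothetical post-processing step, not part of MediaPipe itself – it simply illustrates how coordinates "relative to each other" can be obtained:

```python
# A MediaPipe-style hand is just 21 (x, y, z) landmark coordinates.
NUM_LANDMARKS = 21

# Hypothetical raw landmarks: wrist first (index 0), then finger joints.
# These numbers are placeholders, not real detector output.
landmarks = [(0.50 + 0.01 * i, 0.60 - 0.02 * i, 0.001 * i)
             for i in range(NUM_LANDMARKS)]

def normalize_to_wrist(points):
    """Express every landmark relative to the wrist (landmark 0),
    so the same gesture gives similar numbers anywhere in the frame."""
    wx, wy, wz = points[0]
    return [(x - wx, y - wy, z - wz) for (x, y, z) in points]

relative = normalize_to_wrist(landmarks)
print(len(relative))   # 21 landmarks
print(relative[0])     # wrist is now the origin: (0.0, 0.0, 0.0)
```

Because the representation is position-independent after normalization, the same gesture produces similar numbers wherever the hand appears in the frame.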

[Image: MediaPipe hand landmarks]

If a developer wants to translate sign language into written English, they will need a huge dataset of pictures representing every sign. Even a single data point would be a list of at least 2,500 pixel values (for a tiny 50×50 image), and machine learning algorithms would have to make expensive calculations on every one of them. With MediaPipe, however, a data point can consist of just 63 values, no matter how big the image is. A dataset of images can easily be “translated” into 63 coordinates representing the position of the hand: the number of landmarks (21) multiplied by the number of axes used (X, Y, Z). At the end of the day, AI enthusiasts get a small dataset that fully represents a hand gesture with only 63 numbers, which makes it really cheap to run a machine learning algorithm on such data.
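The arithmetic above can be checked directly: 21 landmarks times 3 axes gives a 63-value feature vector, versus 2,500 values for even a tiny 50×50 single-channel image. The snippet below (plain Python, with hypothetical landmark values) flattens landmarks into the kind of feature vector a classifier would consume:

```python
# Compare the size of a pixel representation vs. a landmark representation.
IMAGE_SIDE = 50        # tiny 50x50 single-channel image
NUM_LANDMARKS = 21     # MediaPipe hand landmarks
AXES = 3               # X, Y, Z per landmark

pixel_features = IMAGE_SIDE * IMAGE_SIDE    # 2500 values per image
landmark_features = NUM_LANDMARKS * AXES    # 63 values per hand

# Flatten hypothetical (x, y, z) landmarks into the flat 63-float
# feature vector a machine learning model would actually train on.
landmarks = [(0.1 * i, 0.2 * i, 0.01 * i) for i in range(NUM_LANDMARKS)]
feature_vector = [value for point in landmarks for value in point]

print(pixel_features)       # 2500
print(landmark_features)    # 63
print(len(feature_vector))  # 63
```

The gap only widens with realistic image sizes: the landmark vector stays at 63 values regardless of resolution, while the pixel count grows quadratically.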


This outstanding approach allows us to keep the knowledge we possess while making it more compact and convenient for further exploration or development. And this is only one of the 8 great features that MediaPipe has. Truly inspiring.

Link to the documentation: https://google.github.io/mediapipe/
