Coral PoseNet

Pose estimation refers to computer vision techniques that detect human figures in images and video, so that one could determine, for example, where someone’s elbow, shoulder or foot show up in an image. PoseNet does not recognize who is in an image, it is simply estimating where key body joints are.

This repo contains a set of PoseNet models that are quantized and optimized for use on Coral's Edge TPU, together with some example code to shows how to run it on a camera stream.

Why PoseNet ?

Pose estimation has many uses, from interactive installations that react to the body to augmented reality, animation, fitness uses, and more. We hope the accessibility of this model inspires more developers and makers to experiment and apply pose detection to their own unique projects, to demonstrate how machine learning can be deployed in ways that are anonymous and private.

How does it work ?

At a high level pose estimation happens in two phases:

  1. An input RGB image is fed through a convolutional neural network. In our case
    this is a MobileNet V1 architecture. Instead of a classification head however,
    there is a specialized head which produces a set of heatmaps (one for each kind
    of key point) and some offset maps. This step runs on the EdgeTPU. The results
    are then fed into step 2)

  2. A special multi-pose decoding algorithm is used to decode poses, pose
    confidence scores, keypoint positions, and keypoint confidence scores. Note that
    unlike in the TensorflowJS

    we have created a custom OP in Tensorflow Lite and appended it to the network
    graph itself. This CustomOP does the decoding (on the CPU) as a post processing
    step. The advantage is that we don't have to deal with the heatmaps directly and
    when we then call this network through the Coral Python API we
    simply get a series of keypoints from the network.

If you're interested in the gory details of the decoding algorithm and how
PoseNet works under the hood, I recommend you take a look at the original
research paper or this medium

whihch describes the raw heatmaps produced by the convolutional model.

Important concepts

Pose: at the highest level, PoseNet will return a pose object that contains a
list of keypoints and an instance-level confidence score for each detected person.

Keypoint: a part of a person’s pose that is estimated, such as the nose,
right ear, left knee, right foot, etc. It contains both a position and a
keypoint confidence score. PoseNet currently detects 17 keypoints illustrated
in the following diagram:

pose keypoints

Keypoint Confidence Score: this determines the confidence that an estimated
keypoint position is accurate. It ranges between 0.0 and 1.0. It can be used to
hide keypoints that are not deemed strong enough.

Keypoint Position: 2D x and y coordinates in the original input image where
a keypoint has been detected.

Examples in this repo

NOTE: PoseNet relies on the latest Coral API (2.11.1) - please update your
system before running these examples. For more information on updating see:

To install all the requirements, simply run


A minimal example that simply downloads an image, and prints the pose


A camera example that streams the camera image through posenet and
draws the pose on top as an overlay. This is a great first example to run to
familiarize yourself with the network and its outputs.

Run a simple demo like this:


If the camera and monitor are both facing you, consider adding the --mirror flag:

python3 --mirror

In this repo we have included 3 posenet model files for differnet input
resolutions. The larger resolutions are slower of course, but allow a wider
field of view, or further-away poses to be processed correctly.


You can change the camera resolution by using the --res parameter:

python3 --res 480x360  # fast but low res
python3 --res 640x480  # default
python3 --res 1280x720 # slower but high res

A fun little app that demonstrates how Coral and PoseNet can be used to analyze
human behavior in an anonymous and privacy-preserving way.

Posenet converts an image of a human into a mere skeleton which captures its
position and movement over time, but discards any precisely identifying features
and the original camera image.
Because Coral devices run all the image analysis
locally, the actual image is never streamed anywhere and is immediately
discarded. The poses can be safely stored or analysed.

For example a store owner
may want to study the bahavior of customers as they move through the store, in
order to optimize flow and improve product placement.
A museum may want to track which areas are most busy, at which times such as to
give guidance which exhibits may currently have the shortest waiting times.

With Coral this is possible without recording anybody's image directly or
streaming data to a cloud service - instead the images are immediately

The anaonymizer is a small app that demonstrates this is a fun way.
To use the anonymizer set up your camera in a sturdy position. Lauch the app
and walk out of the image. This demo waits until no one is in the frame, then
stores the 'background' image. Now, step back in. You'll see your
current pose overlayed over a static image of the background.


(If the camera and monitor are both facing you, consider adding the --mirror flag.)


This demo allows people to control musical synthesizers with their arms. Up to
3 people are each assigned a different instrument and octave, and control the
pitch with their right wrists and the volume with their left wrists.

You'll need to install FluidSynth and a General Midi SoundFont:

apt install fluidsynth fluid-soundfont-gm
pip3 install pyfluidsynth

Now you can run the demo like this:


The PoseEngine class

The PoseEngine class (defined in allows easy access
to the PoseNet network from Python, using the EdgeTPU API.

You simply initialize the class with the location of the model .tflite
file and then call DetectPosesInImage, passing a numpy object that contains the
image. The numpy object should be in int8, [Y,X,RGB] format.

A minimal example might be:

import numpy as np
from PIL import Image
from pose_engine import PoseEngine

pil_image ='couple.jpg')
pil_image.resize((641, 481), Image.NEAREST)

engine = PoseEngine('models/posenet_mobilenet_v1_075_481_641_quant_decoder_edgetpu.tflite')
poses, inference_time = engine.DetectPosesInImage(np.uint8(pil_image))
print('Inference time: %.fms'%inference_time)

for pose in poses:
    if pose.score < 0.4: continue
    print('\nPose Score: ', pose.score)
    for label, keypoint in pose.keypoints.items():
        print(' %-20s x=%-4d y=%-4d score=%.1f'%
              (label, keypoint.yx[1], keypoint.yx[0], keypoint.score))

To try this, run


And you should see an output like this:

Pose Score:  0.61885977
 nose                 x=210  y=152  score=1.0
 left eye             x=224  y=138  score=1.0
 right eye            x=199  y=136  score=1.0
 left ear             x=245  y=135  score=1.0
 right ear            x=183  y=129  score=0.8
 left shoulder        x=268  y=168  score=0.8
 right shoulder       x=161  y=172  score=1.0
 left elbow           x=282  y=255  score=0.6
 right elbow          x=154  y=254  score=0.9
 left wrist           x=236  y=333  score=0.7
 right wrist          x=163  y=301  score=0.6
 left hip             x=323  y=181  score=0.2
 right hip            x=191  y=251  score=0.0
 left knee            x=343  y=84   score=0.8
 right knee           x=162  y=295  score=0.0
 left ankle           x=318  y=174  score=0.1
 right ankle          x=167  y=309  score=0.0