Real-time 3D Multi-person Pose Estimation Demo

This repository contains a real-time 3D multi-person pose estimation demo in PyTorch. The Intel OpenVINO™ backend can be used for fast inference on CPU. The demo is based on the Lightweight OpenPose and Single-Shot Multi-Person 3D Pose Estimation From Monocular RGB papers. It detects the 2D coordinates of up to 18 keypoint types: ears, eyes, nose, neck, shoulders, elbows, wrists, hips, knees, and ankles, as well as their 3D coordinates. The model was trained on the MS COCO and CMU Panoptic datasets and achieves 100 mm MPJPE (mean per joint position error) on the CMU Panoptic subset. This repository significantly overlaps with the Open Model Zoo, but contains only the code necessary for the 3D human pose estimation demo.
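The 18 keypoint types listed above can be enumerated as a simple lookup table. The names below come directly from the description; the specific index order is an illustrative assumption, not necessarily the order the model uses internally:

```python
# The 18 keypoint types detected by the demo, per the description above.
# NOTE: this ordering is illustrative only; the model's actual index
# assignment may differ.
KEYPOINT_TYPES = [
    "nose", "neck",
    "right_shoulder", "right_elbow", "right_wrist",
    "left_shoulder", "left_elbow", "left_wrist",
    "right_hip", "right_knee", "right_ankle",
    "left_hip", "left_knee", "left_ankle",
    "right_eye", "left_eye",
    "right_ear", "left_ear",
]

# Sanity check: 18 keypoint types in total.
assert len(KEYPOINT_TYPES) == 18
```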

Requirements

  • Python 3.5 (or above)
  • CMake 3.10 (or above)
  • C++ Compiler (g++ or MSVC)
  • OpenCV 4.0 (or above)

[Optional] Intel OpenVINO for fast inference on CPU.

Installation

  1. Install requirements:
     pip install -r requirements.txt
  2. Build the pose_extractor module:
     python setup.py build_ext
  3. Add the build folder to PYTHONPATH:
     export PYTHONPATH=pose_extractor/build/:$PYTHONPATH

Pre-trained model

The pre-trained model is available on Google Drive.

Running

To run the demo, pass the path to the pre-trained checkpoint and a camera id (or a path to a video file):

python demo.py --model human-pose-estimation-3d.pth --video 0

The camera can capture the scene under different view angles, so for correct scene visualization, please pass the camera extrinsics and focal length with the --extrinsics and --fx options respectively (a sample extrinsics format can be found in the data folder). If no camera parameters are provided, the demo uses default ones.
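Camera extrinsics pair a rotation with a translation. Below is a minimal sketch of writing such a file, assuming a JSON layout with an "R" key (3x3 rotation matrix) and a "t" key (3x1 translation vector); the sample in the data folder is authoritative for the exact format the demo expects:

```python
import json

# Hypothetical extrinsics layout: "R" is a 3x3 rotation matrix,
# "t" a 3x1 translation vector (check the sample in data/ for the
# actual schema used by the demo).
extrinsics = {
    "R": [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 1.0]],   # identity rotation: camera axes = world axes
    "t": [[0.0],
          [0.0],
          [0.0]],             # camera placed at the world origin
}

with open("extrinsics.json", "w") as f:
    json.dump(extrinsics, f, indent=2)
```

Such a file would then be passed via --extrinsics extrinsics.json together with --fx set to the focal length in pixels.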

Inference with OpenVINO

To run with OpenVINO, it is necessary to convert the checkpoint to OpenVINO format:

  1. Set OpenVINO environment variables:
    source <OpenVINO_INSTALL_DIR>/bin/setupvars.sh
  2. Convert checkpoint to ONNX:
    python scripts/convert_to_onnx.py --checkpoint-path human-pose-estimation-3d.pth
  3. Convert to OpenVINO format:
    python <OpenVINO_INSTALL_DIR>/deployment_tools/model_optimizer/mo.py --input_model human-pose-estimation-3d.onnx --input=data --mean_values=data[128.0,128.0,128.0] --scale_values=data[255.0,255.0,255.0] --output=features,heatmaps,pafs
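The --mean_values and --scale_values flags bake input normalization into the converted model: each pixel is shifted by the mean and then divided by the scale, i.e. (x - 128) / 255 per channel. A quick sketch of that preprocessing (NumPy is assumed; this mirrors what the converted model does internally, so the application only needs to feed raw pixels):

```python
import numpy as np

def normalize(image):
    """Normalization equivalent to the Model Optimizer flags above:
    subtract mean 128.0, then divide by scale 255.0."""
    return (image.astype(np.float32) - 128.0) / 255.0

# A uint8 pixel range [0, 255] maps to roughly [-0.5, 0.5].
pixels = np.array([0, 128, 255], dtype=np.uint8)
print(normalize(pixels))
```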

To run the demo with OpenVINO inference, pass the --use-openvino option and specify the device to infer on:

python demo.py --model human-pose-estimation-3d.xml --device CPU --use-openvino --video 0