Video to Pose3D

Convert a video to 3D pose in one click.


  1. Environment
    • Linux system
    • Python > 3.6 distribution
  2. Dependencies
    • Packages
      • PyTorch > 0.4.0
      • torchsample
      • ffmpeg
      • tqdm
      • pillow
      • scipy
      • pandas
      • h5py
      • visdom
      • nibabel
      • opencv-python (install with pip)
      • matplotlib
    • 2D Joint detectors
      • Alphapose (Recommended)
        • Download duc_se.pth from (Google Drive | Baidu pan) and
          place it in ./joints_detectors/Alphapose/models/sppe
        • Download yolov3-spp.weights from (Google Drive | Baidu pan) and
          place it in ./joints_detectors/Alphapose/models/yolo
      • HR-Net (poor 3D joint performance in my testing environment)
        • Download pose_hrnet* from Google Drive and
          place it in ./joints_detectors/hrnet/models/pytorch/pose_coco/
        • Download yolov3.weights from here and
          place it in ./joints_detectors/hrnet/lib/detector/yolo
      • OpenPose (not tested; PRs are highly appreciated)
    • 3D Joint detectors
      • Download pretrained_h36m_detectron_coco.bin from here and
        place it in the ./checkpoint folder
    • 2D Pose trackers (Optional)
      • PoseFlow (Recommended)
        No extra dependencies
      • LightTrack (poor 2D tracking performance in my testing environment)
        • See the original README and follow the same getting-started steps in ./pose_trackers/lighttrack
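Before running, it can help to confirm that the packages and weight files listed above are in place. The following is a minimal sketch and not part of the repository: the helper names `missing_modules` and `missing_weights` are hypothetical, the import names (for example `cv2` for opencv-python and `PIL` for pillow) are the usual ones for these packages, and ffmpeg is assumed to be needed as a command-line binary on PATH. Only the recommended AlphaPose weights and the 3D checkpoint are checked.

```python
import importlib.util
import os
import shutil

# Import names assumed for the packages listed above.
REQUIRED_MODULES = ['torch', 'torchsample', 'tqdm', 'PIL', 'scipy', 'pandas',
                    'h5py', 'visdom', 'nibabel', 'cv2', 'matplotlib']

# Weight files for the recommended AlphaPose detector and the 3D model,
# at the locations given in the list above.
REQUIRED_WEIGHTS = [
    'joints_detectors/Alphapose/models/sppe/duc_se.pth',
    'joints_detectors/Alphapose/models/yolo/yolov3-spp.weights',
    'checkpoint/pretrained_h36m_detectron_coco.bin',
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def missing_weights(project_root='.', paths=REQUIRED_WEIGHTS):
    """Return the weight files that are absent under project_root."""
    return [p for p in paths
            if not os.path.isfile(os.path.join(project_root, p))]


if __name__ == '__main__':
    print('missing modules:', missing_modules())
    print('missing weights:', missing_weights())
    print('ffmpeg on PATH:', shutil.which('ffmpeg') is not None)
```

Run it from the project root; empty lists mean nothing from the checked set is missing.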


  1. Place your video in the ./outputs folder. (I've prepared a test video.)
Single person video
  1. Change the video_path in the ./
  2. Run it! You will find the rendered output video in the ./outputs folder.
Multiple person video (Not implemented yet)
  1. For development, check ./videopose_multi_person

    video = 'kobe.mp4'
    # Run AlphaPose and save the result into ./outputs/alpha_pose_kobe
    # Take the result from above as the input of PoseTrack and output poseflow-results.json
    # into the same directory as above.
    # The visualization result is saved in ./outputs/alpha_pose_kobe/poseflow-vis
    # TODO: More work is needed:
    #  1. "Improve the accuracy of the tracking algorithm" or "do specific post-processing
    #     after getting the tracking result".
    #  2. Choose a person (remove the other 2D points in each frame)
  1. PyCharm is recommended, since it is the IDE I use during development.
  2. If you're using PyCharm, mark ./joints_detectors/Alphapose, ./joints_detectors/hrnet and ./pose_trackers as source roots.
  3. If you're running from the command line, add the paths mentioned above to sys.path at the head of ./
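For the command-line case, the path setup described above might look like the following. This is a minimal sketch: the helper name `add_source_roots` is hypothetical, and it assumes the script is started from the project root so that the relative directories resolve correctly.

```python
import os
import sys

# Directories the tips above say to mark as source roots.
SOURCE_ROOTS = [
    'joints_detectors/Alphapose',
    'joints_detectors/hrnet',
    'pose_trackers',
]


def add_source_roots(project_root='.'):
    """Prepend each source root to sys.path once; return the paths added."""
    added = []
    for rel in SOURCE_ROOTS:
        path = os.path.abspath(os.path.join(project_root, rel))
        if path not in sys.path:
            sys.path.insert(0, path)
            added.append(path)
    return added


add_source_roots()  # call before importing the detector or tracker modules
```

Absolute paths are used so the imports keep working even if the working directory changes later in the run.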


This script is based on VideoPose3D, provided by Facebook, and is automated in the following way:

    args = parse_args()

    args.detector_2d = 'alpha_pose'
    dir_name = os.path.dirname(video_path)
    basename = os.path.basename(video_path)
    video_name = basename[:basename.rfind('.')]
    args.viz_video = video_path
    args.viz_output = f'{dir_name}/{args.detector_2d}_{video_name}.gif'

    args.evaluate = 'pretrained_h36m_detectron_coco.bin'

    with Timer(video_path):

The meaning of each argument can be found here; you can customize the run by changing the args in ./
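To see what the automation produces for a given input, the naming logic from the snippet above can be isolated into a standalone function. This is a sketch for illustration only: `derive_viz_output` is a hypothetical name, and `timer` is a simple stand-in written here, not the repository's `Timer` class.

```python
import os
import time
from contextlib import contextmanager


def derive_viz_output(video_path, detector_2d='alpha_pose'):
    """Build the output path the same way as the snippet above."""
    dir_name = os.path.dirname(video_path)
    basename = os.path.basename(video_path)
    video_name = basename[:basename.rfind('.')]  # strip the extension
    return f'{dir_name}/{detector_2d}_{video_name}.gif'


@contextmanager
def timer(label):
    """Hypothetical stand-in for Timer: print how long the block took."""
    start = time.time()
    try:
        yield
    finally:
        print(f'{label}: {time.time() - start:.2f}s')


with timer('outputs/kobe.mp4'):
    print(derive_viz_output('outputs/kobe.mp4'))  # outputs/alpha_pose_kobe.gif
```

Note that `rfind('.')` returns -1 when the file name has no extension, in which case the slice drops the last character of the name; `os.path.splitext` would avoid that if extensionless inputs matter.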


The 2D pose to 3D pose and visualization part is from VideoPose3D.

Some of the "in the wild" scripts are adapted from the other fork.

The project structure and the ./ running script are adapted from this repo.

Coming soon

Other features will be added in the future to improve accuracy:

  • [x] Human completeness check.
  • [x] Object tracking of the first complete human covering the largest area.
  • [x] Switch to another 2D pose estimation method such as AlphaPose.
  • [x] Test HR-Net as 2D joint detector.
  • [x] Test LightTrack as pose tracker.
  • [ ] Multi-person video (complex) support.
  • [ ] Data augmentation to solve "high-speed with low-rate" problem: SLOW-MO.
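The first two checked items, the completeness check and picking the human covering the largest area, could look roughly like the sketch below. It is not the repository's implementation: the input layout (one list of `(x, y, score)` joints per person), the function names and the 0.05 score threshold are all assumptions made for illustration, and selection is done within a single frame with no tracking across frames.

```python
def is_complete(joints, min_score=0.05):
    """A person counts as complete when every joint clears the threshold."""
    return all(score >= min_score for _, _, score in joints)


def bbox_area(joints):
    """Area of the axis-aligned box around all joints."""
    xs = [x for x, _, _ in joints]
    ys = [y for _, y, _ in joints]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def pick_person(people, min_score=0.05):
    """Return the index of the complete person with the largest box, or None."""
    complete = [i for i, joints in enumerate(people)
                if is_complete(joints, min_score)]
    if not complete:
        return None
    return max(complete, key=lambda i: bbox_area(people[i]))


# Toy frame with three people (hypothetical data).
frame = [
    [(0, 0, 0.9), (10, 20, 0.8)],    # complete, area 200
    [(0, 0, 0.9), (50, 50, 0.01)],   # incomplete: one joint below threshold
    [(5, 5, 0.7), (25, 35, 0.6)],    # complete, area 600
]
print(pick_person(frame))  # 2
```

Filtering on completeness before comparing areas keeps a large but partially occluded detection from being chosen over a smaller fully visible one.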