BSTRO: Body-Scene contact TRansfOrmer

This is the code repository for Capturing and Inferring Dense Full-Body Human-Scene Contact.

Body-Scene contact TRansfOrmer (BSTRO) is a transformer-based method that detects human-scene contact directly from pixels. In this repository, we provide the inference code of BSTRO.
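
To make "dense contact" concrete, here is a minimal sketch, not this repository's API, of how per-vertex contact predictions could be turned into a set of contacting vertices. It assumes a model that outputs one contact probability per body-mesh vertex. The function name `contact_vertices`, the 0.5 threshold, and the toy probabilities are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def contact_vertices(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Return indices of vertices whose contact probability meets the threshold.

    Hypothetical helper for illustration; it is not part of the BSTRO code base.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    return [i for i, p in enumerate(probs) if p >= threshold]


# Toy example: five vertices with made-up contact probabilities.
toy_probs = [0.05, 0.92, 0.40, 0.71, 0.50]
print(contact_vertices(toy_probs))  # [1, 3, 4]
```

A lower threshold marks more vertices as in contact at the cost of more false positives; the appropriate value depends on the application.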


Check for installation instructions.

Pre-trained models and other required files

Please download our pre-trained weights from the website and follow to prepare other relevant files that are important to run our code.

Quick demo

We provide demo code to run end-to-end inference on the test images.

Check for details.


If you find our work useful in your research, please consider citing:

title = {Capturing and Inferring Dense Full-Body Human-Scene Contact},
author = {Huang, Chun-Hao P. and Yi, Hongwei and H{\"o}schle, Markus and Safroshkin, Matvey and Alexiadis, Tsvetelina and Polikovsky, Senya and Scharstein, Daniel and Black, Michael J.},
booktitle = {IEEE/CVF Conf.~on Computer Vision and Pattern Recognition (CVPR) },
pages = {13274-13285},
month = jun,
year = {2022},
month_numeric = {6}

[TBD] License

Our research code is released under the MPI license. See LICENSE for details.

METRO is released under the MIT license. See LICENSE for details.

We use the huggingface/transformers submodule. Please see NOTICE for details.


Our implementation and experiments are built on top of open-source GitHub repositories. We thank all the authors who made their code public, which greatly accelerated our progress. If you find these works helpful, please consider citing them as well.






For questions, please contact [email protected]

For commercial licensing (and all related questions for business applications), please contact [email protected].

