[OVSeg] Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP

This is the official PyTorch implementation of our paper: Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP Feng Liang, Bichen Wu, Xiaoliang Dai, Kunpeng Li, Yinan Zhao, Hang Zhang, Peizhao Zhang, Peter Vajda, Diana Marculescu

[arXiv] [Project]


Please see installation guide.

Data Preparation

Please see datasets preparation.

Getting started

Please see getting started instruction.


Shield: CC BY-NC 4.0

The majority of OVSeg is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

CC BY-NC 4.0

However portions of the project are under separate license terms: CLIP and ZSSEG are licensed under the MIT license; MaskFormer is licensed under the CC-BY-NC; openclip is licensed under the license at its repo.

Citing OVSeg 🙏

If you use OVSeg in your research or wish to refer to the baseline results published in the paper, please use the following BibTeX entry.

  title={Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP},
  author={Liang, Feng and Wu, Bichen and Dai, Xiaoliang and Li, Kunpeng and Zhao, Yinan and Zhang, Hang and Zhang, Peizhao and Vajda, Peter and Marculescu, Diana},
  journal={arXiv preprint arXiv:2210.04150},


View Github