Contextual Non-Local Alignment over Full-Scale Representation for Text-Based Person Search

This is an implementation of our paper "Contextual Non-Local Alignment over Full-Scale Representation for Text-Based Person Search". The code is modified from the GitHub repository "pytorch implementation for ECCV2018 paper Deep Cross-Modal Projection Learning for Image-Text Matching".

Requirements

  • Python 3.7
  • PyTorch 1.0.0 & torchvision 0.2.1
  • numpy
  • matplotlib (only needed for plotting the result figure)
  • scipy 1.2.1
  • pytorch_transformers
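
One way to install these dependencies with pip is sketched below, assuming a Python 3.7 environment (the exact PyTorch 1.0.0 build depends on your platform and CUDA version, so adjust as needed):

    # Install the packages listed above; pick the torch build matching your CUDA setup.
    pip install torch==1.0.0 torchvision==0.2.1 numpy scipy==1.2.1 matplotlib pytorch_transformers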

Usage

Data Preparation

  1. Please download the CUHK-PEDES dataset.
  2. Put reid_raw.json under project_directory/data/
  3. Run data.sh
  4. Copy files test_reid.json, train_reid.json and val_reid.json under CUHK-PEDES/data/ to project_directory/data/processed_data/
  5. Download the pretrained ResNet-50 model, the bert-base-uncased model and its vocabulary to project_directory/pretrained/ (see the sketch after this list).
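
For step 5, one possible way to fetch the pretrained weights is to let torchvision and pytorch_transformers download them and then save them under project_directory/pretrained/. This is only a sketch under assumptions: the filename resnet50.pth and the files written by save_pretrained are guesses, so check the code and configs for the names actually expected.

    # Sketch only: download pretrained ResNet-50 and bert-base-uncased weights,
    # then store them under pretrained/. Adjust filenames to what the code expects.
    mkdir -p pretrained
    python -c "import torch, torchvision; \
      torch.save(torchvision.models.resnet50(pretrained=True).state_dict(), 'pretrained/resnet50.pth')"
    python -c "from pytorch_transformers import BertModel, BertTokenizer; \
      BertModel.from_pretrained('bert-base-uncased').save_pretrained('pretrained/'); \
      BertTokenizer.from_pretrained('bert-base-uncased').save_pretrained('pretrained/')"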

Training & Testing

First, set the parameter BASE_ROOT to your project directory and IMAGE_DIR to the directory of the CUHK-PEDES dataset. Then run sh scripts/train.sh to train the model and sh scripts/test.sh to evaluate it.
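
For example, the two paths could look like the following (illustrative values only; only the parameter names BASE_ROOT and IMAGE_DIR are taken from this README, and the exact format inside scripts/train.sh and scripts/test.sh may differ):

    # Set these to your own paths before launching the scripts.
    BASE_ROOT=/path/to/project_directory
    IMAGE_DIR=/path/to/CUHK-PEDES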

Model Framework

[Figure: framework]

Model Performance

[Figures: performance results]

GitHub

https://github.com/TencentYoutuResearch/PersonReID-NAFS