PyTorch implementation of the AAAI 2021 paper “Similarity Reasoning and Filtration for Image-Text Matching”.

It is built on top of SCAN and Cross-modal_Retrieval_Tutorial.

We have released two versions of SGRAF: branch main for Python 2.7 and branch python3.6 for Python 3.6.


The framework of SGRAF:

Updated results (better than those reported in the original paper):

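| Dataset   | Module | Sentence R@1 | Sentence R@5 | Sentence R@10 | Image R@1 | Image R@5 | Image R@10 |
|-----------|--------|--------------|--------------|---------------|-----------|-----------|------------|
| Flickr30K | SAF    | 75.6         | 92.7         | 96.9          | 56.5      | 82.0      | 88.4       |
| Flickr30K | SGR    | 76.6         | 93.7         | 96.6          | 56.1      | 80.9      | 87.0       |
| Flickr30K | SGRAF  | 78.4         | 94.6         | 97.5          |           |           |            |
| MSCOCO 1K | SAF    | 78.0         | 95.9         | 98.5          | 62.2      | 89.5      | 95.4       |
| MSCOCO 1K | SGR    | 77.3         | 96.0         | 98.6          | 62.1      | 89.6      | 95.3       |
| MSCOCO 1K | SGRAF  | 79.2         | 96.5         | 98.6          | 63.5      | 90.2      | 95.8       |
| MSCOCO 5K | SAF    | 55.5         | 83.8         | 91.8          | 40.1      | 69.7      | 80.4       |
| MSCOCO 5K | SGR    | 57.3         | 83.2         | 90.6          | 40.5      | 69.6      | 80.3       |
| MSCOCO 5K | SGRAF  | 58.8         | 84.8         | 92.1          | 41.6      | 70.9      | 81.5       |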


We recommend the following dependencies for branch main.

import nltk
nltk.download('punkt')  # Punkt sentence tokenizer

Download data and vocab

We follow SCAN to obtain image features and vocabularies, which can be downloaded by using:


Pre-trained models and evaluation

Modify the model_path, data_path, vocab_path in the file. Then run


Note that fold5=True applies only to evaluation on MSCOCO 1K (results averaged over 5 folds), while fold5=False is used for MSCOCO 5K and Flickr30K. Pre-trained models and log files can be downloaded from Flickr30K_SGRAF and MSCOCO_SGRAF.
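The difference between the two fold5 settings can be sketched as follows. This is only an illustration under stated assumptions: `evaluate` is a hypothetical callable that returns a single score for a list of test items, and the folds are assumed to be consecutive, non-overlapping slices of the test set.

```python
def evaluate_test_set(items, evaluate, fold5, fold_size=1000, n_folds=5):
    """Return one score for the test set.

    fold5=True : score each consecutive fold separately, then average
                 (the 1K protocol described above).
    fold5=False: score the whole test set at once (the 5K protocol).
    `evaluate` is a hypothetical scoring function, not part of the repository.
    """
    if not fold5:
        return evaluate(items)
    scores = []
    for f in range(n_folds):
        fold = items[f * fold_size:(f + 1) * fold_size]
        scores.append(evaluate(fold))
    return sum(scores) / len(scores)
```

Because each fold contains fewer candidates to rank against, scores averaged over 1K folds are typically higher than scores on the full 5K set, which matches the gap between the two MSCOCO settings in the results.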

Training new models from scratch

Modify the data_path, vocab_path, model_name, logger_name in the file. Then run


For MSCOCO:

(For SGR) python --data_name coco_precomp --num_epochs 20 --lr_update 10 --module_name SGR
(For SAF) python --data_name coco_precomp --num_epochs 20 --lr_update 10 --module_name SAF

For Flickr30K:

(For SGR) python --data_name f30k_precomp --num_epochs 40 --lr_update 30 --module_name SGR
(For SAF) python --data_name f30k_precomp --num_epochs 30 --lr_update 20 --module_name SAF
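To make the interaction between --num_epochs and --lr_update concrete, here is a minimal sketch of a step-decay schedule. It assumes the convention used in SCAN, on which this code is built: the learning rate is multiplied by a fixed factor every lr_update epochs. The base rate of 0.0002 and the decay factor of 0.1 are assumptions for illustration, so check the training script for the actual values.

```python
def learning_rate(epoch, base_lr=0.0002, lr_update=10, decay=0.1):
    """Step decay: multiply the rate by `decay` every `lr_update` epochs.

    base_lr and decay are assumed values, not confirmed by this README.
    """
    return base_lr * (decay ** (epoch // lr_update))


# With --num_epochs 20 --lr_update 10, epochs 0-9 use the base rate
# and epochs 10-19 use the decayed rate.
for epoch in (0, 9, 10, 19):
    print(epoch, learning_rate(epoch, lr_update=10))
```

Under this assumption, each of the settings above trains at the base rate for the first lr_update epochs and at the reduced rate for the remaining ones.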


If SGRAF is useful for your research, please cite the following paper:

  title={Similarity Reasoning and Filtration for Image-Text Matching},
  author={Diao, Haiwen and Zhang, Ying and Ma, Lin and Lu, Huchuan},