This repository holds the code for the paper "TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation".


1. Download Google pre-trained ViT models

wget https://storage.googleapis.com/vit_models/imagenet21k/{MODEL_NAME}.npz &&
mkdir -p ../model/vit_checkpoint/imagenet21k &&
mv {MODEL_NAME}.npz ../model/vit_checkpoint/imagenet21k/{MODEL_NAME}.npz
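Once downloaded, the checkpoint is a flat NumPy `.npz` archive of named weight arrays. A minimal sketch for sanity-checking the file (the helper name and the example filename are illustrative, not part of the repo):

```python
import numpy as np

def list_checkpoint_keys(path):
    """Return the sorted parameter names stored in a ViT .npz checkpoint."""
    with np.load(path) as weights:
        return sorted(weights.files)

# Example (hypothetical filename):
# for key in list_checkpoint_keys("../model/vit_checkpoint/imagenet21k/R50+ViT-B_16.npz"):
#     print(key)
```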

2. Prepare data

Please go to "./datasets/" for details, or send an email to jienengchen01 AT gmail DOT com to request the preprocessed data. If you use the preprocessed data, please use it for research purposes only and do not redistribute it.

3. Environment

Please prepare an environment with Python 3.7, then run "pip install -r requirements.txt" to install the dependencies.

4. Train/Test

  • Run the training script on the Synapse dataset. The batch size can be reduced to 12 or 6 to save memory (please also decrease base_lr linearly); both settings reach similar performance.
CUDA_VISIBLE_DEVICES=0 python train.py --dataset Synapse --vit_name R50-ViT-B_16
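The linear scaling mentioned above can be sketched as follows. The defaults here (batch_size=24, base_lr=0.01) are assumptions based on common TransUNet settings; check the training script's argument defaults in your copy of the code:

```python
# Assumed defaults; verify against the training script's argparse defaults.
DEFAULT_BATCH_SIZE = 24
DEFAULT_BASE_LR = 0.01

def scaled_base_lr(batch_size,
                   default_bs=DEFAULT_BATCH_SIZE,
                   default_lr=DEFAULT_BASE_LR):
    """Scale base_lr linearly with the batch size."""
    return default_lr * batch_size / default_bs

# e.g. scaled_base_lr(12) -> 0.005, scaled_base_lr(6) -> 0.0025
```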
  • Run the test script on the Synapse dataset. It supports testing both 2D images and 3D volumes.
python test.py --dataset Synapse --vit_name R50-ViT-B_16
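For intuition about what the test script reports, here is an illustrative per-class Dice score for segmentation masks; this is a standalone sketch, and the repo's own metric code may differ in detail:

```python
import numpy as np

def dice_score(pred, gt, label):
    """Dice overlap between the binary masks of one class label."""
    p = (pred == label).astype(np.float64)
    g = (gt == label).astype(np.float64)
    inter = (p * g).sum()
    denom = p.sum() + g.sum()
    # Convention: if the class is absent from both masks, count it as perfect.
    return 2.0 * inter / denom if denom > 0 else 1.0
```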



5. Citation

@article{chen2021transunet,
  title={TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation},
  author={Chen, Jieneng and Lu, Yongyi and Yu, Qihang and Luo, Xiangde and Adeli, Ehsan and Wang, Yan and Lu, Le and Yuille, Alan L. and Zhou, Yuyin},
  journal={arXiv preprint arXiv:2102.04306},
  year={2021}
}
