MONet in PyTorch

We provide a PyTorch implementation of MONet.

This project is built on top of the CycleGAN/pix2pix code written by Jun-Yan Zhu and Taesung Park, and supported by Tongzhou Wang.

Note: The implementation was developed and tested with Python 3.7 and PyTorch 1.1.

Implementation details

Decoder Negative Log-Likelihood (NLL) loss

where *N* is the number of pixels in the image, and *K* is the number of mixture components.
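As an illustration, the loss can be sketched as a per-pixel mixture likelihood. The sketch below assumes the form NLL = -sum over i = 1..N of log( sum over k = 1..K of m[k][i] * Normal(x[i]; mu[k][i], sigma[k]^2) ). The names m (mixing mask), mu (component mean) and sigma (component standard deviation) are assumptions made for this example and are not taken from this repository. It uses only the standard library so the arithmetic is easy to follow; a tensor implementation would typically compute the inner log-sum with torch.logsumexp for numerical stability.

```python
import math


def log_normal_pdf(x, mean, std):
    """Log density of a univariate Gaussian Normal(x; mean, std^2)."""
    var = std * std
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    peak = max(values)
    if peak == float("-inf"):
        return peak
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def decoder_nll(pixels, log_masks, means, stds):
    """Mixture NLL summed over N pixels (assumed form, see lead-in).

    pixels:    N observed pixel values
    log_masks: K lists of N log mixing weights, log m[k][i]
    means:     K lists of N reconstructed values, mu[k][i]
    stds:      K standard deviations, one per component (hypothetical)
    """
    num_components = len(log_masks)
    total = 0.0
    for i, x in enumerate(pixels):
        # Work in log space: log m + log Normal, then log-sum-exp over k.
        terms = [
            log_masks[k][i] + log_normal_pdf(x, means[k][i], stds[k])
            for k in range(num_components)
        ]
        total -= logsumexp(terms)
    return total


if __name__ == "__main__":
    pixels = [0.2, 0.8]
    log_masks = [[math.log(0.9), math.log(0.1)], [math.log(0.1), math.log(0.9)]]
    means = [[0.2, 0.0], [0.0, 0.8]]
    print(decoder_nll(pixels, log_masks, means, stds=[0.1, 0.1]))
```

Adding log-masks to log-densities before the log-sum-exp avoids underflow when a component assigns a pixel a very small probability.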

Test Results

CLEVR 64x64 @ 160 epochs


Prerequisites

  • Linux or macOS (not tested)
  • Python 3.7

Getting Started


  • Clone this repo:
git clone
cd MONet-pytorch
  • Install PyTorch 1.1+ and other dependencies (e.g., torchvision, visdom, and dominate).
    • For pip users, please type the command pip install -r requirements.txt.
    • For Conda users, we provide an installation script in ./scripts/. Alternatively, you can create a new Conda environment using conda env create -f environment.yml.
    • For Docker users, we provide the pre-built Docker image and Dockerfile. Please refer to our Docker page.
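After installing, it can be useful to confirm that the environment meets the versions mentioned above (Python 3.7, PyTorch 1.1+). The following is a minimal sketch, not part of this repository; the helper names version_tuple and meets_minimum are hypothetical.

```python
import re
import sys


def version_tuple(version):
    """Turn a version string such as '1.1.0' or '1.13.1+cu117' into a tuple of ints."""
    numbers = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def meets_minimum(version, minimum):
    """Return True if the version string is at least the given minimum tuple."""
    return version_tuple(version) >= tuple(minimum)


if __name__ == "__main__":
    print("Python 3.7+:", sys.version_info >= (3, 7))
    try:
        import torch  # only needed for the version check

        print("PyTorch 1.1+:", meets_minimum(torch.__version__, (1, 1)))
    except ImportError:
        print("PyTorch is not installed")
```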

MONet train/test

  • Download a MONet dataset (e.g. CLEVR):
wget -cN
  • To view training results and loss plots, run python -m visdom.server and click the URL http://localhost:8097.
  • Train a model:
python --dataroot ./datasets/CLEVR_v1.0 --name clevr_monet --model monet

To see more intermediate results, check out ./checkpoints/clevr_monet/web/index.html.
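The path above suggests that results are stored under ./checkpoints/<name>/web/index.html, where <name> is the value passed to --name. Assuming that layout holds for other experiment names too, a small helper (hypothetical, not part of this repository) can locate the page:

```python
from pathlib import Path


def results_page(name, checkpoints_dir="./checkpoints"):
    """Return the assumed location of the HTML results page for an experiment."""
    return Path(checkpoints_dir) / name / "web" / "index.html"


if __name__ == "__main__":
    page = results_page("clevr_monet")
    if page.is_file():
        # A file:// URI can be pasted straight into a browser.
        print("Open in a browser:", page.resolve().as_uri())
    else:
        print("No results page found yet at", page)
```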

To generate a montage of the model outputs like the ones shown above:


Apply a pre-trained model

  • Download pretrained weights for CLEVR 64x64:
./scripts/ clevr