Spatially-Adaptive Multilayer (SAM) Inversion

Project Page | Paper

Choosing a single latent layer for GAN inversion leads to a dilemma between obtaining a faithful reconstruction of the input image and being able to perform downstream edits (1st and 2nd row). In contrast, our proposed method automatically selects the latent space tailored for each region to balance the reconstruction quality and editability (3rd row).

Spatially-Adaptive Multilayer Selection for GAN Inversion and Editing
Gaurav Parmar, Yijun Li, Jingwan Lu, Richard Zhang, Jun-Yan Zhu, Krishna Kumar Singh
CMU, Adobe Research
CVPR 2022

Image Formation with Multiple Latent Codes

We use the predicted invertibility map in conjunction with multiple latent codes to generate the final image. First, the StyleBlocks of the pretrained StyleGAN2 model are modulated directly by W+. Subsequently, for each intermediate feature space Fi, we predict the change in the layer's feature values ∆Fi and add it to the feature block after masking with the corresponding binary mask mi.
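The masked feature update can be sketched as follows. This is a minimal PyTorch sketch, not the repository's actual API; the function name and the dictionary layout of the features, offsets, and masks are illustrative assumptions.

import torch

def apply_multilayer_codes(features, deltas, masks):
    # features: {i: F_i} intermediate StyleGAN2 feature maps, shape (N, C_i, H_i, W_i)
    # deltas:   {i: dF_i} predicted feature offsets, same shapes as the features
    # masks:    {i: m_i} binary masks, shape (N, 1, H_i, W_i)
    out = {}
    for i, f in features.items():
        # F_i <- F_i + m_i * dF_i: the offset only modifies the masked region,
        # while the rest of the feature map keeps its W+ modulated values.
        out[i] = f + masks[i] * deltas[i]
    return out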

Predicting the Invertibility Map

We begin by predicting how difficult each region of the image is to invert for every latent layer, using our trained invertibility network S. We then refine the predicted map with a semantic segmentation network and combine the results using a user-specified threshold τ. The combined invertibility map, shown on the right, is used to determine the latent layer for inverting each segment of the image.
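A rough sketch of this selection step is shown below, assuming hypothetical tensor shapes and names (build_invertibility_map, inv_scores, and seg_labels are illustrative, not the actual repository code):

import torch

def build_invertibility_map(inv_scores, seg_labels, tau):
    # inv_scores: (L, H, W) predicted inversion difficulty per candidate latent layer,
    #             ordered from most editable (e.g. W+) to most faithful (e.g. F10)
    # seg_labels: (H, W) semantic segment id for each pixel
    # tau:        user-specified difficulty threshold
    L, H, W = inv_scores.shape
    # Refine: average the difficulty inside each semantic segment so that every
    # pixel of a segment is assigned the same latent layer.
    refined = torch.empty_like(inv_scores)
    for s in seg_labels.unique():
        region = seg_labels == s
        refined[:, region] = inv_scores[:, region].mean(dim=1, keepdim=True)
    # Choose, per pixel, the earliest (most editable) layer whose difficulty
    # falls below tau; fall back to the most faithful layer when none qualifies.
    choice = torch.full((H, W), L - 1, dtype=torch.long)
    for i in reversed(range(L)):
        choice[refined[i] <= tau] = i
    return choice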

Qualitative Results

Below we show image inversion and editing results obtained with the proposed method. Please see the project website for additional inversion and editing results.

Quick Start

Environment Setup

See environment.yml for a full list of library dependencies. The following commands can be used to install all the dependencies in a new conda environment.

conda env create -f environment.yml
conda activate inversion

Inversion

An example command for inverting a given target image is shown below. The --image_category should be one of {"cars", "faces", "cats"}. The --sweep_thresholds flag performs inversion for a range of different threshold values. See src/sam_inv_optimization.py for other optional flags.

python src/sam_inv_optimization.py \
    --image_category "cars" --image_path test_images/cars/b.png \
    --output_path "output/cars/" --sweep_thresholds --generate_edits

Using a Custom Dataset

To perform SAM inversion on a custom dataset, a corresponding invertibility network needs to be trained. First, run a single-layer inversion with every candidate latent space for all images in the training set, as shown in the command below.

for latent_name in "W+" "F4" "F6" "F8" "F10"; do
    python src/single_latent_inv.py \
        --image_category "cats" --image_folder_path datasets/custom_images/train \
        --num_opt_steps 501 --output_path "output/custom_ds/train/${latent_name}" --target_H 256 --target_W 256 \
        --latent_name ${latent_name}
done

Next, repeat the above for the validation and test splits. Finally, train the invertibility network as shown in the example command below.

python src/train_invertibility.py \
    --dataset_folder_train output/custom_ds/train \
    --dataset_folder_val output/custom_ds/val \
    --output_folder output/invertibility/custom_ds \
    --gpu-ids "0" --batch-size 16 --lr 0.0001

Reference

If you find this project useful for your research, please consider citing our paper.

@inproceedings{parmar2022sam,
  title={Spatially-Adaptive Multilayer Selection for GAN Inversion and Editing},
  author={Parmar, Gaurav and Li, Yijun and Lu, Jingwan and Zhang, Richard and Zhu, Jun-Yan and Singh, Krishna Kumar},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2022}
}

Submodules Used

Please also take a look at the following relevant repositories.

  • e4e – Encoder used for the W+ inversions.
  • StyleGAN – The generative model used for the inversion.
  • Deeplab3-xception – Used as the base architecture for the invertibility prediction network.
  • HRNet, Detectron – Used for segmenting images (except faces).
  • Face Parsing – Used for segmenting face images.
