Face2webtoon

Image-to-image (I2I) translation is a challenging task, especially when data is scarce. This work aims to translate human faces into the webtoon domain using limited data.

Webtoon Dataset

I collected the dataset from the Naver webtoon 연애혁명 (Love Revolution) using an anime face detector. Since the face detector is not very good at detecting webtoon faces, I could gather only 1400 webtoon face images.

Baseline 0 (U-GAT-IT)

I used the official PyTorch implementation of U-GAT-IT.
U-GAT-IT is a GAN for unpaired I2I translation. Using a CAM attention module and adaptive layer-instance normalization (AdaLIN), it performs well when considerable shape deformation is required.
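
To make the normalization concrete, here is a minimal pure-Python sketch of the AdaLIN idea: the output blends instance-normalized and layer-normalized features with a mixing weight rho clipped to [0, 1], then applies a scale gamma and shift beta. It is a simplification, not the repository's code: in U-GAT-IT, rho is a learned per-channel parameter and gamma/beta are predicted from attention features, whereas here they are plain numbers passed in by hand, and the function names are my own.

```python
import math


def _normalize(values, eps=1e-5):
    """Shift values to zero mean and scale them to unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def adalin(feature_map, gamma, beta, rho, eps=1e-5):
    """Apply AdaLIN to one sample given as a list of channels (flat lists).

    gamma, beta: per-channel scale and shift.
    rho: scalar mixing weight (an assumption; learned per channel in U-GAT-IT).
         rho = 1 is pure instance norm, rho = 0 is pure layer norm.
    """
    rho = min(1.0, max(0.0, rho))  # keep the mixing weight inside [0, 1]
    # Instance norm: statistics are computed separately for each channel.
    inst = [_normalize(ch, eps) for ch in feature_map]
    # Layer norm: statistics are computed over all channels of the sample.
    flat = _normalize([v for ch in feature_map for v in ch], eps)
    layer, start = [], 0
    for ch in feature_map:
        layer.append(flat[start:start + len(ch)])
        start += len(ch)
    return [
        [g * (rho * a + (1.0 - rho) * b) + bt for a, b in zip(ch_i, ch_l)]
        for ch_i, ch_l, g, bt in zip(inst, layer, gamma, beta)
    ]


x = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
out_in = adalin(x, gamma=[1.0, 1.0], beta=[0.0, 0.0], rho=1.0)
out_ln = adalin(x, gamma=[1.0, 1.0], beta=[0.0, 0.0], rho=0.0)
```

With rho = 1 each channel is normalized on its own, so the difference in scale between the two channels disappears; with rho = 0 the channels keep their relative offsets because the statistics are shared across the whole sample.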

For face data, I used the AFAD-Lite dataset from https://github.com/afad-dataset/tarball-lite.

Some results look pretty nice, but some attributes were lost during the domain transfer.

Missing Attributes

Gender

Gender information was lost.

Glasses

The model failed to generate glasses on the webtoon faces.

Result Analysis

To analyze the results, I separated the webtoon dataset into four groups.

| Group number | Group name | Number of images |
|---|---|---|
| 0 | woman_no_glasses | 1050 |
| 1 | man_no_glasses | 249 |
| 2 | man_glasses | 17 -> 49 |
| 3 | woman_glasses | 15 -> 38 |

Even after I collected more data for groups 2 and 3, the groups remained severely imbalanced. As a result, the model failed to translate faces into the few-shot groups (groups 2 and 3).
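
The imbalance can be quantified directly from the group sizes listed above. The short script below uses the counts after the extra collection (49 and 38 for groups 2 and 3) and computes each group's share of the data; the variable names are only illustrative.

```python
# Group sizes after the extra collection for groups 2 and 3.
group_sizes = {
    "woman_no_glasses": 1050,
    "man_no_glasses": 249,
    "man_glasses": 49,
    "woman_glasses": 38,
}

total = sum(group_sizes.values())
shares = {name: count / total for name, count in group_sizes.items()}
# Ratio between the largest and the smallest group.
imbalance_ratio = max(group_sizes.values()) / min(group_sizes.values())

for name, share in shares.items():
    print(f"{name:>18}: {share:6.1%}")
print(f"largest/smallest group: {imbalance_ratio:.1f}x")
```

The two glasses groups together account for roughly 6% of the images, and the largest group is about 27.6 times the size of the smallest.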

U-GAT-IT + Few Shot Transfer

Few-shot transfer: https://arxiv.org/abs/2007.13332

Paper review: https://yun905.tistory.com/48

In this paper, the authors successfully transferred knowledge from a group with enough data to the few-shot groups.

Basic model

The basic model was trained on group 0 only.

Baseline 1 (simple fine-tuning)

For baseline 1, I froze the bottleneck layers of the generator and tried to fine-tune the basic model. I used 38 images (both real and fake) of groups 1, 2, and 3. I also used 8 images of group 0 to prevent forgetting. The model was trained for 200k iterations.

The model mapped inputs to groups at random.

Baseline 2 (group classification loss + selective backprop)

I attached an additional group classifier to the discriminator and added a group classification loss, following the original paper. Images of groups 0, 1, 2, and 3 were fed sequentially, and the bottleneck layers of the generator were updated for group 0 only.
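
As a rough illustration of these two ingredients, the sketch below assumes the group classification loss is a standard softmax cross-entropy over the four groups; the text does not state the exact form, so treat this as an assumption. The function names and the example logits are hypothetical and do not come from the repository.

```python
import math


def group_classification_loss(logits, target_group):
    """Softmax cross-entropy for one image: -log p(target_group)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_group]


def update_bottleneck(group, source_group=0):
    """Selective backprop rule from the text: bottleneck updated for group 0 only."""
    return group == source_group


# Hypothetical classifier outputs for an image that should belong to group 2.
confident = group_classification_loss([0.1, 0.2, 4.0, 0.3], target_group=2)
confused = group_classification_loss([2.0, 2.0, 2.0, 2.0], target_group=2)

# Feed the groups sequentially and record whether the bottleneck is updated.
plan = [(group, update_bottleneck(group)) for group in (0, 1, 2, 3)]
```

A classifier that cannot tell the groups apart pays a loss of log(4) per image, while a confident and correct prediction drives the loss toward zero. This is what pushes the generator to produce outputs that are recognizable as the intended group.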

Since FID is significantly biased when data is limited, I used KID instead.

KID*1000
25.95
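
For reference, KID is the squared maximum mean discrepancy between real and generated feature sets under the polynomial kernel k(x, y) = (x·y/d + 1)^3, computed with an unbiased estimator. The sketch below is a minimal pure-Python version of that estimator on toy 2-D vectors. In practice the features are Inception activations, and the evaluation code used for the numbers in this README is not shown here, so the function names and data are illustrative only.

```python
def poly_kernel(x, y):
    """Cubic polynomial kernel used by KID: (x.y / d + 1) ** 3."""
    d = len(x)
    return (sum(a * b for a, b in zip(x, y)) / d + 1.0) ** 3


def _mean_off_diagonal(feats):
    """Average kernel value over all ordered pairs of distinct samples."""
    n = len(feats)
    total = sum(
        poly_kernel(feats[i], feats[j])
        for i in range(n) for j in range(n) if i != j
    )
    return total / (n * (n - 1))


def kid(real_feats, fake_feats):
    """Unbiased squared-MMD estimate between two feature sets."""
    cross = sum(poly_kernel(x, y) for x in real_feats for y in fake_feats)
    cross /= len(real_feats) * len(fake_feats)
    return (_mean_off_diagonal(real_feats)
            + _mean_off_diagonal(fake_feats)
            - 2.0 * cross)


# Toy features: one fake set close to the real one, one far from it.
real = [[1.0, 0.0], [0.9, 0.1], [1.1, -0.1]]
fake_close = [[1.0, 0.05], [0.95, 0.0], [1.05, 0.1]]
fake_far = [[-1.0, 2.0], [-0.9, 2.1], [-1.1, 1.9]]

kid_close = kid(real, fake_close)
kid_far = kid(real, fake_far)
print(f"KID*1000 (close): {kid_close * 1000:.2f}")
print(f"KID*1000 (far):   {kid_far * 1000:.2f}")
```

Because the estimator is unbiased, it can come out slightly negative when the two sets are nearly indistinguishable. That is expected and is part of why KID is better behaved than FID on small samples.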

U-GAT-IT + group classification loss + adaptive discriminator augmentation

ADA (adaptive discriminator augmentation) is a useful data augmentation method for training GANs with limited data. Although the authors considered only unconditional GANs, I applied ADA to U-GAT-IT. Augmentation was applied to both discriminators: I assumed that preventing the face-domain discriminator from overfitting would improve the face generator, which in turn would make the cycle consistency loss more meaningful. Only pixel blitting and geometric transformations were implemented; according to the paper, the effects of the other augmentation methods are minor.
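
The sketch below illustrates the two ideas from the ADA paper in plain Python: an augmentation probability p that is nudged up or down according to an overfitting indicator r_t, and pixel-blitting operations (x-flip and 90-degree rotations) applied with probability p. The target of 0.6, the 4-minibatch interval, and the 500k-image adjustment horizon are the paper's defaults. How this repository computes r_t for U-GAT-IT's discriminators is not stated, so r_t is simply taken as an input, and all function names are my own.

```python
import random


def update_ada_p(p, r_t, target=0.6, batch_size=1, interval=4,
                 ada_images=500_000):
    """Move p by a fixed step toward more or less augmentation.

    r_t above the target suggests discriminator overfitting, so p is raised.
    Clamping p to [0, 1] is a choice made for this sketch.
    """
    step = batch_size * interval / ada_images
    if r_t > target:
        p += step
    elif r_t < target:
        p -= step
    return min(1.0, max(0.0, p))


def xflip(img):
    """Mirror an image (list of rows) horizontally."""
    return [row[::-1] for row in img]


def rot90(img):
    """Rotate an image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def blit_augment(img, p, rng):
    """Apply each pixel-blitting operation independently with probability p."""
    if rng.random() < p and rng.randrange(2) == 1:
        img = xflip(img)
    if rng.random() < p:
        for _ in range(rng.randrange(4)):  # 0 to 3 quarter turns
            img = rot90(img)
    return img


rng = random.Random(0)
image = [[1, 2], [3, 4]]
augmented = blit_augment(image, p=0.8, rng=rng)
p_next = update_ada_p(p=0.5, r_t=0.9)
```

The same augmentations must be applied to both real and generated images before they reach the discriminator, otherwise the generator would learn to produce augmented-looking outputs.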

To achieve better results, I changed the face dataset to a more diverse one (CelebA).

ADA makes training take longer. I trained for 8 days on a single RTX 2070 SUPER, but the model did not fully converge.

KID*1000
12.14

Checkpoint file

https://drive.google.com/file/d/1uS5-3iIbYeh_aau0ydNkTO5aRhJWw0jX/view?usp=sharing

Start training

python main.py --dataset dataset_name --useADA True --group 0,1,2,3 --use_grouploss True --neptune False

If --neptune is True, the experiment is logged to neptune.ai, an experiment management tool; you must set your API token. The --group flag selects which groups are used: for example, --group 0,1,3 excludes group 2 from training.
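
To show how flags of this shape can be interpreted, here is a hypothetical argparse sketch. It is not the repository's main.py: the helper names, defaults, and parsing rules are assumptions, and only the flag names are taken from the command above. It covers two details, turning "True"/"False" strings into booleans and splitting the comma-separated group list.

```python
import argparse


def str2bool(value):
    """Parse 'True'/'False' strings; argparse's type=bool would treat both as True."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def parse_groups(value):
    """Turn a string such as '0,1,3' into [0, 1, 3]."""
    return [int(g) for g in value.split(",")]


def build_parser():
    # Defaults below are assumptions made for this sketch.
    parser = argparse.ArgumentParser(description="Hypothetical flag parser")
    parser.add_argument("--dataset", type=str, required=True)
    parser.add_argument("--useADA", type=str2bool, default=False)
    parser.add_argument("--group", type=parse_groups, default=[0, 1, 2, 3])
    parser.add_argument("--use_grouploss", type=str2bool, default=False)
    parser.add_argument("--neptune", type=str2bool, default=False)
    return parser


command = ("--dataset dataset_name --useADA True --group 0,1,3 "
           "--use_grouploss True --neptune False")
args = build_parser().parse_args(command.split())
# Groups that are left out of training with this selection.
excluded = sorted(set(range(4)) - set(args.group))
print(args.group, excluded)
```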

GitHub

https://github.com/sangyun884/Face2Webtoon