A two-stage generative adversarial network that generates images of guitarists playing guitar from audio.
Stage 1: Audio to binary mask
Stage 2: Binary mask to color image
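The two stages chain together: the mask produced by stage 1 is the only input to stage 2. The sketch below illustrates that data flow only, and it is not the repository's code. The function names, the toy 4x4 resolution, and the random stand-ins for the trained generators are all hypothetical.

```python
import random
from typing import List, Tuple

Mask = List[List[int]]            # H x W, values in {0, 1}
Image = List[List[List[float]]]   # H x W x 3, values in [0, 1]

H, W = 4, 4  # toy resolution (hypothetical, chosen for readability)


def stage1_audio_to_mask(audio_features: List[float], seed: int = 0) -> Mask:
    """Stand-in for the stage-1 generator: audio features -> binary mask."""
    rng = random.Random(seed)
    mask = []
    for _ in range(H):
        row = []
        for _ in range(W):
            # Random linear projection of the features, thresholded at zero.
            score = sum(rng.uniform(-1, 1) * f for f in audio_features)
            row.append(1 if score > 0 else 0)
        mask.append(row)
    return mask


def stage2_mask_to_image(mask: Mask, seed: int = 1) -> Image:
    """Stand-in for the stage-2 generator: binary mask -> RGB image."""
    rng = random.Random(seed)
    foreground = [rng.random() for _ in range(3)]
    background = [rng.random() for _ in range(3)]
    # Paint mask pixels with one color and everything else with another.
    return [[list(foreground if px else background) for px in row] for row in mask]


def generate_frame(audio_features: List[float]) -> Tuple[Mask, Image]:
    """Run both stages in sequence for one audio window."""
    mask = stage1_audio_to_mask(audio_features)
    return mask, stage2_mask_to_image(mask)


mask, image = generate_frame([0.1, -0.3, 0.5])
print(len(mask), len(mask[0]), len(image[0][0]))
```

In a real setup each stand-in would be replaced by a trained generator network, but the interface stays the same: stage 2 never sees the audio, only the mask.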
More information in this blog post.
1. Video output
- Case 1: Test on the same guitarist, 南澤大介.
- Case 2: Test on a different guitarist, 伍々慧. The playing style and recording conditions differ from the training data.
- Case 3: Test on different instruments and human voice.
2. Conditional output
The following GIFs show images generated from an audio clip that the model had never seen.
- Source video (audio): tupliのテーマ (acoustic guitar solo) 作曲／編曲：南澤大介
- Top to bottom: audio visualization, stage-1 output, stage-2 output, ground truth.
3. Pose-guided generation
The following GIFs show outputs of the stage-2 model given conditional poses.
- Source video (audio): John Pizzarelli - "I Got Rhythm" (solo) at the Fretboard Journal
- Top to bottom: reference video, conditional hand input, stage-2 output.