stable-diffusion-pytorch

Yet another PyTorch implementation of Stable Diffusion.

I tried my best to make the codebase minimal, self-contained, consistent, hackable, and easy to read. Features that Stable Diffusion does not need are pruned (e.g. the attention mask in the CLIP tokenizer/encoder). Configs are hard-coded (based on Stable Diffusion v1.x). Loops are unrolled where that makes the code easier to follow.

Despite my efforts, I feel like I cooked another plate of spaghetti. Well, help yourself!

This project refers heavily to the following repositories. Big kudos to them!

Dependencies

PyTorch
NumPy
Pillow
regex

How to Install

Clone or download this repository.
Install dependencies: Run pip install torch numpy Pillow regex or pip install -r requirements.txt.
Download data.tar from here and unpack it in the parent folder of stable_diffusion_pytorch. Your folder structure should look like this:

stable-diffusion-pytorch(-main)/
├─ data/
│  ├─ ckpt/
│  ├─ ...
├─ stable_diffusion_pytorch/
│  ├─ samplers/
└  ┴─ ...

Note that the checkpoint files included in data.tar are under a different license; you must agree to that license to use them.
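
To confirm the unpacked layout before running anything, a quick check like the one below may help. It is only a sketch: the directory names are taken from the tree above, and missing_dirs is a hypothetical helper, not part of this repository.

```python
from pathlib import Path

# Directory names taken from the layout shown above.
EXPECTED_DIRS = ("data", "data/ckpt", "stable_diffusion_pytorch")


def missing_dirs(repo_root):
    """Return the expected directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # Run from the repository root (the folder that contains data/).
    missing = missing_dirs(".")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Layout looks complete.")
```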

How to Use

Import stable_diffusion_pytorch as a submodule.

Here are some example scripts. You can also read the docstring of stable_diffusion_pytorch.pipeline.generate.

Text-to-image generation:

from stable_diffusion_pytorch import pipeline

prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts)
images[0].save('output.jpg')

…with multiple prompts:

prompts = [
    "a photograph of an astronaut riding a horse",
    ""]
images = pipeline.generate(prompts)

…with unconditional (negative) prompts:

prompts = ["a photograph of an astronaut riding a horse"]
uncond_prompts = ["low quality"]
images = pipeline.generate(prompts, uncond_prompts)

…with seed:

prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, seed=42)
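
The examples above save only images[0]. With several prompts you will probably want every result on disk. The helper below is a minimal sketch, assuming generate returns one PIL-style image per prompt (as the images[0].save call suggests); save_all and its file naming scheme are hypothetical and not part of this repository.

```python
def save_all(images, prefix="output", ext="jpg"):
    """Save each image as <prefix>_<index>.<ext> and return the file names."""
    names = []
    for i, image in enumerate(images):
        name = f"{prefix}_{i}.{ext}"
        image.save(name)  # assumes each item has a PIL-style save() method
        names.append(name)
    return names
```

For example, save_all(pipeline.generate(prompts)) would write output_0.jpg, output_1.jpg, and so on.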

Preload models (you will need enough VRAM):

from stable_diffusion_pytorch import model_loader
models = model_loader.preload_models('cuda')

prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, models=models)

If the code above runs out of GPU memory but you have enough RAM (not VRAM), you can move each model to the GPU only when it is needed and back to the CPU afterwards:

from stable_diffusion_pytorch import model_loader
models = model_loader.preload_models('cpu')

prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, models=models, device='cuda', idle_device='cpu')

Image-to-image generation:

from PIL import Image

prompts = ["a photograph of an astronaut riding a horse"]
input_images = [Image.open('space.jpg')]
images = pipeline.generate(prompts, input_images=input_images)

…with custom strength:

prompts = ["a photograph of an astronaut riding a horse"]
input_images = [Image.open('space.jpg')]
images = pipeline.generate(prompts, input_images=input_images, strength=0.6)

Change classifier-free guidance scale:

prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, cfg_scale=11)

…or disable classifier-free guidance:

prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, do_cfg=False)
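
For intuition about what cfg_scale controls, here is the standard classifier-free guidance blend written over plain Python lists. This is a sketch of the general technique, not a copy of this repository's internals: apply_cfg is a hypothetical name, and the default scale used here is only illustrative.

```python
def apply_cfg(cond, uncond, cfg_scale=7.5):
    """Blend conditional and unconditional predictions element-wise.

    cfg_scale = 1 returns the conditional prediction unchanged;
    larger values push the result further away from the unconditional one.
    """
    return [u + cfg_scale * (c - u) for c, u in zip(cond, uncond)]


# Toy numbers standing in for model outputs.
print(apply_cfg([1.0, 2.0], [0.0, 2.0], cfg_scale=2.0))
```

With guidance disabled there is no unconditional prediction to blend with, so only the conditional output is used.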

Reduce steps (faster generation, lower quality):

prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, n_inference_steps=28)

Use a different sampler:

prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, sampler="k_euler")
# "k_lms" (default), "k_euler", and "k_euler_ancestral" are available

Generate an image with a custom size:

prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, height=512, width=768)
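
When picking a custom size, it helps to remember that the Stable Diffusion v1.x autoencoder works on latents that are 8 times smaller than the image in each spatial dimension, with 4 channels. The sketch below computes that latent shape and rejects sizes that do not divide evenly. latent_shape is a hypothetical helper; whether and how this repository validates height and width is an assumption, so check the docstring of pipeline.generate.

```python
def latent_shape(height, width, factor=8, channels=4):
    """Return (channels, latent_height, latent_width) for an image size.

    Assumes the SD v1.x autoencoder: 8x spatial downsampling, 4 latent channels.
    """
    if height % factor or width % factor:
        raise ValueError(f"height and width must be multiples of {factor}")
    return (channels, height // factor, width // factor)


print(latent_shape(512, 768))  # (4, 64, 96)
```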

LICENSE

All code in this repository is licensed under the MIT License. Please see the LICENSE file.

Note that the Stable Diffusion checkpoint files are licensed under the CreativeML Open RAIL-M License. It contains use-based restriction clauses, so you should read it before using the checkpoints.