Hackable and optimized Transformers building blocks, supporting a composable construction

Description

xFormers is a modular, field-agnostic library for flexibly building Transformer architectures from interoperable, optimized building blocks.

Getting started

The full documentation contains instructions for getting started, deep dives and tutorials about the various APIs.
If in doubt, please check out the HOWTO. Only some general considerations are laid out in the README.

Installation

To install xFormers, it is recommended to use a dedicated virtual environment, as is common with Python, for instance through python-virtualenv or conda.
There are two ways to install it:

Directly from the pip package

You can fetch the latest release from PyPI. This release does not contain the wheels for the sparse attention kernels, which you will need to build from source.

conda create --name xformer_env
conda activate xformer_env
pip install xformers

Build from source (dev mode)

These commands will fetch the latest version of the code, create a dedicated conda environment, activate it, then install xFormers from source. If you want to build the sparse attention CUDA kernels, please make sure that the requirements in the next section (Sparse attention kernels) are met before running these instructions.

git clone git@github.com:fairinternal/xformers.git
conda create --name xformer_env python=3.8
conda activate xformer_env
cd xformers
pip install -r requirements.txt
pip install -e .

Sparse attention kernels

Installing the CUDA-based sparse attention kernels may require extra care, as this involves the CUDA toolchain. As a reminder, these kernels are built when you run pip install -e . and the CUDA buildchain (the NVCC compiler) is available. Rebuilding can for instance be done via python3 setup.py clean && python3 setup.py develop, or similarly by wiping the build folder and re-running pip install -e .

Some advice on building these CUDA-specific components, tentatively addressing common pitfalls. Please make sure that:

NVCC and the current CUDA runtime match. Depending on your setup, you may be able to change the CUDA runtime with module unload cuda; module load cuda/xx.x, possibly also nvcc
the version of GCC that you’re using matches the current NVCC capabilities
the TORCH_CUDA_ARCH_LIST env variable is set to the architectures that you want to support. A suggested setup (slow to build but comprehensive) is export TORCH_CUDA_ARCH_LIST="6.0;6.1;6.2;7.0;7.2;8.0;8.6"
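
Before launching a long build, it can help to print what the build will see. The following is a minimal sketch using only the Python standard library; the helper names parse_arch_list and describe_build_env are illustrative and not part of xFormers. It assumes TORCH_CUDA_ARCH_LIST holds numeric entries such as "7.0" or "8.0+PTX", separated by semicolons or spaces.

```python
import os
import shutil


def parse_arch_list(value):
    """Split a TORCH_CUDA_ARCH_LIST-style string into (major, minor) tuples.

    Assumption: only numeric entries are handled (e.g. "8.6" or "8.6+PTX").
    """
    archs = []
    for item in value.replace(";", " ").split():
        number = item.split("+")[0]  # drop an optional "+PTX" suffix
        major, minor = number.split(".")
        archs.append((int(major), int(minor)))
    return archs


def describe_build_env(environ=os.environ):
    """Report where nvcc is found (None if absent) and the requested archs."""
    return {
        "nvcc": shutil.which("nvcc"),
        "arch_list": parse_arch_list(environ.get("TORCH_CUDA_ARCH_LIST", "")),
    }


if __name__ == "__main__":
    print(describe_build_env())
```

If PyTorch is installed, torch.version.cuda additionally reports the CUDA version PyTorch was built against, which can be compared with the output of nvcc --version.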

Triton

Some parts of xFormers use Triton, and are only exposed if Triton is installed and a compatible GPU is present (NVIDIA GPU with tensor cores). If Triton was not installed as part of the testing procedure, you can install it directly by running pip install triton. You can optionally test that the installation is successful by running one of the Triton-related benchmarks, for instance python3 xformers/benchmarks/benchmark_triton_softmax.py

Triton will cache the compiled kernels to /tmp/triton by default. If this becomes an issue, this path can be specified through the TRITON_CACHE_DIR environment variable.
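
As an illustration, the cache location can also be set from Python before any Triton kernel is compiled. This is a minimal sketch, not part of xFormers: the helper names triton_available and use_triton_cache_dir are hypothetical, and the only thing taken from the text above is the TRITON_CACHE_DIR variable name.

```python
import importlib.util
import os


def triton_available():
    """Return True if the triton package can be found, without importing it."""
    return importlib.util.find_spec("triton") is not None


def use_triton_cache_dir(path):
    """Create `path` if needed and point TRITON_CACHE_DIR at it.

    Assumption: this is called before any Triton kernel gets compiled,
    so that the variable is already set when the cache is first used.
    """
    os.makedirs(path, exist_ok=True)
    os.environ["TRITON_CACHE_DIR"] = path
    return path


if __name__ == "__main__":
    print("Triton installed:", triton_available())
    print("Cache dir:", use_triton_cache_dir(os.path.expanduser("~/.cache/triton")))
```

The path used in the example is arbitrary; any writable directory with enough free space should do.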

Testing the installation

The benchmark below exercises the attention mechanisms exposed by xFormers and generates runtime and memory plots.
If it completes without errors, the installation is successful. This step is optional and requires some extra dependencies: pip install -r requirements-benchmark.txt.

Once this is done, you can run this particular benchmark as follows:

python3 xformers/benchmarks/benchmark_encoder.py --activations relu --plot -emb 256 -bs 32 -heads 16
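
A lighter check than the full benchmark is to confirm that the package is visible to the current Python environment. The sketch below relies only on the standard library (importlib.metadata, Python 3.8+); the helper name installed_version is illustrative.

```python
from importlib import metadata


def installed_version(package="xformers"):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("xformers is not installed in this environment")
    else:
        print("xformers", version)
```

This only confirms that the package metadata is present; it does not verify that the CUDA or Triton parts were built.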

Using xFormers

Transformers key concepts

Let’s start from a classical overview of the Transformer architecture (illustration from Lin et al., “A Survey of Transformers”).

You’ll find the key repository boundaries in this illustration: a Transformer is generally made of a collection of attention mechanisms, embeddings to encode some positional information, feed-forward blocks and a residual path (typically referred to as pre- or post- layer norm). These boundaries do not fit all models, but we found in practice that, given some accommodations, they can capture most of the state of the art.
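
To make the pre- versus post- layer norm distinction concrete, here is a minimal pure-Python sketch operating on a single token vector. It is not xFormers code: the function names are illustrative, the layer norm has no learnable scale or bias, and sublayer stands for any attention or feed-forward block.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def pre_norm_block(x, sublayer):
    """Pre layer norm: y = x + sublayer(layer_norm(x))."""
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


def post_norm_block(x, sublayer):
    """Post layer norm: y = layer_norm(x + sublayer(x))."""
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])


if __name__ == "__main__":
    def double(h):
        return [2.0 * v for v in h]  # stand-in sublayer

    x = [1.0, 2.0, 3.0, 4.0]
    print(pre_norm_block(x, double))
    print(post_norm_block(x, double))
```

In the pre-norm variant the residual path carries the input through unnormalized, while in the post-norm variant the output of every block is normalized.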

Models are thus not implemented in monolithic files, which are typically complicated to handle and modify. Most of the concepts present in the above illustration correspond to an abstraction level, and when variants are present for a given sub-block it should always be possible to select any of them. You can focus on a given encapsulation level and modify it as needed.
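
The idea of selecting a variant for a given sub-block can be sketched with a small name-to-implementation registry. This is a generic illustration of the pattern under stated assumptions, not the xFormers API: the ACTIVATIONS table, the build_activation helper and the "activation" config key are all hypothetical.

```python
import math
from typing import Callable, Dict

# Hypothetical registry: variant name -> implementation.
ACTIVATIONS: Dict[str, Callable[[float], float]] = {
    "relu": lambda x: max(0.0, x),
    "gelu": lambda x: 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0))),
}


def build_activation(config):
    """Pick the activation named in a config dict (hypothetical key)."""
    name = config["activation"]
    try:
        return ACTIVATIONS[name]
    except KeyError:
        raise ValueError(f"unknown activation {name!r}, choose from {sorted(ACTIVATIONS)}")


if __name__ == "__main__":
    # Sweeping over variants only requires changing the config value.
    for name in sorted(ACTIVATIONS):
        act = build_activation({"activation": name})
        print(name, [act(v) for v in (-1.0, 0.0, 1.0)])
```

Because each variant sits behind the same call signature, swapping one for another does not require touching the code that uses it.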

Repo map

├── components                  # Parts zoo, any of which can be used directly
│   ├── attention
│   │    └ ...                  # all the supported attentions
│   ├── feedforward             #
│   │    └ ...                  # all the supported feedforwards
│   ├── positional_embedding    #
│   │    └ ...                  # all the supported positional embeddings
│   ├── activations.py          #
│   └── multi_head_dispatch.py  # (optional) multihead wrap
├── factory
│   ├── block_factory.py        # (optional) helper to programmatically generate layers
│   └── model_factory.py        # (optional) helper to programmatically generate models
├── models
...                             # Full models, ready to be used

Attention mechanisms

Scaled dot product
- Attention is all you need, Vaswani et al., 2017
Sparse
- whenever a sparse enough mask is passed
BlockSparse
- courtesy of Triton
Linformer
- Linformer, self-attention with linear complexity, Wang et al., 2020
Nystrom
- Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention, Xiong et al., 2021
Local
Notably used in (among many others):
- Longformer: The Long-Document Transformer, Beltagy et al., 2020
- BigBird, Transformer for longer sequences, Zaheer et al., 2020
Favor/Performer
- Rethinking Attention with Performers, Choromanski et al., 2020
Orthoformer
- Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers,
  Patrick et al., 2021
Random
- See BigBird, Longformer, ...
Global
- See BigBird, Longformer, ...
FourierMix
- FNet: Mixing Tokens with Fourier Transforms, Lee-Thorp et al., 2021
… to add a new one, see Contribution.md
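
As a reference point for the list above, scaled dot product attention computes softmax(Q K^T / sqrt(d)) V, and several of the listed variants (Sparse, Local) amount to restricting which positions may attend to each other. The following is a dense, pure-Python sketch for illustration only; it is not how xFormers implements these mechanisms, and the function names are illustrative.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v, mask=None):
    """softmax(q k^T / sqrt(d)) v, for lists of row vectors.

    mask[i][j] is True where query i may attend to key j. Assumption: every
    row of the mask allows at least one position.
    """
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        if mask is not None:
            scores = [s if mask[i][j] else -math.inf for j, s in enumerate(scores)]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


def local_mask(n, window):
    """Banded mask: position i attends to j only if |i - j| <= window."""
    return [[abs(i - j) <= window for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(scaled_dot_product_attention(x, x, x))
    print(scaled_dot_product_attention(x, x, x, mask=local_mask(3, 1)))
```

A dense boolean mask like this one still costs quadratic time and memory; the point of dedicated sparse and block-sparse kernels is to avoid computing the masked-out entries at all.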

Feed forward mechanisms

MLP
Fused

Positional embedding

Key Features

Many interchangeable attention mechanisms
Optimized building blocks, beyond PyTorch primitives
1. sparse attention
2. block-sparse attention
3. fused softmax
4. fused linear layer
5. fused layer norm
Benchmarking and testing tools
1. micro benchmarks
2. transformer block benchmark
3. LRA, with SLURM support
Programmatic and sweep-friendly layer and model construction
Hackable
1. Not using monolithic CUDA kernels, composable building blocks
2. Using Triton for some optimized parts, explicit, pythonic and user-accessible

FAQ

We’ve tried to collect a relatively exhaustive list of explanations in the HOWTO.

License

xFormers has a BSD-style license, as found in the LICENSE file.

Citing xFormers

If you use xFormers in your publication, please cite it by using the following BibTeX entry.

@Misc{xFormers2021,
  author =       {Benjamin Lefaudeux and Francisco Massa and Diana Liskovich and Min Xu and Jieru Hu and Marta Tintore and Susan Zhang},
  title =        {xFormers: A modular and hackable Transformer modelling library},
  howpublished = {\url{https://github.com/facebookresearch/xformers}},
  year =         {2021}
}
