# checkmate
See the paper! https://arxiv.org/abs/1910.02653
Checkmate breaks the GPU memory wall by enabling researchers to train large state-of-the-art models that do not fit in GPU memory. Checkmate applies optimal tensor rematerialization (as detailed in our paper at MLSys 2020) to trade off space and time.
At the moment, Checkmate only supports TensorFlow 2.0. PyTorch support is coming soon!
## Installation

Checkmate depends on:

* TensorFlow 2.0, i.e. `pip install tensorflow` or `pip install tensorflow-gpu`
* CyLP (installation instructions below)
### Installing CyLP on Debian Linux / Ubuntu

```bash
$ sudo apt install coinor-cbc coinor-libcbc-dev
$ pip install cylp
```
### Installing CyLP on macOS

The easiest way to set up CyLP is with Homebrew:

```bash
$ brew tap coin-or-tools/coinor
$ brew install coin-or-tools/coinor/cbc pkg-config
$ pip install cylp
```
Once TensorFlow 2.0 and CyLP are installed, Checkmate can be installed via pip:

```bash
$ pip install "https://github.com/parasj/checkmate/archive/master.zip#egg=checkmate"
```
## Quick start

Get started in 5 minutes with our TF2.0 quickstart tutorial.
Adapt your Keras model to fit within the memory constraints of a single GPU:

```python
import checkmate

model = tf.keras.applications.vgg19.VGG19(...)
...

train_iteration_fn = checkmate.tf2.compile(model, loss, optimizer,
                                           input_spec=sample_input[0],
                                           label_spec=sample_input[1])

for image, label in train_ds:
    prediction, loss = train_iteration_fn(image, label)
```
## Key ideas

From our paper at MLSys 2020:

> Modern neural networks are increasingly bottlenecked by the limited capacity of on-device GPU memory. Prior work explores dropping activations as a strategy to scale to larger neural networks under memory constraints. However, these heuristics assume uniform per-layer costs and are limited to simple architectures with linear graphs, limiting their usability. In this paper, we formalize the problem of trading off DNN training time and memory requirements as the tensor rematerialization optimization problem, a generalization of prior checkpointing strategies. We introduce Checkmate, a system that solves for optimal schedules in reasonable times (under an hour) using off-the-shelf MILP solvers, then uses these schedules to accelerate millions of training iterations. Our method scales to complex, realistic architectures and is hardware-aware through the use of accelerator-specific, profile-based cost models. In addition to reducing training cost, Checkmate enables real-world networks to be trained with up to 5.1× larger input sizes.
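To build intuition for the space-time tradeoff that Checkmate optimizes, here is a toy sketch of classic checkpointing on a linear chain of layers: keep only every k-th activation during the forward pass, and recompute the segments in between during the backward pass. This is a simplified model of the *prior* heuristics the paper generalizes, not Checkmate's MILP formulation; the function names and the unit-cost-per-activation assumption are ours for illustration.

```python
import math

def peak_memory_store_all(n):
    # Baseline: the forward pass keeps every one of the n activations
    # alive for the backward pass, so peak memory is n tensors.
    return n

def peak_memory_checkpoint(n, k):
    # Keep only every k-th activation ("checkpoints") during the forward
    # pass, then recompute one segment of up to k activations at a time
    # during the backward pass.
    # Peak = (number of checkpoints) + (one segment in flight).
    checkpoints = math.ceil(n / k)
    return checkpoints + k

def extra_forward_ops(n, k):
    # Cost of the memory savings: every non-checkpoint activation is
    # computed twice (forward pass + recomputation).
    return n - math.ceil(n / k)

n = 100
k = int(math.sqrt(n))  # classic sqrt(n) checkpoint spacing
print(peak_memory_store_all(n))      # 100 tensors
print(peak_memory_checkpoint(n, k))  # 20 tensors (10 checkpoints + 10-layer segment)
print(extra_forward_ops(n, k))       # 90 recomputed activations
```

Uniform spacing is only optimal when every layer has the same memory and compute cost; real networks have non-uniform, non-linear graphs, which is why Checkmate instead solves for a schedule with a MILP over profiled per-operator costs.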
## Citation

If you use Checkmate in your work, please cite us with:

```bibtex
@article{jain2019checkmate,
  title={Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization},
  author={Jain, Paras and Jain, Ajay and Nrusimha, Aniruddha and Gholami, Amir and
          Abbeel, Pieter and Keutzer, Kurt and Stoica, Ion and Gonzalez, Joseph E},
  journal={arXiv preprint arXiv:1910.02653},
  year={2020}
}
```