GradientAccumulator
This repo contains a TensorFlow 2 compatible implementation of accumulated gradients.
Simply wrap the accumulator around any optimizer and specify accum_steps to control the number of accumulations.
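As a rough illustration of what accumulating over accum_steps mini-batches amounts to, here is a sketch in plain Python rather than TensorFlow. It assumes the accumulated gradients are averaged before the update and that all micro-batches have the same size. The helpers mse_grad and accumulated_grad are hypothetical names made up for this example and are not part of this package.

```python
def mse_grad(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to the scalar weight w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, accum_steps):
    """Average the gradients of accum_steps equally sized micro-batches.

    Assumption: the gradients are averaged (not summed) before the update.
    """
    micro = len(xs) // accum_steps
    total = 0.0
    for step in range(accum_steps):
        lo, hi = step * micro, (step + 1) * micro
        total += mse_grad(w, xs[lo:hi], ys[lo:hi])
    return total / accum_steps


# Toy data for a one-parameter linear model y = w * x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]
w = 0.5

full = mse_grad(w, xs, ys)                           # one batch of 8 samples
accum = accumulated_grad(w, xs, ys, accum_steps=4)   # 4 micro-batches of 2 samples
print(full, accum)
```

Under these assumptions, a batch size of 2 with accum_steps=4 gives the same gradient as a single batch of 8, which is why accumulation can stand in for a larger batch when memory is limited.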
A precompiled wheel compatible with Python 3.7-3.9 and TensorFlow 2.7-2.9 is available in Release, but you can build from source if you want to test whether it works in your setup (see here).
For TF 1, I suggest using the AccumOptimizer implementation in the H2G-Net repository instead.
Install
From latest release:
pip install https://github.com/andreped/GradientAccumulator/releases/download/v0.1.0/GradientAccumulator-0.1.0-py3-none-any.whl
Or from source code:
pip install git+https://github.com/andreped/GradientAccumulator
Usage
from GradientAccumulator.accumulator import GradientAccumulator
from tensorflow.keras.optimizers import Adam
opt = Adam(1e-3)
wrapped_opt = GradientAccumulator(opt, accum_steps=4)
Then pass wrapped_opt to model.compile() as the optimizer, like so:
model.compile(optimizer=wrapped_opt, ...)
The implementation is derived and adjusted from the discussion at this TensorFlow Issue.
TODOs:
- Add generic wrapper class for adding accumulated gradients to any optimizer
- Add CI to build wheel and test that it works across different python versions, TF versions, and operating systems.
- Add wrapper class for BatchNormalization layer, similar to what is done for optimizers
- Test method for memory leaks
- Verify that implementation works in multi-GPU setups
- Add benchmarks to verify that accumulated gradients actually work as intended
- Add proper multi-GPU support
Disclaimer
Note that this implementation is only compatible with newer versions of TensorFlow, because the way optimizers behave has changed in TF 2. Slight modifications can likely be made to make this work with older versions, but I would recommend using a newer version of TF 2 instead, as these have become more stable and feature-rich than earlier versions.
Also note that this implementation does not work with TF 1, for the same reason that it does not work with older TF 2 versions. However, a TF 1 implementation can be found in the H2G-Net repository.
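If you want to guard against running on an untested TensorFlow version, a small check along these lines may help. This is only a sketch: the helper tf_version_in_tested_range is a hypothetical name and not part of this package, and the default bounds simply reuse the TensorFlow 2.7-2.9 range stated for the precompiled wheel. Other versions may still work when building from source.

```python
def tf_version_in_tested_range(version, low=(2, 7), high=(2, 9)):
    """Return True if the major.minor part of a version string lies in [low, high].

    Assumption: the bounds default to the TensorFlow 2.7-2.9 range that the
    precompiled wheel is stated to be compatible with.
    """
    major_minor = tuple(int(part) for part in version.split(".")[:2])
    return low <= major_minor <= high


# Typical use (requires TensorFlow to be installed):
#   import tensorflow as tf
#   if not tf_version_in_tested_range(tf.__version__):
#       print("Untested TensorFlow version:", tf.__version__)
print(tf_version_in_tested_range("2.8.0"))
```

The check compares only the major and minor components, so patch releases such as 2.9.1 are treated the same as 2.9.0.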
Tips
Remember to pass the wrapper to custom_objects in load_model if you wish to load a trained model. This is only necessary if you set compile=True in load_model, which is relevant for finetuning or for using model.evaluate().
from tensorflow.keras.models import load_model
model = load_model("/path/to/model", compile=True, custom_objects={"GradientAccumulator": GradientAccumulator})
Acknowledgements
This implementation is derived from the work of @fsx950223, @stefan-falk, and others in tensorflow/addons#2525, a now-closed PR to TF-addons. Hence, all credit to them and the people who contributed to the work! Sadly, the proposed implementation was not merged, as there were some unresolved issues with it, especially regarding multi-GPU training. However, I believe the current implementation works well for single-GPU scenarios, which should already be of interest to the community.