# A CUDA implementation of fused LSTM and GRU layers

## Haste

Haste is a CUDA implementation of fused LSTM and GRU layers with built-in DropConnect and Zoneout regularization. These layers are exposed through C++ and Python APIs for easy integration into your own projects or machine learning frameworks.

What's included in this project?

- a standalone C++ API (`libhaste`)
- a TensorFlow Python API (`haste_tf`)
- examples for writing your own custom C++ inference / training code using `libhaste`

For questions or feedback about Haste, please open an issue on GitHub or send us an email at [email protected].

## Install

Here's what you'll need to get started:

- a CUDA Compute Capability 6.0+ GPU
- TensorFlow GPU 1.14+ or 2.0+ for TensorFlow integration
- Eigen 3 to build the C++ examples

Once you have the prerequisites, run the following to build the code and install the TensorFlow API:

```
make && pip install haste_tf-*.whl
```

## Documentation

Getting started with the TensorFlow API is easy:

```
import haste_tf as haste
lstm_layer = haste.LSTM(num_units=256, direction='bidirectional', zoneout=0.1, dropout=0.05)
gru_layer = haste.GRU(num_units=256, direction='bidirectional', zoneout=0.1, dropout=0.05)
# `x` is a tensor with shape [N,T,C] (batch, time, channels)
y, state = lstm_layer(x)
y, state = gru_layer(x)
```
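Under the hood, each fused layer computes the standard LSTM recurrence. As a rough reference for what the kernels compute (not Haste's actual implementation — the gate ordering and weight layout below are illustrative assumptions), a single LSTM time step can be sketched in NumPy:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, R, b):
    """One LSTM time step for a batch.

    x: [N, C] input at one time step
    h, c: [N, H] previous hidden and cell state
    W: [C, 4H] input weights, R: [H, 4H] recurrent weights, b: [4H] bias
    (the i/g/f/o gate ordering here is an illustrative choice)
    """
    H = h.shape[-1]
    z = x @ W + h @ R + b            # all four gates in one fused matmul
    i = sigmoid(z[:, 0 * H:1 * H])   # input gate
    g = np.tanh(z[:, 1 * H:2 * H])   # candidate cell update
    f = sigmoid(z[:, 2 * H:3 * H])   # forget gate
    o = sigmoid(z[:, 3 * H:4 * H])   # output gate
    c_new = f * c + i * g            # new cell state
    h_new = o * np.tanh(c_new)       # new hidden state
    return h_new, c_new
```

Fusing all four gate projections into one matrix multiplication, as above, is what makes a "fused" layer fast: one large GEMM per step instead of eight small ones.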

The TensorFlow Python API is documented in `docs/tf/haste_tf.md`.

The C++ API is documented in `lib/haste.h`, and there are code samples in `examples/`.

## Code layout

- `docs/tf/`: API reference documentation for `haste_tf`
- `examples/`: examples for writing your own C++ inference / training code using `libhaste`
- `frameworks/tf/`: TensorFlow Python API and custom op code
- `lib/`: CUDA kernels and C++ API

## Implementation notes

- the GRU implementation is based on `1406.1078v1` (same as cuDNN) rather than `1406.1078v3`
- Zoneout on LSTM cells is applied to the hidden state only, not the cell state
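To make the two notes above concrete, here is a rough NumPy sketch (function and weight names are illustrative, not Haste's API). The commonly cited difference between the two paper revisions is where the reset gate is applied: the v1/cuDNN formulation applies it after the recurrent matrix multiplication, while v3 applies it to the hidden state before the multiplication. Zoneout stochastically preserves individual hidden units from the previous step:

```python
import numpy as np

def gru_candidate_v1(x, h, r, Wh, Rh, bWh, bRh):
    """v1/cuDNN-style candidate: reset gate applied AFTER the recurrent matmul."""
    return np.tanh(x @ Wh + r * (h @ Rh + bRh) + bWh)

def gru_candidate_v3(x, h, r, Wh, Rh, b):
    """v3-style candidate: reset gate applied to h BEFORE the recurrent matmul."""
    return np.tanh(x @ Wh + (r * h) @ Rh + b)

def zoneout(h_prev, h_new, rate, rng, training=True):
    """Zoneout: each unit keeps its previous value with probability `rate`.

    Here it is applied to the hidden state only; the cell state always
    takes its freshly computed value.
    """
    if not training or rate == 0.0:
        return h_new
    keep_prev = rng.random(h_new.shape) < rate   # per-unit Bernoulli mask
    return np.where(keep_prev, h_prev, h_new)
```

At inference time zoneout is disabled and the freshly computed state is used directly, analogous to how dropout is disabled at test time.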

## References

- Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. *Neural Computation*, *9*(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
- Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. *arXiv:1406.1078 [cs, stat]*. http://arxiv.org/abs/1406.1078
- Wan, L., Zeiler, M., Zhang, S., Cun, Y. L., & Fergus, R. (2013). Regularization of Neural Networks using DropConnect. In *International Conference on Machine Learning* (pp. 1058–1066). http://proceedings.mlr.press/v28/wan13.html
- Krueger, D., Maharaj, T., Kramár, J., Pezeshki, M., Ballas, N., Ke, N. R., et al. (2017). Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations. *arXiv:1606.01305 [cs]*. http://arxiv.org/abs/1606.01305

## Citing this work

To cite this work, please use the following BibTeX entry:

```
@misc{haste2020,
  title        = {Haste: a fast, simple, and open RNN library},
  author       = {Sharvil Nanavati},
  year         = 2020,
  month        = "Jan",
  howpublished = {\url{https://github.com/lmnt-com/haste/}},
}
```