Deep Learning Galerkin Transformer: a linear attention without softmax 03 October 2021
Transformer Implementation of a Transformer, but completely in Triton 30 September 2021