Lightweight PyTorch implementation of a seq2seq text summarizer.
- Simple code structure, easy to understand.
- Minimal dependencies (Python 3.6 and PyTorch).
- Batch training/testing on GPU/CPU.
- Configurable teacher forcing ratio (per step, feed the decoder either the reference token or its own previous prediction).
- Initialization with pre-trained word embeddings.
- Embedding sharing across encoder, decoder input, and decoder output.
- Attention mechanism (bilinear, aka Luong's "general" type).
- Pointer network, which copies words (can be out-of-vocabulary) from the source.
- Visualization of attention and pointer weights.
- Validation using ROUGE: put ROUGE-1.5.5.pl and its "data" folder under data/; pyrouge is NOT required.
- Reinforcement learning using self-critical policy gradient training: See A Deep Reinforced Model for Abstractive Summarization by Paulus, Xiong and Socher for the mixed objective function.
- Two types of repetition avoidance:
- Intra-decoder attention as used in the above-mentioned paper, to let the decoder access its history (attending over all past decoder states).
- Coverage mechanism, which discourages repeatedly attending to the same area of the input sequence: See Get To The Point: Summarization with Pointer-Generator Networks by See and Manning for the coverage loss (note that the attention here incorporates the coverage vector in a different way).
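The teacher forcing ratio above decides, independently at each decoding step, whether the decoder sees the reference token or its own last prediction. A minimal sketch (the function name is illustrative, not this repo's API):

```python
import random

def next_decoder_input(gold_token, predicted_token, teacher_forcing_ratio):
    """With probability `teacher_forcing_ratio`, feed the reference (gold)
    token at the next step; otherwise feed the model's own last prediction."""
    if random.random() < teacher_forcing_ratio:
        return gold_token
    return predicted_token
```

A ratio of 1.0 always teacher-forces (fast training, but the model never sees its own errors); 0.0 always feeds back predictions.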
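The bilinear (Luong "general") attention scores each encoder state h_i against the decoder state s as s^T W h_i. A sketch under assumed batch-first shapes, not the repo's exact code:

```python
import torch

def general_attention(decoder_state, encoder_states, W):
    """Luong 'general' attention: score_i = s^T W h_i.
    decoder_state: (batch, hidden); encoder_states: (batch, src_len, hidden);
    W: (hidden, hidden) learned bilinear weight."""
    # (batch, src_len): one score per source position
    scores = torch.bmm(encoder_states, (decoder_state @ W).unsqueeze(2)).squeeze(2)
    weights = torch.softmax(scores, dim=1)
    # context vector: attention-weighted sum of encoder states, (batch, hidden)
    context = torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)
    return context, weights
```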
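The pointer network mixes the generator's vocabulary distribution with the attention distribution over source tokens, which is how out-of-vocabulary source words become copyable. A sketch assuming a soft generation switch p_gen and an "extended" vocabulary that assigns temporary ids to source OOVs (names are illustrative):

```python
import torch

def pointer_generator_dist(vocab_dist, attn_weights, src_ids, p_gen, extended_size):
    """Final distribution = p_gen * P_vocab + (1 - p_gen) * attention,
    with attention mass scattered onto the source tokens' (extended) ids.
    vocab_dist: (batch, vocab); attn_weights, src_ids: (batch, src_len);
    p_gen: (batch, 1) in [0, 1]; source OOVs get ids >= vocab."""
    batch, vocab = vocab_dist.size()
    out = vocab_dist.new_zeros(batch, extended_size)
    out[:, :vocab] = p_gen * vocab_dist
    # add copy probability onto each source token's slot (OOVs included)
    out.scatter_add_(1, src_ids, (1 - p_gen) * attn_weights)
    return out
```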
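The self-critical term in Paulus et al.'s mixed objective rewards a sampled summary relative to a greedy-decoded baseline, typically using ROUGE as the reward. A sketch of just the policy-gradient loss (shapes and names are assumptions, not this repo's code):

```python
import torch

def self_critical_loss(sample_log_probs, sample_reward, greedy_reward):
    """Self-critical policy gradient: the greedy decode acts as the baseline,
    so only samples that beat it get reinforced.
    sample_log_probs: (batch,) summed log-probs of each sampled summary;
    sample_reward, greedy_reward: (batch,), e.g. ROUGE scores."""
    advantage = (sample_reward - greedy_reward).detach()  # no grad through rewards
    return -(advantage * sample_log_probs).mean()
```

The mixed objective then interpolates this with the usual maximum-likelihood loss, `gamma * rl_loss + (1 - gamma) * ml_loss`, for a tunable weight gamma.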
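The coverage loss penalizes attention that lands where cumulative past attention (the coverage vector) is already high. A sketch of the See-and-Manning loss, sum over steps t and positions i of min(a_i^t, c_i^t) — only the loss; as noted above, this repo feeds the coverage vector into attention differently:

```python
import torch

def coverage_loss(step_attn_weights):
    """step_attn_weights: (steps, batch, src_len), one attention
    distribution over the source per decoder step."""
    coverage = torch.zeros_like(step_attn_weights[0])  # cumulative attention so far
    loss = torch.zeros(step_attn_weights.size(1))
    for attn in step_attn_weights:
        # overlap between current attention and past coverage is penalized
        loss = loss + torch.minimum(attn, coverage).sum(dim=1)
        coverage = coverage + attn
    return loss.mean()
```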
To be implemented
- Run on longer texts (currently limited by available hardware).
- Beam search at test time.