BiLSTM-CNN-CRF tagger

BiLSTM-CNN-CRF tagger is a PyTorch implementation of the "mainstream" neural tagging scheme based on the works of Lample et al., 2016 and Ma et al., 2016.

Requirements

  • python 3.6
  • pytorch 0.4.1
  • numpy 1.15.1
  • scipy 1.1.0
  • scikit-learn 0.19.2
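
The pinned package versions are listed in requirements.txt (see the project structure below) and can be installed, for example, with:

pip install -r requirements.txt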

Benefits

  • native PyTorch implementation;
  • vectorized code for training on batches;
  • trustworthy evaluation of f1-score.

Project structure

|__ articles/ --> collection of papers related to tagging, argument mining, etc. 
|__ classes/
        |__ data_io.py --> class for reading/writing data in different CoNLL file formats
        |__ datasets_bank.py --> class for storing the train/dev/test data subsets and sampling batches 
                                 from the train dataset 
        |__ evaluator.py --> class for evaluation of F1 scores and token-level accuracies
        |__ report.py --> class for storing the evaluation results as text files
        |__ tag_components.py --> class for extracting tag components from BIO encodings 
        |__ utils.py --> several auxiliary utils and functions
|__ data/
        |__ NER/ --> Datasets for Named Entity Recognition 
            |__ CoNNL_2003_shared_task/ --> data for the NER CoNLL-2003 shared task (English) in BIO-2 
                                            CoNLL format, from E.F. Tjong Kim Sang and F. De Meulder, 
                                            Introduction to the CoNLL-2003 Shared Task: 
                                            Language-Independent Named Entity Recognition, 2003. 
        |__ AM/ --> Datasets for Argument Mining
            |__ persuasive_essays/ --> data for persuasive essays in BIO-2-like CoNLL format, from: 
                                       Steffen Eger, Johannes Daxenberger, Iryna Gurevych. Neural 
                                       End-to-End Learning for Computational Argumentation Mining, 2017
|__ embeddings/
        |__ get_glove_embeddings.sh --> script for downloading GloVe6B 100-dimensional word embeddings
|__ layers/
        |__ layer_base.py --> abstract base class for all types of layers
        |__ layer_birnn_base.py --> abstract base class for all bidirectional recurrent layers
        |__ layer_word_embeddings.py --> class implements word embeddings
        |__ layer_char_embeddings.py --> class implements character-level embeddings
        |__ layer_char_cnn.py --> class implements character-level convolutional 1D operation
        |__ layer_bilstm.py --> class implements bidirectional LSTM recurrent layer
        |__ layer_bigru.py --> class implements bidirectional GRU recurrent layer
        |__ layer_crf.py --> class implements conditional random field (CRF) 
|__ models/
        |__ tagger_base.py --> abstract base class for all types of taggers
        |__ tagger_io.py --> contains wrappers to create and load tagger models
        |__ tagger_birnn.py --> vanilla BiLSTM/BiGRU tagger model
        |__ tagger_birnn_crf.py --> BiLSTM/BiGRU + CRF tagger model
        |__ tagger_birnn_cnn.py --> BiLSTM/BiGRU + char-level CNN tagger model
        |__ tagger_birnn_cnn_crf.py --> BiLSTM/BiGRU + char-level CNN  + CRF tagger model
|__ pretrained/
        |__ tagger_NER.hdf5 --> tagger for NER, BiGRU+CNN+CRF trained on NER-2003 shared task, English 
|__ seq_indexers/
        |__ seq_indexer_base.py --> abstract base class for sequence indexers, which convert lists of 
                                    lists of string items to lists of lists of integer indices and back 
                                    (a minimal sketch of this idea follows the structure listing)
        |__ seq_indexer_base_embeddings.py --> abstract sequence indexer class that implements work 
                                               with embeddings 
        |__ seq_indexer_word.py --> converts list of lists of words as strings to list of lists of 
                                    integer indices and back, has built-in embeddings
        |__ seq_indexer_char.py --> converts list of lists of characters to list of lists of integer 
                                    indices and back, has built-in embeddings 
        |__ seq_indexer_tag.py --> converts list of lists of string tags to list of lists of integer 
                                    indices and back, doesn't have built-in embeddings 
|__ main.py --> main script for training/evaluation/saving tagger models
|__ run_tagger.py --> run the trained tagger model from the checkpoint file
|__ conlleval --> "official" Perl script from NER 2003 shared task for evaluating the f1 scores, 
                   author: Erik Tjong Kim Sang, version: 2004-01-26
|__ requirements.txt --> file for managing package requirements    
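
The sequence indexers are central to how data flows into the models: they convert lists of word/character/tag sequences into lists of integer indices (and back) before batching. Below is a minimal, hypothetical sketch of that idea; it is not the project's actual seq_indexer_* classes, which additionally handle embeddings, caseless matching, and padding.

class ToySeqIndexer:
    """Illustrative sequence indexer: maps string items to integer indices and back."""
    def __init__(self):
        self.item2idx = {'<pad>': 0, '<unk>': 1}
        self.idx2item = ['<pad>', '<unk>']

    def add_items(self, sequences):
        # sequences: list of lists of string items (words, characters or tags)
        for seq in sequences:
            for item in seq:
                if item not in self.item2idx:
                    self.item2idx[item] = len(self.idx2item)
                    self.idx2item.append(item)

    def items2idx(self, sequences):
        # unknown items are mapped to the '<unk>' index
        return [[self.item2idx.get(item, 1) for item in seq] for seq in sequences]

    def idx2items(self, idx_sequences):
        return [[self.idx2item[i] for i in seq] for seq in idx_sequences]

indexer = ToySeqIndexer()
indexer.add_items([['EU', 'rejects', 'German', 'call']])
print(indexer.items2idx([['EU', 'rejects', 'boycott']]))  # -> [[2, 3, 1]]
print(indexer.idx2items([[2, 3, 1]]))                     # -> [['EU', 'rejects', '<unk>']]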

Evaluation

Results of training the models with the default settings:

tagger model                              |  dataset                                |  micro-f1 on test 
------------------------------------------|-----------------------------------------|-------------------
BiLSTM + CNN + CRF, Lample et al., 2016   |  NER-2003 shared task (English)         |  90.94 
BiLSTM + CNN + CRF, Ma et al., 2016       |  NER-2003 shared task (English)         |  91.21 
BiLSTM + CNN + CRF (our)                  |  NER-2003 shared task (English)         |  90.86 
STag_BLCC, Eger et al., 2017              |  AM Persuasive Essays, Paragraph Level  |  66.69 
LSTM-ER, Eger et al., 2017                |  AM Persuasive Essays, Paragraph Level  |  70.83 
BiGRU + CNN + CRF (our)                   |  AM Persuasive Essays, Paragraph Level  |  64.31 

To ensure the consistency of the experiments, we evaluate with the "official" Perl script from the NER 2003 shared task (author: Erik Tjong Kim Sang, version: 2004-01-26). An example of its output:

processed 46435 tokens with 5648 phrases; found: 5679 phrases; correct: 5146.
accuracy:  97.92%; precision:  90.61%; recall:  91.11%; FB1:  90.86
              LOC: precision:  91.35%; recall:  93.65%; FB1:  92.48  1710
             MISC: precision:  78.20%; recall:  82.76%; FB1:  80.42  743
              ORG: precision:  90.25%; recall:  88.02%; FB1:  89.12  1620
              PER: precision:  95.95%; recall:  95.30%; FB1:  95.63  1606
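
The script reads token lines whose last two columns are the gold tag and the predicted tag. Assuming the tagger's predictions have been written in that format to a file, e.g. predictions.txt (a hypothetical name), the script can be invoked as:

perl conlleval < predictions.txt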

Usage

Train/test

To train, evaluate, and save a tagger model, run the main.py script.

usage: main.py [-h] [--seed_num SEED_NUM] [--model MODEL]
               [--fn_train FN_TRAIN] [--fn_dev FN_DEV] [--fn_test FN_TEST]
               [--load LOAD] [--save SAVE] [--wsi WSI] [--emb_fn EMB_FN]
               [--emb_dim EMB_DIM] [--emb_delimiter EMB_DELIMITER]
               [--freeze_word_embeddings FREEZE_WORD_EMBEDDINGS]
               [--freeze_char_embeddings FREEZE_CHAR_EMBEDDINGS] [--gpu GPU]
               [--check_for_lowercase CHECK_FOR_LOWERCASE]
               [--epoch_num EPOCH_NUM] [--min_epoch_num MIN_EPOCH_NUM]
               [--patience PATIENCE] [--rnn_type RNN_TYPE]
               [--rnn_hidden_dim RNN_HIDDEN_DIM]
               [--char_embeddings_dim CHAR_EMBEDDINGS_DIM]
               [--word_len WORD_LEN]
               [--char_cnn_filter_num CHAR_CNN_FILTER_NUM]
               [--char_window_size CHAR_WINDOW_SIZE]
               [--dropout_ratio DROPOUT_RATIO] [--dataset_sort DATASET_SORT]
               [--clip_grad CLIP_GRAD] [--opt_method OPT_METHOD]
               [--batch_size BATCH_SIZE] [--lr LR] [--lr_decay LR_DECAY]
               [--momentum MOMENTUM] [--verbose VERBOSE]
               [--match_alpha_ratio MATCH_ALPHA_RATIO] [--save_best SAVE_BEST]
               [--report_fn REPORT_FN]

Learning tagging problem using neural networks

optional arguments:
  -h, --help            show this help message and exit
  --seed_num SEED_NUM   Random seed number, you may use any but 42 is the
                        answer.
  --model MODEL         Tagger model: "BiRNN", "BiRNNCNN", "BiRNNCRF",
                        "BiRNNCNNCRF".
  --fn_train FN_TRAIN   Train data in CoNNL-2003 format.
  --fn_dev FN_DEV       Dev data in CoNNL-2003 format, it is used to find best
                        model during the training.
  --fn_test FN_TEST     Test data in CoNNL-2003 format, it is used to obtain
                        the final accuracy/F1 score.
  --load LOAD           Path to load the trained model from.
  --save SAVE           Path to save the trained model.
  --wsi WSI             Load word_seq_indexer object from hdf5 file.
  --emb_fn EMB_FN       Path to word embeddings file.
  --emb_dim EMB_DIM     Dimension of word embeddings file.
  --emb_delimiter EMB_DELIMITER
                        Delimiter for word embeddings file.
  --freeze_word_embeddings FREEZE_WORD_EMBEDDINGS
                        False to continue training the word embeddings.
  --freeze_char_embeddings FREEZE_CHAR_EMBEDDINGS
                        False to continue training the char embeddings.
  --gpu GPU             GPU device number, 0 by default, -1 means CPU.
  --check_for_lowercase CHECK_FOR_LOWERCASE
                        Read characters caseless.
  --epoch_num EPOCH_NUM
                        Number of epochs.
  --min_epoch_num MIN_EPOCH_NUM
                        Minimum number of epochs.
  --patience PATIENCE   Patience for early stopping.
  --rnn_type RNN_TYPE   RNN cell units type: "Vanilla", "LSTM", "GRU".
  --rnn_hidden_dim RNN_HIDDEN_DIM
                        Number of hidden units in the recurrent layer.
  --char_embeddings_dim CHAR_EMBEDDINGS_DIM
                        Char embeddings dim, only for char CNNs.
  --word_len WORD_LEN   Max length of words in characters for char CNNs.
  --char_cnn_filter_num CHAR_CNN_FILTER_NUM
                        Number of filters in Char CNN.
  --char_window_size CHAR_WINDOW_SIZE
                        Convolution1D size.
  --dropout_ratio DROPOUT_RATIO
                        Dropout ratio.
  --dataset_sort DATASET_SORT
                        Sort sequences by length for training.
  --clip_grad CLIP_GRAD
                        Clipping gradients maximum L2 norm.
  --opt_method OPT_METHOD
                        Optimization method: "sgd", "adam".
  --batch_size BATCH_SIZE
                        Batch size, samples.
  --lr LR               Learning rate.
  --lr_decay LR_DECAY   Learning decay rate.
  --momentum MOMENTUM   Learning momentum rate.
  --verbose VERBOSE     Show additional information.
  --match_alpha_ratio MATCH_ALPHA_RATIO
                        Alpha ratio from non-strict matching, options: 0.999
                        or 0.5
  --save_best SAVE_BEST
                        Save best on dev model as a final model.
  --report_fn REPORT_FN
                        Report filename.
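
For example, a training run with the default BiRNNCNNCRF model on the CoNLL-2003 data might look like the following; the file paths and the output model name are assumptions that match the example report below:

python main.py --model BiRNNCNNCRF \
               --fn_train data/NER/CoNNL_2003_shared_task/train.txt \
               --fn_dev data/NER/CoNNL_2003_shared_task/dev.txt \
               --fn_test data/NER/CoNNL_2003_shared_task/test.txt \
               --emb_fn embeddings/glove.6B.100d.txt --emb_dim 100 \
               --gpu 0 --save tagger_NER.hdf5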

Run trained model

usage: run_tagger.py [-h] [--fn FN] [--checkpoint_fn CHECKPOINT_FN]
                     [--gpu GPU]

Run trained tagger from the checkpoint file

optional arguments:
  -h, --help            show this help message and exit
  --fn FN               Train data in CoNNL-2003 format.
  --checkpoint_fn CHECKPOINT_FN
                        Path to load the trained model.
  --gpu GPU             GPU device number, 0 by default, -1 means CPU.
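
For example, to run the bundled pretrained NER tagger on the CoNLL-2003 test set on the CPU (the paths are assumptions based on the project structure above):

python run_tagger.py --fn data/NER/CoNNL_2003_shared_task/test.txt \
                     --checkpoint_fn pretrained/tagger_NER.hdf5 \
                     --gpu -1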

Example of output report

Evaluation

batch_size=10
char_cnn_filter_num=30
char_embeddings_dim=25
char_window_size=3
check_for_lowercase=True
clip_grad=5
dataset_sort=True
dropout_ratio=0.5
emb_delimiter=' '
emb_dim=100
emb_fn='embeddings/glove.6B.100d.txt'
epoch_num=100
fn_dev='data/NER/CoNNL_2003_shared_task/dev.txt'
fn_test='data/NER/CoNNL_2003_shared_task/test.txt'
fn_train='data/NER/CoNNL_2003_shared_task/train.txt'
freeze_char_embeddings=False
freeze_word_embeddings=False
gpu=0
load=None
lr=0.01
lr_decay=0.05
match_alpha_ratio=0.999
min_epoch_num=50
model='BiRNNCNNCRF'
momentum=0.9
opt_method='sgd'
patience=20
report_fn='2018_10_09_07-55_14_report.txt'
rnn_hidden_dim=100
rnn_type='LSTM'
save='2018_10_09_07-55_14_tagger.hdf5'
save_best=False
seed_num=42
verbose=True
word_len=20
wsi=None

     epoch  | train loss |   f1-train |     f1-dev |    f1-test | acc. train |   acc. dev |  acc. test 
---------------------------------------------------------------------------------------------------------
          1 |     302.08 |      82.69 |      83.02 |      80.19 |      95.68 |      95.59 |      95.20 
          2 |     151.72 |      89.32 |      88.63 |      84.90 |      97.66 |      97.43 |      96.56 
          3 |     108.10 |      91.76 |      90.80 |      87.84 |      98.35 |      98.08 |      97.37 
          4 |      88.41 |      92.41 |      90.64 |      88.01 |      98.51 |      98.11 |      97.44 
          5 |      75.45 |      93.66 |      91.76 |      89.20 |      98.76 |      98.28 |      97.53 
          6 |      67.20 |      94.45 |      92.35 |      89.94 |      98.92 |      98.40 |      97.78 
          7 |      61.48 |      95.35 |      92.96 |      89.94 |      99.10 |      98.53 |      97.78 
          8 |      56.26 |      95.38 |      92.34 |      89.62 |      99.11 |      98.44 |      97.67 
          9 |      52.61 |      95.68 |      92.35 |      89.43 |      99.16 |      98.44 |      97.57 
         10 |      48.84 |      96.44 |      93.18 |      90.20 |      99.31 |      98.61 |      97.77 
         11 |      45.93 |      96.53 |      92.79 |      90.10 |      99.34 |      98.52 |      97.76 
         12 |      42.84 |      96.71 |      93.12 |      89.99 |      99.33 |      98.53 |      97.59 
         13 |      40.87 |      97.11 |      93.34 |      90.31 |      99.46 |      98.64 |      97.79 
         14 |      39.28 |      97.32 |      93.51 |      90.39 |      99.49 |      98.66 |      97.81 
         15 |      37.28 |      97.51 |      93.50 |      90.32 |      99.53 |      98.67 |      97.84 
         16 |      35.54 |      97.52 |      93.44 |      90.09 |      99.54 |      98.65 |      97.68 
         17 |      33.91 |      97.37 |      93.73 |      89.89 |      99.50 |      98.69 |      97.71 
         18 |      32.79 |      97.83 |      93.38 |      90.88 |      99.61 |      98.65 |      97.96 
         19 |      30.78 |      97.86 |      93.72 |      90.23 |      99.62 |      98.69 |      97.76 
         20 |      30.02 |      98.13 |      93.79 |      90.78 |      99.66 |      98.66 |      97.90 
         21 |      29.37 |      98.06 |      93.62 |      90.03 |      99.65 |      98.70 |      97.75 
         22 |      27.77 |      98.07 |      93.95 |      90.56 |      99.64 |      98.74 |      97.87 
         23 |      26.41 |      98.12 |      93.20 |      90.30 |      99.67 |      98.61 |      97.80 
         24 |      26.69 |      98.40 |      94.03 |      90.77 |      99.71 |      98.76 |      97.91 
         25 |      24.89 |      98.53 |      93.67 |      90.75 |      99.74 |      98.72 |      97.92 
         26 |      24.18 |      98.57 |      93.80 |      90.70 |      99.73 |      98.69 |      97.90 
         27 |      23.72 |      98.68 |      94.18 |      90.84 |      99.78 |      98.79 |      97.93 
         28 |      23.80 |      98.71 |      94.13 |      90.60 |      99.78 |      98.78 |      97.87 
         29 |      22.46 |      98.55 |      93.68 |      90.48 |      99.72 |      98.67 |      97.82 
         30 |      22.25 |      98.72 |      93.91 |      90.78 |      99.78 |      98.72 |      97.88 
         31 |      21.56 |      98.81 |      94.10 |      90.38 |      99.80 |      98.76 |      97.76 
         32 |      20.95 |      98.94 |      94.36 |      90.60 |      99.81 |      98.82 |      97.82 
         33 |      20.11 |      98.93 |      94.31 |      90.79 |      99.83 |      98.84 |      97.94 
         34 |      20.87 |      98.83 |      93.98 |      90.46 |      99.80 |      98.78 |      97.84 
         35 |      19.16 |      98.93 |      94.00 |      90.57 |      99.81 |      98.73 |      97.88 
         36 |      18.78 |      98.99 |      93.90 |      90.56 |      99.82 |      98.72 |      97.87 
         37 |      18.26 |      99.15 |      94.19 |      90.74 |      99.86 |      98.78 |      97.92 
         38 |      18.33 |      98.93 |      94.23 |      90.69 |      99.80 |      98.77 |      97.88 
         39 |      18.16 |      99.16 |      94.18 |      90.66 |      99.86 |      98.77 |      97.90 
         40 |      16.65 |      99.20 |      94.35 |      90.91 |      99.87 |      98.84 |      97.94 
         41 |      17.50 |      99.24 |      93.95 |      90.76 |      99.87 |      98.72 |      97.91 
         42 |      17.74 |      99.28 |      94.14 |      90.76 |      99.89 |      98.81 |      97.93 
         43 |      18.12 |      99.24 |      94.06 |      90.63 |      99.88 |      98.79 |      97.91 
         44 |      16.16 |      99.24 |      94.10 |      90.61 |      99.88 |      98.78 |      97.90 
         45 |      15.82 |      99.31 |      94.10 |      90.64 |      99.89 |      98.78 |      97.86 
         46 |      15.76 |      99.29 |      94.26 |      90.64 |      99.90 |      98.80 |      97.93 
         47 |      14.65 |      99.28 |      93.96 |      90.42 |      99.87 |      98.72 |      97.80 
         48 |      14.84 |      99.45 |      94.30 |      90.97 |      99.92 |      98.82 |      97.93 
         49 |      15.59 |      99.39 |      94.13 |      90.75 |      99.91 |      98.79 |      97.84 
         50 |      13.84 |      99.39 |      94.07 |      91.10 |      99.91 |      98.78 |      98.00 
         51 |      15.17 |      99.39 |      94.16 |      90.97 |      99.90 |      98.79 |      97.94 
         52 |      14.33 |      99.42 |      94.16 |      90.64 |      99.91 |      98.80 |      97.84 
         53 |      13.90 |      99.56 |      94.48 |      90.82 |      99.94 |      98.84 |      97.89 
         54 |      14.36 |      99.50 |      94.20 |      90.92 |      99.93 |      98.80 |      97.94 
         55 |      14.41 |      99.40 |      94.15 |      90.50 |      99.90 |      98.78 |      97.85 
         56 |      13.52 |      99.55 |      94.28 |      90.73 |      99.93 |      98.82 |      97.87 
         57 |      12.61 |      99.51 |      94.21 |      90.60 |      99.92 |      98.79 |      97.85 
         58 |      12.89 |      99.49 |      94.17 |      90.78 |      99.92 |      98.78 |      97.90 
         59 |      12.60 |      99.51 |      94.17 |      90.57 |      99.92 |      98.79 |      97.83 
         60 |      13.01 |      99.48 |      93.84 |      90.46 |      99.92 |      98.72 |      97.81 
         61 |      12.94 |      99.53 |      94.06 |      90.65 |      99.93 |      98.77 |      97.84 
         62 |      12.36 |      99.58 |      93.99 |      90.55 |      99.93 |      98.76 |      97.83 
         63 |      11.82 |      99.54 |      94.27 |      90.76 |      99.92 |      98.80 |      97.90 
         64 |      12.09 |      99.59 |      94.25 |      90.79 |      99.94 |      98.79 |      97.89 
         65 |      12.10 |      99.54 |      94.20 |      90.67 |      99.93 |      98.81 |      97.85 
         66 |      11.72 |      99.49 |      94.31 |      90.76 |      99.92 |      98.83 |      97.89 
         67 |      11.43 |      99.58 |      94.22 |      90.86 |      99.94 |      98.83 |      97.90 
         68 |      11.05 |      99.60 |      94.06 |      90.75 |      99.94 |      98.76 |      97.88 
         69 |      10.50 |      99.63 |      94.21 |      90.77 |      99.94 |      98.78 |      97.88 
         70 |      10.85 |      99.55 |      94.19 |      90.74 |      99.93 |      98.79 |      97.87 
         71 |      11.22 |      99.64 |      94.27 |      90.80 |      99.95 |      98.80 |      97.90 
         72 |      11.51 |      99.60 |      94.24 |      90.77 |      99.94 |      98.79 |      97.90 
         73 |      10.90 |      99.70 |      94.02 |      90.70 |      99.95 |      98.75 |      97.87 
         74 |       9.90 |      99.67 |      94.11 |      90.86 |      99.95 |      98.77 |      97.92 
---------------------------------------------------------------------------------------------------------
Final eval on test, micro-f1 test = 90.86 (after 74 training epochs)
