FAMIE: A Fast Active Learning Framework for Multilingual Information Extraction
FAMIE is a comprehensive and efficient active learning (AL) toolkit for multilingual information extraction (IE). It is designed to address a fundamental problem in existing AL frameworks: annotators must wait a long time between annotation batches because model training and data selection are time-consuming at each AL iteration. With a novel proxy AL mechanism and the integration of our state-of-the-art multilingual toolkit Trankit, FAMIE can quickly provide users with a labeled dataset and a ready-to-use model for different IE tasks in over 100 languages.
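To make the idea of decoupling data selection from expensive model training concrete, here is a minimal, framework-agnostic sketch in plain Python. It is not FAMIE's implementation. The function names (`select_batch`, `proxy_al_loop`) and callbacks (`annotate`, `train_proxy`, `train_main`) are hypothetical. The sketch also simplifies the mechanism: only a cheap proxy scorer is retrained between batches, and the large main model is trained once at the end, so the annotator never waits on it.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Scorer = Callable[[str], float]


def select_batch(pool: Sequence[str], score: Scorer, k: int) -> List[str]:
    """Return the k examples the scorer considers most informative (highest score first)."""
    return sorted(pool, key=score, reverse=True)[:k]


def proxy_al_loop(
    pool: Sequence[str],
    annotate: Callable[[str], Any],
    train_proxy: Callable[[Dict[str, Any]], Scorer],
    train_main: Callable[[Dict[str, Any]], Any],
    batch_size: int,
    iterations: int,
) -> Tuple[Dict[str, Any], Any]:
    """Hypothetical proxy-based AL loop: a cheap proxy picks each batch,
    and the expensive main model is trained only once, after annotation ends."""
    labeled: Dict[str, Any] = {}

    def proxy_score(x: str) -> float:
        return 0.0  # cold start: no model yet, so the pool order is kept

    for _ in range(iterations):
        unlabeled = [x for x in pool if x not in labeled]
        if not unlabeled:
            break
        # Selection relies only on the fast proxy, so the next batch is ready quickly.
        for x in select_batch(unlabeled, proxy_score, batch_size):
            labeled[x] = annotate(x)  # the human annotation step
        proxy_score = train_proxy(labeled)  # cheap retraining between batches
    main_model = train_main(labeled)  # expensive training, done once at the end
    return labeled, main_model


# Toy usage: the "proxy" simply prefers longer strings.
labels, model = proxy_al_loop(
    pool=["a", "bbb", "cc", "dddd", "e"],
    annotate=str.upper,
    train_proxy=lambda data: len,
    train_main=lambda data: f"main model trained on {len(data)} examples",
    batch_size=2,
    iterations=2,
)
print(labels)  # {'a': 'A', 'bbb': 'BBB', 'dddd': 'DDDD', 'cc': 'CC'}
print(model)   # main model trained on 4 examples
```

The design point being illustrated is that the per-iteration cost seen by the annotator depends only on the proxy, not on the size of the final model.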
FAMIE’s documentation page: https://famie.readthedocs.io
FAMIE’s demo website: http://nlp.uoregon.edu:9000/
FAMIE can be easily installed via one of the following methods:
Installing from PyPI
pip install famie
This command installs FAMIE and all of its dependencies automatically.
Installing from source
git clone https://github.com/nlp-uoregon/famie.git
cd famie
pip install -e .
This first clones our GitHub repository and then installs FAMIE from the local copy in editable mode.
FAMIE currently supports Named Entity Recognition and Event Detection for over 100 languages. Using FAMIE involves the following three steps:
- Start an annotation session.
- Annotate data for a target task.
- Access the labeled data and a ready-to-use model returned by FAMIE.
Starting an annotation session
To start an annotation session, please use the following command:
This runs a server on the user’s local machine (no data or models leave the machine). FAMIE’s web interface can then be accessed at http://127.0.0.1:9000/.
As FAMIE is an AL framework, it provides several data selection algorithms that recommend the most beneficial examples for users to label at each annotation iteration. This is done by passing an optional argument
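As an illustration of what a data selection algorithm can look like for sequence labeling, the sketch below ranks unlabeled sentences by a length-normalized log-probability score in the spirit of MNLP (maximum normalized log-probability). It assumes the model exposes a per-token probability distribution over labels, and it scores the per-token argmax labeling rather than a CRF Viterbi path, so it is a simplification. The helper names are hypothetical, and FAMIE's own algorithms and their names may differ.

```python
import math
from typing import List, Sequence

# Assumed input format: token_probs[t][c] = model probability of label c for token t.
Sentence = Sequence[Sequence[float]]


def mnlp_score(token_probs: Sentence) -> float:
    """Length-normalized log-probability of the per-token argmax labeling.
    Lower values mean the model is less confident about the sentence."""
    return sum(math.log(max(p)) for p in token_probs) / len(token_probs)


def select_least_confident(pool: Sequence[Sentence], k: int) -> List[int]:
    """Return indices of the k sentences with the lowest MNLP-style score."""
    scores = [mnlp_score(s) for s in pool]
    return sorted(range(len(pool)), key=lambda i: scores[i])[:k]


pool = [
    [[0.9, 0.1], [0.8, 0.2]],  # confident two-token sentence
    [[0.5, 0.5], [0.6, 0.4]],  # uncertain two-token sentence
    [[0.7, 0.3]],              # moderately confident one-token sentence
]
print(select_least_confident(pool, k=2))  # [1, 2]
```

Dividing by the sentence length keeps long sentences from being selected merely because a sum of more log-probabilities is more negative.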
Accessing the labeled data and the trained model
import famie

# access a project via its name
p = famie.get_project('named-entity-recognition')

# access the project's labeled data
data = p.get_labeled_data()  # a Python dictionary

# export the project's labeled data to a file
p.export_labeled_data('data.json')

# export the project's trained model to a file
p.export_trained_model('model.ckpt')

# access the project's trained model
model = p.get_trained_model()

# access a trained model from file
model = famie.load_model_from_file('model.ckpt')

# use the trained model to make predictions
model.predict('Oregon is a beautiful state!')
# ['B-Location', 'O', 'O', 'O', 'O']
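In the example above, `model.predict` returns one BIO tag per token. The following minimal sketch turns such tags into entity spans. `bio_to_spans` is a hypothetical helper, not part of FAMIE, and it assumes a flat list of `B-`/`I-`/`O` tag strings like the one shown. A stray `I-` tag that does not continue a span of the same type is leniently treated as the start of a new span.

```python
from typing import List, Optional, Sequence, Tuple


def bio_to_spans(tags: Sequence[str]) -> List[Tuple[int, int, str]]:
    """Convert BIO tags into (start, end_exclusive, type) token spans."""
    spans: List[Tuple[int, int, str]] = []
    start = 0
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        if tag.startswith("I-") and tag[2:] == label:
            continue  # this I- tag extends the currently open span
        if label is not None:
            spans.append((start, i, label))  # close the open span
            label = None
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]  # open a new span (stray I- treated like B-)
    if label is not None:
        spans.append((start, len(tags), label))  # close a span that reaches the end
    return spans


print(bio_to_spans(['B-Location', 'O', 'O', 'O', 'O']))  # [(0, 1, 'Location')]
```

Pairing each span with the corresponding tokens (for example, `tokens[start:end]`) requires the same tokenization the model used, which this sketch does not attempt to reproduce.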