Bayesian Active Learning (Baal)
BaaL is an active learning library developed at ElementAI. This repository contains techniques and reusable components to make active learning accessible for all.
Read the documentation at https://baal.readthedocs.io.
Installation and requirements
To install baal using pip:
pip install baal
To install baal from source:
pip install -e .
For requirements please see: requirements.txt.
What is Active Learning?
Active learning is a special case of machine learning in which a learning
algorithm is able to interactively query the user (or some other information
source) to obtain the desired outputs at new data points
(to understand the concept in more depth, refer to our tutorial).
At the moment BaaL supports the following methods to perform active learning.
- Monte-Carlo Dropout (Gal et al. 2015)
Please see our Roadmap below.
The Monte-Carlo Dropout method is a known approximation for Bayesian neural
networks. In this method, the dropout layer is used at both training and test
time. By running the model multiple times whilst randomly dropping weights, we obtain a set of differing predictions for the same input. We then calculate the uncertainty of the prediction using one of the uncertainty measurements in src/baal/active/heuristics.py.
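To make the idea concrete, here is a minimal, library-independent sketch of Monte-Carlo Dropout on a toy linear classifier. It is not BaaL's implementation: the function names, the toy weights, and the choice to apply dropout to the input features are all assumptions made for illustration. It computes two common uncertainty measures from the repeated stochastic passes: the entropy of the mean prediction, and a BALD-style mutual-information score.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def stochastic_forward(x, weights, drop_prob, rng):
    """One forward pass of a toy linear classifier with dropout on its inputs.

    Each input feature is zeroed with probability `drop_prob`; survivors are
    rescaled by 1 / (1 - drop_prob), as in inverted dropout.
    """
    keep = 1.0 - drop_prob
    dropped = [xi / keep if rng.random() < keep else 0.0 for xi in x]
    logits = [sum(w * xi for w, xi in zip(row, dropped)) for row in weights]
    return softmax(logits)


def mc_dropout_uncertainty(x, weights, iterations=20, drop_prob=0.5, seed=0):
    """Run several stochastic passes and summarise their disagreement."""
    rng = random.Random(seed)
    samples = [stochastic_forward(x, weights, drop_prob, rng)
               for _ in range(iterations)]
    n_classes = len(samples[0])
    mean_probs = [sum(s[c] for s in samples) / iterations
                  for c in range(n_classes)]
    predictive_entropy = entropy(mean_probs)
    expected_entropy = sum(entropy(s) for s in samples) / iterations
    # BALD-style score: entropy of the mean minus the mean of the entropies.
    mutual_information = predictive_entropy - expected_entropy
    return mean_probs, predictive_entropy, mutual_information


# Hypothetical toy model: 2 classes, 3 input features.
WEIGHTS = [[2.0, -1.0, 0.5], [-1.5, 1.0, 0.25]]
mean_probs, pred_entropy, bald = mc_dropout_uncertainty([1.0, 2.0, -1.0], WEIGHTS)
print(mean_probs, pred_entropy, bald)
```

With `drop_prob=0.0` every pass is identical, so the mutual-information score collapses to zero. The score grows only when the stochastic passes disagree, which is what makes it useful for choosing items to label.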
The framework consists of four main parts: ActiveLearningDataset, heuristics, ModelWrapper, and ActiveLearningLoop.
To get started, wrap your dataset in our ActiveLearningDataset class. This will ensure that the dataset is split into
training and pool sets. The
pool set represents the portion of the training set which is yet
to be labelled.
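The training/pool split can be pictured with a small stand-in class. This is a sketch only: `ToyActiveDataset` and its methods are hypothetical names invented here, and the real ActiveLearningDataset may be organised differently. It keeps one boolean flag per item and derives both views from it.

```python
import random


class ToyActiveDataset:
    """Hypothetical stand-in that tracks which items are labelled."""

    def __init__(self, items):
        self.items = list(items)
        self.labelled = [False] * len(self.items)

    def _pool_indices(self):
        """Indices (into self.items) of items that are still unlabelled."""
        return [i for i, done in enumerate(self.labelled) if not done]

    def label_randomly(self, n, seed=0):
        """Move `n` randomly chosen pool items into the training set."""
        for i in random.Random(seed).sample(self._pool_indices(), n):
            self.labelled[i] = True

    def label(self, pool_positions):
        """Label items given by their positions within the current pool."""
        pool_indices = self._pool_indices()
        for pos in pool_positions:
            self.labelled[pool_indices[pos]] = True

    @property
    def training(self):
        """Items that have been labelled so far."""
        return [x for x, done in zip(self.items, self.labelled) if done]

    @property
    def pool(self):
        """Items that are yet to be labelled."""
        return [x for x, done in zip(self.items, self.labelled) if not done]


dataset = ToyActiveDataset(range(100))
dataset.label_randomly(10)
print(len(dataset.training), len(dataset.pool))  # 10 90
```

Keeping a single mask, rather than two separate lists, means an item can never be in both sets at once.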
We provide a lightweight object, ModelWrapper, similar to
keras.Model, to make it easier to train and test the model. If your model is not ready for active learning, we provide Modules to prepare it.
For example, the MCDropoutModule wrapper changes the existing dropout layers
so that they are used at both training and inference time, and the
ModelWrapper specifies the number of iterations to run at training and inference.
In conclusion, your script should be similar to this:
dataset = ActiveLearningDataset(your_dataset)
dataset.label_randomly(INITIAL_POOL)  # label some data
model = MCDropoutModule(your_model)
model = ModelWrapper(model, your_criterion)
active_loop = ActiveLearningLoop(dataset,
                                 get_probabilities=model.predict_on_dataset,
                                 heuristic=heuristics.BALD(shuffle_prop=0.1),
                                 ndata_to_label=NDATA_TO_LABEL)
for al_step in range(N_ALSTEP):
    model.train_on_dataset(dataset, optimizer, BATCH_SIZE, use_cuda=use_cuda)
    if not active_loop.step():
        # We're done!
        break
For a complete experiment, we provide experiments/ to show how to
write an active training process. Generally, we use the ActiveLearningLoop
provided at src/baal/active/active_loop.py.
This class provides functionality to get the predictions on the unlabelled pool
after every epoch (or every few epochs) and to rank the next set of data items to be labelled
based on the calculated uncertainty of the pool.
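The ranking step described above can be sketched in a few lines. This is an illustrative toy, not the ActiveLearningLoop source: `active_step` and its arguments are hypothetical, and the uncertainty scores are fixed here, whereas a real loop would recompute them after each round of training.

```python
def active_step(pool_scores, labelled, ndata_to_label):
    """Label the `ndata_to_label` most uncertain pool items.

    pool_scores: dict mapping item index -> uncertainty (higher = more uncertain)
    labelled: set of already-labelled indices, updated in place
    Returns False when nothing is left in the pool, True otherwise.
    """
    candidates = {i: s for i, s in pool_scores.items() if i not in labelled}
    if not candidates:
        return False
    # Rank the pool from most to least uncertain and take the top items.
    ranked = sorted(candidates, key=candidates.get, reverse=True)
    labelled.update(ranked[:ndata_to_label])
    return True


scores = {0: 0.05, 1: 0.69, 2: 0.30, 3: 0.61, 4: 0.10}
labelled = {4}
while active_step(scores, labelled, ndata_to_label=2):
    print(sorted(labelled))
```

Returning a boolean lets the caller stop once the pool is exhausted, in the same spirit as the `if not active_loop.step(): break` check in the script above.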
Roadmap (Subject to change depending on the community.)
- [x] Initial FOSS release with MCDropout (Gal et al. 2015)
- [ ] MCDropConnect (Mobiny et al. 2019)
- [ ] Bayesian layers (Shridhar et al. 2019)
- [ ] Unsupervised methods
- [ ] NNGP (Panov et al. 2019)
- [ ] SWAG (Maddox et al. 2019)
Re-run our Experiments
nvidia-docker build [--target prod_baal] -t baal .
nvidia-docker run --rm baal python3 experiments/vgg_mcdropout_cifar10.py
Use BaaL for YOUR Experiments
Simply clone the repo and create your own experiment script, similar to the
example at experiments/vgg_experiment.py. Make sure to use the four main parts
of the BaaL framework. Happy experimenting!
To set up a development environment, clone the repository and build the Dockerfile as below:
git clone git@github.com:ElementAI/baal.git
nvidia-docker build [--target base_baal] -t baal-dev .
Now you have all the requirements to start contributing to BaaL. YEAH!