pyrelational

Quick install

pip install pyrelational

Organisation of repository

  • pyrelational contains the source code for the pyrelational package. It contains the main sub-packages for active learning strategies, various informativeness measures, and methods for estimating posterior uncertainties.
  • examples contains various example scripts and notebooks detailing how the package can be used
  • tests contains unit tests for the pyrelational package
  • docs contains the documentation source and assets

The pyrelational package

Example

# Active Learning package
import pyrelational as pal
from pyrelational.data.data_manager import GenericDataManager
from pyrelational.strategies.generic_al_strategy import GenericActiveLearningStrategy
from pyrelational.models.generic_model import GenericModel

# Instantiate data loaders, models, and trainers the usual PyTorch/PyTorch Lightning way.
# In most cases, no change to your current workflow is needed to incorporate
# active learning
data_manager = GenericDataManager(dataset, train_mask, validation_mask, test_mask)

# Create a model class that will handle model instantiation
model = GenericModel(ModelConstructor, model_config, trainer_config, **kwargs)

# Use the various implemented active learning strategies or define your own
al_manager = GenericActiveLearningStrategy(data_manager=data_manager, model=model)
al_manager.theoretical_performance(test_loader=test_loader)
al_manager.full_active_learning_run(num_annotate=100, test_loader=test_loader)

Overview

The pyrelational package offers a flexible workflow to enable active learning with as little change to the models and datasets as possible. It is partially inspired by Robert (Munro) Monarch’s book: “Human-In-The-Loop Machine Learning” and shares some vocabulary from there. It is principally designed with PyTorch in mind, but can be easily extended to work with other libraries.

For a primer on active learning, we refer the reader to Burr Settles’s survey [reference]. In his own words:

The key idea behind active learning is that a machine learning algorithm can
achieve greater accuracy with fewer training labels if it is allowed to choose the
data from which it learns. An active learner may pose queries, usually in the form
of unlabeled data instances to be labeled by an oracle (e.g., a human annotator).
Active learning is well-motivated in many modern machine learning problems,
where unlabeled data may be abundant or easily obtained, but labels are difficult,
time-consuming, or expensive to obtain.

The pyrelational package decomposes the active learning workflow into four main components: 1) a data manager, 2) a model, 3) an acquisition strategy built around an informativeness scorer, and 4) an oracle. Note that the oracle is external to the package.

The data manager (defined in pyrelational.data.data_manager.GenericDataManager) wraps around a PyTorch Dataset and handles dataloader instantiation as well as tracking and updating of labelled and unlabelled sample pools.
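
For illustration, here is a minimal sketch of building a data manager from a torchvision dataset, following the constructor call shown in the example above; the dataset choice and the index splits are arbitrary assumptions, used only to show the expected inputs.

import torch
from torchvision import datasets, transforms
from pyrelational.data.data_manager import GenericDataManager

# Any map-style PyTorch Dataset can be wrapped; MNIST is used purely for illustration
dataset = datasets.MNIST("data/", download=True, transform=transforms.ToTensor())

# Arbitrary index split into train / validation / test pools
indices = torch.randperm(len(dataset)).tolist()
train_mask = indices[:50000]
validation_mask = indices[50000:55000]
test_mask = indices[55000:]

data_manager = GenericDataManager(dataset, train_mask, validation_mask, test_mask)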

The model (subclassed from pyrelational.models.generic_model.GenericModel) wraps a user-defined ML model (e.g. a PyTorch Module, PyTorch Lightning module, or scikit-learn estimator) and handles instantiation, training, and testing, as well as uncertainty quantification (e.g. ensembling, MC-dropout). It also supports ML models that directly estimate their own uncertainties, such as Gaussian processes (see examples/demo/model_gaussianprocesses.py).
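
As an illustrative sketch, the snippet below wraps a small user-defined PyTorch Lightning module with GenericModel, mirroring the GenericModel(ModelConstructor, model_config, trainer_config) call shown earlier; the module itself and the configuration keys are assumptions for illustration, not part of the library.

import torch
import pytorch_lightning as pl
from pyrelational.models.generic_model import GenericModel

class MLPClassifier(pl.LightningModule):
    """Illustrative LightningModule; any user-defined model class can be passed."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Flatten(),
            torch.nn.Linear(28 * 28, 128),
            torch.nn.ReLU(),
            torch.nn.Dropout(p=0.2),  # dropout layers also enable MC-dropout estimation
            torch.nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.net(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.cross_entropy(self(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# Constructor, model hyperparameters, and trainer settings are passed separately;
# the config keys below are illustrative
model = GenericModel(MLPClassifier, {"num_classes": 10}, {"epochs": 5})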

The active learning strategy (which subclasses pyrelational.strategies.generic_al_strategy.GenericActiveLearningStrategy) revolves around an informativeness score that serves as the basis for selecting the query sent to the oracle for labelling. We define various strategies for classification, regression, and task-agnostic scenarios based on the informativeness scorers defined in pyrelational.informativeness.
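
To make the role of the informativeness score concrete, here is a conceptual sketch (plain PyTorch, not the library's exact interface) of the selection step such a strategy performs: score every unlabelled sample and propose the top-k most informative ones to the oracle.

import torch

def least_confidence(probs: torch.Tensor) -> torch.Tensor:
    # Higher score = the model is less sure about its top prediction
    return 1.0 - probs.max(dim=-1).values

def select_query(probs: torch.Tensor, unlabelled_indices: list, num_annotate: int) -> list:
    # probs: softmax outputs for the unlabelled pool, shape (num_unlabelled, num_classes)
    scores = least_confidence(probs)
    top = torch.topk(scores, k=min(num_annotate, len(unlabelled_indices))).indices
    return [unlabelled_indices[i] for i in top.tolist()]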

Prerequisites and setup

For those just using the package, installation only requires standard ML packages and PyTorch. Starting from a new virtual environment (a miniconda environment is recommended), install the standard learning packages and numerical tools:

pip install -r requirements.txt

If you wish to contribute to the code, run pre-commit install after the above step.

Building the docs

Make sure you have the sphinx and sphinx-rtd-theme packages installed (pip install sphinx sphinx_rtd_theme will install both).

To generate the docs, cd into the docs/ directory and run make html. This will generate the docs
at docs/_build/html/index.html.

Quickstart & examples

The examples/ folder contains multiple scripts and notebooks showcasing how to use pyrelational effectively in various scenarios. Specifically,

  • examples with regression

    • lightning_diversity_regression.py
    • lightning_mixed_regression.py
    • mcdropout_uncertainty_regression.py
    • model_gaussianprocesses.py
    • model_badge.py
  • examples with classification tasks

    • ensemble_uncertainty_classification.py
    • lightning_diversity_classification.py
    • lightning_representative_classification.py
    • mcdropout_uncertainty_classification.py
    • scikit_estimator.py
  • examples with task-agnostic acquisition

    • lightning_diversity_classification.py
    • lightning_representative_classification.py
    • lightning_diversity_regression.py
    • model_badge.py
  • examples showcasing different uncertainty estimators

    • ensemble_uncertainty_classification.py
    • mcdropout_uncertainty_classification.py
    • gpytorch_integration.py
    • model_badge.py
  • examples with custom acquisition strategies

    • model_badge.py
    • lightning_mixed_regression.py
  • examples with custom models

    • model_gaussianprocesses.py

Uncertainty Estimation

  • MCDropout
  • Ensemble of models (a.k.a. committee)
  • DropConnect (coming soon)
  • SWAG (coming soon)
  • MultiSWAG (coming soon)
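
As a rough illustration of the MC-dropout estimator listed above (plain PyTorch, not the library's implementation): keep dropout active at inference time and aggregate several stochastic forward passes into a predictive mean and a disagreement measure.

import torch

def mc_dropout_predict(model: torch.nn.Module, x: torch.Tensor, n_passes: int = 25):
    """Monte-Carlo dropout: keep dropout active at inference and aggregate stochastic passes."""
    model.train()  # keeps dropout active; in practice only dropout layers should be toggled
    with torch.no_grad():
        samples = torch.stack([model(x).softmax(dim=-1) for _ in range(n_passes)])
    mean = samples.mean(dim=0)  # predictive distribution
    std = samples.std(dim=0)    # per-class disagreement across passes
    return mean, std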

Informativeness scorers included in the library

Regression (N.B. pyrelational currently only supports single scalar regression tasks)

  • Greedy
  • Least confidence
  • Expected improvement
  • Thompson Sampling
  • Upper confidence bound (UCB)
  • BALD
  • BatchBALD (coming soon)
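
For intuition, two of the regression scorers above can be written directly in terms of Monte-Carlo samples from the model's predictive distribution; the functions below are illustrative textbook formulas, not the library's source.

import torch

def ucb(samples: torch.Tensor, kappa: float = 1.0) -> torch.Tensor:
    """Upper confidence bound from predictive samples of shape (n_mc_samples, n_points)."""
    return samples.mean(dim=0) + kappa * samples.std(dim=0)

def expected_improvement(samples: torch.Tensor, best_so_far: float) -> torch.Tensor:
    """Monte-Carlo expected improvement over the best labelled value seen so far (maximisation)."""
    return (samples - best_so_far).clamp_min(0.0).mean(dim=0)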

Classification (N.B. pyrelational does not support multi-label classification at the moment)

  • Least confidence
  • Margin confidence
  • Entropy based confidence
  • Ratio based confidence
  • BALD
  • Thompson Sampling (coming soon)
  • BatchBALD (coming soon)
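
Several of the classification scorers above reduce to simple functions of the predicted class probabilities; the definitions below are the standard forms, given only for illustration and not taken from the library source.

import torch

def margin_confidence(probs: torch.Tensor) -> torch.Tensor:
    """Gap between the two most likely classes; a small gap means an uncertain prediction."""
    top2 = probs.topk(2, dim=-1).values
    return top2[..., 0] - top2[..., 1]

def ratio_confidence(probs: torch.Tensor) -> torch.Tensor:
    """Ratio of the second most likely to the most likely class; values near 1 are uncertain."""
    top2 = probs.topk(2, dim=-1).values
    return top2[..., 1] / top2[..., 0].clamp_min(1e-12)

def entropy_confidence(probs: torch.Tensor) -> torch.Tensor:
    """Predictive entropy; higher means more uncertain."""
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)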

Model agnostic and diversity sampling based approaches

  • Representative sampling
  • Diversity sampling
  • Random acquisition
  • BADGE
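
A sketch of the idea behind the diversity-based approaches listed above (using scikit-learn's KMeans, purely illustrative): cluster the unlabelled pool in an embedding space and pick the sample nearest each cluster centre so the query covers diverse regions of the data.

import numpy as np
from sklearn.cluster import KMeans

def diversity_sample(embeddings: np.ndarray, num_annotate: int) -> list:
    """Cluster the unlabelled pool and return the index nearest each cluster centre."""
    km = KMeans(n_clusters=num_annotate, n_init=10).fit(embeddings)
    selected = []
    for centre in km.cluster_centers_:
        distances = np.linalg.norm(embeddings - centre, axis=1)
        selected.append(int(distances.argmin()))
    return selected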