A model training library for PyTorch.

Torchbearer is a PyTorch model training library designed by researchers, for researchers. Specifically, if you occasionally want to perform advanced custom operations but generally don't want to write hundreds of lines of untested code, then this is the library for you. Our design decisions are geared towards flexibility and customisability whilst keeping the API as simple as possible.

Key Features

- Keras-like training API using calls to fit(...) / fit_generator(...)
- Sophisticated metric API in which each computed value (e.g. accuracy) flows to multiple aggregators, which can calculate running values (e.g. mean) and values for the epoch (e.g. std, mean, area under curve)
- Simple callback API with a persistent model state, which supports adding to the loss or accessing the metric values
- A host of callbacks included from the start that enable: TensorBoard logging (for metrics, images and data), model checkpointing, weight decay, learning rate schedulers, gradient clipping and more
- Decorator APIs that allow for simple construction of metrics and callbacks
- An example library (still under construction) with a set of demos showing how complex models (such as GANs and VAEs) can be implemented easily with torchbearer
- Fully tested; as researchers we want to trust that our metrics and callbacks work properly, so we have tested everything thoroughly for peace of mind
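
The metric flow described in the feature list (one computed value feeding several aggregators) can be pictured with a small plain-Python sketch. This is not torchbearer's implementation: RunningMean, EpochStats and the window size are hypothetical names chosen purely for illustration.

```python
import math


class RunningMean:
    """Hypothetical aggregator: mean over the most recent `window` batch values."""

    def __init__(self, window=2):
        self.window = window
        self.values = []

    def update(self, value):
        self.values.append(value)
        recent = self.values[-self.window:]
        return sum(recent) / len(recent)


class EpochStats:
    """Hypothetical aggregator: mean and population std over the whole epoch."""

    def __init__(self):
        self.values = []

    def update(self, value):
        self.values.append(value)

    def result(self):
        n = len(self.values)
        mean = sum(self.values) / n
        variance = sum((v - mean) ** 2 for v in self.values) / n
        return mean, math.sqrt(variance)


batch_losses = [1.0, 2.0, 3.0, 4.0]  # made-up per-batch values
running = RunningMean(window=2)
epoch = EpochStats()

running_values = []
for loss_value in batch_losses:
    # The same computed value is passed to both aggregators.
    running_values.append(running.update(loss_value))
    epoch.update(loss_value)

epoch_mean, epoch_std = epoch.result()
print(running_values, epoch_mean, epoch_std)
```

The running aggregator reports a value after every batch, while the epoch aggregator only reports once all batches have been seen; this mirrors the split between the running_* entries and the plain or *_std entries in a progress bar.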

            nn.Conv2d(32, 64, stride=2, kernel_size=3),

        self.classifier = nn.Linear(576, 10)

    def forward(self, x):
        x = self.convs(x)
        x = x.view(-1, 576)
        return self.classifier(x)

model = SimpleModel()
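
The 576 in nn.Linear(576, 10) and x.view(-1, 576) is the flattened size of the final feature map. The arithmetic can be sketched as below, under two assumptions that are not stated in the text: the inputs are 32x32 images, and the model stacks three unpadded convolutions with kernel_size=3 and stride=2 (only the last of these is visible in the fragment above).

```python
def conv_out(size, kernel_size, stride, padding=0):
    """Output length along one spatial dimension of a convolution (dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


size = 32  # assumed input height and width
sizes = [size]
for _ in range(3):  # assumed: three conv layers, kernel_size=3, stride=2, no padding
    size = conv_out(size, kernel_size=3, stride=2)
    sizes.append(size)

# 64 is the number of output channels of the last conv layer shown above.
flat_features = 64 * size * size
print(sizes, flat_features)
```

If the input resolution or the layer stack differs from these assumptions, the value passed to view and to the linear layer has to change accordingly.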

- Now that we have a model, we can train it by wrapping it in a torchbearer Model instance:

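import torch.nn as nn
import torch.optim as optim
from torchbearer import Model

# Optimise only the parameters that require gradients
optimizer = optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=0.001)
loss = nn.CrossEntropyLoss()

# traingen and valgen are assumed to be data loaders defined earlier
torchbearer_model = Model(model, optimizer, loss, metrics=['acc', 'loss']).to('cuda')
torchbearer_model.fit_generator(traingen, epochs=10, validation_generator=valgen)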

- Running that code produces tqdm progress output, showing running accuracies and losses during the training phase:
0/10(t): 100%|██████████| 352/352 [00:01<00:00, 233.36it/s, running_acc=0.536, running_loss=1.32, acc=0.459, acc_std=0.498, loss=1.52, loss_std=0.239]
0/10(v): 100%|██████████| 40/40 [00:00<00:00, 239.40it/s, val_acc=0.536, val_acc_std=0.499, val_loss=1.29, val_loss_std=0.0731]
9/10(t): 100%|██████████| 352/352 [00:01<00:00, 215.76it/s, running_acc=0.741, running_loss=0.735, acc=0.754, acc_std=0.431, loss=0.703, loss_std=0.0897]
9/10(v): 100%|██████████| 40/40 [00:00<00:00, 222.72it/s, val_acc=0.68, val_acc_std=0.466, val_loss=0.948, val_loss_std=0.181]
0/1(e): 100%|██████████| 79/79 [00:00<00:00, 268.70it/s, val_acc=0.678, val_acc_std=0.467, val_loss=0.925, val_loss_std=0.109]
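
The acc_std values in the log are consistent with accuracy being recorded as a 0/1 value per sample: a 0/1 variable with mean p has standard deviation sqrt(p(1 - p)). The check below is a sketch based only on the printed numbers; it is an observation about the output, not a statement about torchbearer's internals.

```python
import math

# (accuracy, reported std) pairs copied from the log lines above
reported = [
    (0.459, 0.498),  # epoch 0, training
    (0.536, 0.499),  # epoch 0, validation
    (0.754, 0.431),  # epoch 9, training
    (0.68, 0.466),   # epoch 9, validation
    (0.678, 0.467),  # final evaluation
]


def bernoulli_std(p):
    """Population standard deviation of a 0/1 variable whose mean is p."""
    return math.sqrt(p * (1.0 - p))


# Largest difference between the predicted and the reported std
max_gap = max(abs(bernoulli_std(acc) - std) for acc, std in reported)
print(max_gap)
```

Every pair agrees to within the rounding of the printed values, which is why acc_std sits close to 0.5 when accuracy is near 50% and shrinks as accuracy moves away from it.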