/ Machine Learning

PyTorch functions to improve performance

PyTorch functions to improve performance


PyTorch functions to improve performance, analyse and make your deep learning life easier.

torchfunc is library revolving around PyTorch with a goal to help you with:

  • Improving and analysing performance of your neural network (e.g. Tensor Cores compatibility)
  • Record/analyse internal state of torch.nn.Module as data passes through it
  • Do the above based on external conditions (using single Callable to specify it)
  • Day-to-day neural network related duties (model size, seeding, performance measurements etc.)
  • Get information about your host operating system, CUDA devices and others

Quick examples

  • Get instant performance tips about your module. All problems described by comments
    will be shown by torchfunc.performance.tips:
class Model(torch.nn.Module):
    def __init__(self):
        self.convolution = torch.nn.Sequential(
            torch.nn.Conv2d(1, 32, 3),
            torch.nn.ReLU(inplace=True),  # Inplace may harm kernel fusion
            torch.nn.Conv2d(32, 128, 3, groups=32),  # Depthwise is slower in PyTorch
            torch.nn.ReLU(inplace=True),  # Same as before
            torch.nn.Conv2d(128, 250, 3),  # Wrong output size for TensorCores

        self.classifier = torch.nn.Sequential(
            torch.nn.Linear(250, 64),  # Wrong input size for TensorCores
            torch.nn.ReLU(),  # Fine, no info about this layer
            torch.nn.Linear(64, 10),  # Wrong output size for TensorCores

    def forward(self, inputs):
        convolved = torch.nn.AdaptiveAvgPool2d(1)(self.convolution(inputs)).flatten()
        return self.classifier(convolved)

# All you have to do
  • Seed globaly (including numpy and cuda), freeze weights, check inference time and model size:
# Inb4 MNIST, you can use any module with those functions
model = torch.nn.Linear(784, 10)
frozen = torchfunc.module.freeze(model, bias=False)

with torchfunc.Timer() as timer:
  frozen(torch.randn(32, 784)
  print(timer.checkpoint()) # Time since the beginning
  frozen(torch.randn(128, 784)
  print(timer.checkpoint()) # Since last checkpoint
print(f"Overall time {timer}; Model size: {torchfunc.sizeof(frozen)}")
  • Record and sum per-layer activation statistics as data passes through network:
# Still MNIST but any module can be put in it's place
model = torch.nn.Sequential(
    torch.nn.Linear(784, 100),
    torch.nn.Linear(100, 50),
    torch.nn.Linear(50, 10),
# Recorder which sums all inputs to layers
recorder = torchfunc.hooks.recorders.ForwardPre(reduction=lambda x, y: x+y)
# Record only for torch.nn.Linear
recorder.children(model, types=(torch.nn.Linear,))
# Train your network normally (or pass data through it)
# Activations of all neurons of first layer! 
print(recorder[1]) # You can also post-process this data easily with apply

For other examples (and how to use condition), see documentation



Latest release:

pip install --user torchfunc


pip install --user torchfunc-nightly


CPU standalone and various versions of GPU enabled images are available
at dockerhub.

For CPU quickstart, issue:

docker pull szymonmaszke/torchfunc:18.04

Nightly builds are also available, just prefix tag with nightly_. If you are going for GPU image make sure you have
nvidia/docker installed and it's runtime set.