Bonnetal

Bonnet and then some! Deep Learning Framework for various Image Recognition Tasks. Photogrammetry and Robotics Lab, University of Bonn.

By Andres Milioto et al. @ University of Bonn.

In early 2018 we released Bonnet, a real-time, robotics-oriented semantic segmentation framework using Convolutional Neural Networks (CNNs).
Bonnet provides an easy pipeline to add architectures and datasets for semantic segmentation, in order to train and deploy CNNs on a robot. It contains a full training pipeline in Python using TensorFlow and OpenCV, as well as C++ apps to deploy a CNN in ROS and standalone. The C++ library is designed so that other backends (such as TensorRT) can be added.

Back then, most of my research was in the field of semantic segmentation, so that is what the framework was tailored specifically to do. Since then, we have found a way to make things even more awesome, allowing for a suite of other tasks, like classification, detection, instance and semantic segmentation, feature extraction, counting, etc. Hence the name of this new framework, "Bonnetal": it is nothing but the old Bonnet, and then some. Hopefully, the explicit et al. will also spawn further collaboration and many pull requests! :smile:

We've also switched to PyTorch to allow for easier mixing of backbones, decoders, and heads for different tasks. If you are still comfortable with just semantic segmentation, and/or you're a fan of TensorFlow, you can still find the original Bonnet here. Otherwise, keep on reading, and I'll try to explain why Bonnetal rules!

DISCLAIMER: I am currently porting all the functionality out of a previously closed-source framework, so be patient if a task or its weights are still a placeholder, and send me an email to ask about the schedule for the particular part you need.

This code provides a framework to mix and match popular, ImageNet-trained backbones with different decoders to achieve different CNN-enabled tasks. All backbones have pre-trained ImageNet weights which, when available, are downloaded by default.
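To make the mix-and-match idea concrete, here is a minimal, hypothetical PyTorch sketch of the pattern: an ImageNet-pretrained backbone (torchvision's ResNet-18 here), a small decoder, and a task-specific head stacked into one module. The class names and layer choices are illustrative only and are not Bonnetal's actual interfaces.

```python
import torch
import torch.nn as nn
import torchvision

class SegmentationNet(nn.Module):
    """Hypothetical backbone + decoder + head composition (not Bonnetal's API)."""
    def __init__(self, num_classes):
        super().__init__()
        resnet = torchvision.models.resnet18(pretrained=True)  # ImageNet weights downloaded on first use
        # Backbone: everything up to (but not including) the classifier -> 512 x H/32 x W/32 features
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])
        # Decoder: bring the coarse features back to input resolution
        self.decoder = nn.Sequential(
            nn.Conv2d(512, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=32, mode="bilinear", align_corners=False),
        )
        # Head: per-pixel class scores (swap this block for a different task)
        self.head = nn.Conv2d(64, num_classes, kernel_size=1)

    def forward(self, x):
        return self.head(self.decoder(self.backbone(x)))

if __name__ == "__main__":
    net = SegmentationNet(num_classes=20)
    logits = net(torch.randn(1, 3, 224, 224))
    print(logits.shape)  # torch.Size([1, 20, 224, 224])
```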

The main reason for the "lack" of variety of backbones so far is that ImageNet pre-training takes a while and is pretty resource-intensive. If you want a new backbone implemented, we can talk about it, and you can share your resources to pretrain it :smiley: (PRs welcome :wink:)

The code is (like the original Bonnet) separated into a training part developed in Python using PyTorch, and a deployment/inference part written fully in C++, which contains the code to run on the robot, either with ROS or standalone.
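As a rough illustration of how the Python training side can hand a model to a C++ deployment side, here is a hedged sketch using TorchScript tracing (loadable from C++ via libtorch) and ONNX export (usable by backends such as TensorRT). It reuses the hypothetical SegmentationNet from the sketch above; Bonnetal's actual export scripts and file layout may differ.

```python
import torch

# Assume SegmentationNet is the hypothetical model sketched earlier.
net = SegmentationNet(num_classes=20).eval()
example = torch.randn(1, 3, 224, 224)

# TorchScript: a traced module that C++ code can load with torch::jit::load.
traced = torch.jit.trace(net, example)
traced.save("model_traced.pt")

# ONNX: an exchange format consumable by backends such as TensorRT.
torch.onnx.export(net, example, "model.onnx", opset_version=11)
```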
