This is a PyTorch implementation of “Training Deep Neural Networks via Direct Loss Minimization”, published at ICML 2016. The implementation targets the 0-1 loss.

The repository consists of three script files:

  1. a demonstration script that trains on MNIST with the 0-1 loss
  2. a class defining the architecture of the model used
  3. the function used to estimate the gradient

You can run the demonstration by copying the command at the top of the script and modifying it as needed (e.g. the location to save checkpoints). Here are the results I got when training on MNIST for 100 epochs.


Figure 1. Training MNIST with 0-1 loss for 100 epochs.


Figure 2. Testing results evaluated at each epoch: (top) cross-entropy loss, and (bottom) prediction accuracy.

If you want to estimate the gradient of the 0-1 loss and integrate it into your own code, import the `grad_estimation` function.
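For intuition, here is a minimal sketch of how direct loss minimization estimates the gradient of the 0-1 loss for a single classification example. It compares the ordinary argmax prediction with a loss-augmented argmax (scores perturbed by `eps` times the 0-1 loss) and returns their difference scaled by `1/eps`. The function name `grad_estimate_01` and its signature are hypothetical, not the repository's actual API; this is the "positive update" variant of the estimator, assuming raw class scores as input.

```python
def grad_estimate_01(scores, y_true, eps=1.0):
    """Hypothetical sketch: direct-loss-minimization gradient of the
    0-1 loss with respect to per-class scores (positive update)."""
    n = len(scores)
    # ordinary prediction: plain argmax over class scores
    y_pred = max(range(n), key=lambda c: scores[c])
    # loss-augmented inference: add eps * 0-1 loss to each class score
    aug = [scores[c] + eps * (c != y_true) for c in range(n)]
    y_direct = max(range(n), key=lambda c: aug[c])
    # finite-difference-style estimate: difference of one-hot
    # indicators for the two argmaxes, scaled by 1/eps
    g = [0.0] * n
    g[y_direct] += 1.0 / eps
    g[y_pred] -= 1.0 / eps
    return g
```

When the loss-augmented argmax flips to a wrong class, a gradient-descent step on this estimate lowers that wrong class's score and raises the predicted (here, true) class's score; when the two argmaxes agree, the estimated gradient is zero.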
