PyTorch Highway Networks

Highway networks implemented in PyTorch.

Highway Equation
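For reference, this is the standard fully connected highway layer. H is the transform (an affine map followed by a nonlinearity such as ReLU), T is the transform gate, and 1 − T is the carry gate. The original formulation uses a sigmoid for T; the notes below report trying softmax instead.

$$
\mathbf{y} = H(\mathbf{x}, \mathbf{W}_H) \odot T(\mathbf{x}, \mathbf{W}_T) + \mathbf{x} \odot \bigl(1 - T(\mathbf{x}, \mathbf{W}_T)\bigr),
\qquad T(\mathbf{x}, \mathbf{W}_T) = \sigma(\mathbf{W}_T \mathbf{x} + \mathbf{b}_T)
$$

When T = 0 the layer passes its input through unchanged (y = x); when T = 1 it behaves like an ordinary layer (y = H(x)). Because of the carry term, the input and output of a highway layer must have the same size.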

This is just the PyTorch MNIST example, hacked to work with Highway layers.
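The following is a minimal sketch of what a reusable, configurable highway module could look like in an MNIST classifier. It is not this repository's code. The names `Highway`, `HighwayMLP` and the `gate`/`gate_bias` parameters are hypothetical, and the classifier head (`log_softmax` output for use with `nll_loss`) is assumed to follow the usual PyTorch MNIST example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Highway(nn.Module):
    """Hypothetical stack of highway layers: y = H(x) * T(x) + x * (1 - T(x))."""

    def __init__(self, size, num_layers=1, activation=F.relu, gate="sigmoid", gate_bias=-1.0):
        super().__init__()
        if gate not in ("sigmoid", "softmax"):
            raise ValueError("gate must be 'sigmoid' or 'softmax'")
        self.activation = activation
        self.gate = gate
        self.transforms = nn.ModuleList([nn.Linear(size, size) for _ in range(num_layers)])
        self.gates = nn.ModuleList([nn.Linear(size, size) for _ in range(num_layers)])
        # A negative gate bias makes sigmoid gates start mostly closed (carry-dominated).
        # Softmax is shift-invariant, so a constant bias has no effect on softmax gates.
        for g in self.gates:
            nn.init.constant_(g.bias, gate_bias)

    def forward(self, x):
        for transform, gate in zip(self.transforms, self.gates):
            h = self.activation(transform(x))  # transform path H(x)
            if self.gate == "sigmoid":
                t = torch.sigmoid(gate(x))  # independent gate per unit
            else:
                t = F.softmax(gate(x), dim=-1)  # gates normalised across units
            x = h * t + x * (1 - t)  # mix transform and carry paths
        return x


class HighwayMLP(nn.Module):
    """Hypothetical MNIST classifier: project to `hidden`, apply highway layers, classify."""

    def __init__(self, hidden=64, num_layers=3, gate="sigmoid"):
        super().__init__()
        self.inp = nn.Linear(28 * 28, hidden)
        self.highway = Highway(hidden, num_layers, gate=gate)
        self.out = nn.Linear(hidden, 10)

    def forward(self, x):
        x = F.relu(self.inp(x.view(x.size(0), -1)))  # flatten 1x28x28 images
        x = self.highway(x)
        return F.log_softmax(self.out(x), dim=1)  # pair with F.nll_loss


torch.manual_seed(0)
model = HighwayMLP(gate="softmax")
out = model(torch.randn(8, 1, 28, 28))
print(out.shape)  # torch.Size([8, 10])
```

Swapping `activation=F.elu` or `gate="sigmoid"` is a one-argument change, which keeps the comparisons in the notes easy to rerun.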

TODO

  • Make the Highway nn.Module reusable and configurable.
  • Figure out why softmax works better than sigmoid as the gate function. This shouldn’t be the case…
  • Make training graphs on the MNIST dataset.
  • Add convolutional highway networks.
  • Add recurrent highway networks.
  • Experiment with convolutional highway networks for character embeddings.
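For the convolutional item in the list above, one possible starting point is to replace the linear maps with same-padded convolutions so that the output keeps the input's shape and the carry term still lines up. This is a hypothetical sketch: `ConvHighway` and its parameters are illustrative names, and it assumes an odd kernel size and a sigmoid gate.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvHighway(nn.Module):
    """Hypothetical convolutional highway layer; preserves (N, C, H, W) so x can be carried."""

    def __init__(self, channels, kernel_size=3, gate_bias=-1.0):
        super().__init__()
        padding = kernel_size // 2  # "same" padding for odd kernel sizes
        self.transform = nn.Conv2d(channels, channels, kernel_size, padding=padding)
        self.gate = nn.Conv2d(channels, channels, kernel_size, padding=padding)
        nn.init.constant_(self.gate.bias, gate_bias)  # start carry-dominated

    def forward(self, x):
        h = F.relu(self.transform(x))  # transform path H(x)
        t = torch.sigmoid(self.gate(x))  # per-pixel, per-channel gate T(x)
        return h * t + x * (1 - t)


x = torch.randn(2, 16, 28, 28)
print(ConvHighway(16)(x).shape)  # torch.Size([2, 16, 28, 28])
```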

Notes

  • ELU doesn’t work better than ReLU for the layer activation.
  • Softmax seems to work better than sigmoid for the gate function?!
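One possible explanation for the softmax result, offered as an untested hypothesis: softmax normalizes the gate across the units of a layer, so with d units the gate values always sum to 1 and average exactly 1/d. A wide layer with a softmax gate therefore starts out carrying almost all of its input, close to an identity map, which resembles the negative gate-bias initialization commonly used with sigmoid gates. A sigmoid gate with zero pre-activation instead sits at 0.5 for every unit. The pure-Python sketch below compares the two; the function names are illustrative.

```python
import math


def sigmoid_gate(z):
    # Element-wise: each unit gets its own independent value in (0, 1)
    return [1.0 / (1.0 + math.exp(-v)) for v in z]


def softmax_gate(z):
    # Normalized across units: values are in (0, 1) and sum to 1
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


z = [0.0] * 64  # gate pre-activations for a 64-unit layer, all zero
print(sum(sigmoid_gate(z)) / len(z))  # 0.5
print(sum(softmax_gate(z)) / len(z))  # 0.015625
```

The trade-off is that softmax gates compete: units cannot all open at once, whereas sigmoid gates can.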