PyTorch-NLP

PyTorch-NLP, or torchnlp for short, is a library of neural network layers, text processing modules and datasets designed to accelerate Natural Language Processing (NLP) research.

Join our community, add datasets and neural network layers! Chat with us on Gitter and join the Google Group; we're eager to collaborate with you.

Installation

Make sure you have Python 3.5+ and PyTorch 0.4 or newer. You can then install pytorch-nlp using pip:

pip install pytorch-nlp

Or install the latest code from GitHub:

pip install git+https://github.com/PetrochukM/PyTorch-NLP.git

Docs

The complete documentation for PyTorch-NLP is available via our ReadTheDocs website.

Basics

Add PyTorch-NLP to your project by following one of the common use cases:

Load a Dataset

Load the IMDB dataset, for example:

from torchnlp.datasets import imdb_dataset

# Load the imdb training dataset
train = imdb_dataset(train=True)
train[0]  # RETURNS: {'text': 'For a movie that gets..', 'sentiment': 'pos'}
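
Each row is a plain dictionary with `text` and `sentiment` keys, so ordinary Python tools work on the dataset. A minimal sketch, using a small hand-written list as a stand-in for the downloaded data (the rows below are invented for illustration and are not taken from IMDB):

```python
from collections import Counter

# Stand-in rows shaped like the example above. In practice the rows would
# come from `imdb_dataset(train=True)`; these texts are invented.
rows = [
    {'text': 'A warm, funny film.', 'sentiment': 'pos'},
    {'text': 'Two hours I will never get back.', 'sentiment': 'neg'},
    {'text': 'Great cast and a sharp script.', 'sentiment': 'pos'},
]

# Count how many examples carry each label.
label_counts = Counter(row['sentiment'] for row in rows)
print(label_counts)
```

A quick count like this is a cheap way to check the class balance before training.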

Apply Neural Network Layers

For example, from the neural network package, apply a Simple Recurrent Unit (SRU):

from torchnlp.nn import SRU
import torch

# Random input of shape (sequence length, batch size, input size)
input_ = torch.randn(6, 3, 10)
sru = SRU(10, 20)

# Apply a Simple Recurrent Unit to `input_`
sru(input_)
# RETURNS: (
#   output [torch.FloatTensor (6x3x20)],
#   hidden_state [torch.FloatTensor (2x3x20)]
# )

Encode Text

Tokenize and encode text as a tensor. For example, a WhitespaceEncoder breaks text into terms whenever it encounters a whitespace character.

from torchnlp.text_encoders import WhitespaceEncoder

# Create a `WhitespaceEncoder` with a corpus of text
encoder = WhitespaceEncoder(["now this ain't funny", "so don't you dare laugh"])

# Encode and decode phrases
encoder.encode("this ain't funny.") # RETURNS: torch.LongTensor([6, 7, 1])
encoder.decode(encoder.encode("This ain't funny.")) # RETURNS: "this ain't funny."
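
To make the idea concrete, here is a toy whitespace encoder in plain Python. It is only a sketch of the general technique, not torchnlp's implementation: the class name `ToyWhitespaceEncoder` and its two reserved tokens are assumptions made for this example, so the indices it produces differ from the library output shown above.

```python
# Illustrative only: a toy whitespace encoder, not torchnlp's implementation.
# The class name and the reserved tokens are assumptions for this sketch.
class ToyWhitespaceEncoder:
    def __init__(self, corpus, reserved=('<pad>', '<unk>')):
        # Reserved tokens take the lowest indices.
        self.stoi = {token: index for index, token in enumerate(reserved)}
        for sentence in corpus:
            for token in sentence.split():
                # Assign the next free index to each unseen token.
                self.stoi.setdefault(token, len(self.stoi))
        self.itos = list(self.stoi)
        self.unk_index = self.stoi['<unk>']

    def encode(self, text):
        # Tokens missing from the vocabulary map to the unknown index.
        return [self.stoi.get(token, self.unk_index) for token in text.split()]

    def decode(self, indices):
        return ' '.join(self.itos[index] for index in indices)


toy = ToyWhitespaceEncoder(["now this ain't funny", "so don't you dare laugh"])
ids = toy.encode("this ain't funny.")
print(ids)
print(toy.decode(ids))
```

Because splitting happens only on whitespace, `funny.` (with the trailing period) is a different token from `funny` and falls back to the unknown index in this sketch.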

Load Word Vectors

For example, load FastText, state-of-the-art English word vectors:

from torchnlp.word_to_vector import FastText

vectors = FastText()
# Load vectors for any word as a `torch.FloatTensor`
vectors['hello']  # RETURNS: [torch.FloatTensor of size 300]
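
A common next step with word vectors is to compare two of them by cosine similarity. The sketch below uses short, made-up lists of floats as stand-ins for real embeddings so that it runs without downloading anything; the helper `cosine_similarity` is written for this example and is not part of torchnlp.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional vectors standing in for real embeddings.
hello = [1.0, 2.0, 0.0]
hi = [2.0, 4.0, 0.0]      # same direction as `hello`
other = [0.0, 0.0, 3.0]   # orthogonal to `hello`

print(cosine_similarity(hello, hi))
print(cosine_similarity(hello, other))
```

Vectors pointing in the same direction score close to 1 and orthogonal vectors score 0, regardless of their lengths.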

Compute Metrics

Finally, compute common metrics such as the BLEU score.

from torchnlp.metrics import get_moses_multi_bleu

hypotheses = ["The brown fox jumps over the dog 笑"]
references = ["The quick brown fox jumps over the lazy dog 笑"]

# Compute the BLEU score with the official BLEU Perl script
get_moses_multi_bleu(hypotheses, references, lowercase=True)  # RETURNS: 47.9
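
For intuition about what BLEU measures, the sketch below computes clipped (modified) n-gram precision, one ingredient of the score, in plain Python. It is a simplification under stated assumptions: whitespace tokenization, a single reference, no brevity penalty and no averaging across n-gram orders, so its numbers are not comparable to the script's output. The helpers `ngrams` and `modified_precision` are written for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams in a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(hypothesis, reference, n):
    """Fraction of hypothesis n-grams found in the reference, with clipping."""
    hyp_counts = Counter(ngrams(hypothesis.lower().split(), n))
    ref_counts = Counter(ngrams(reference.lower().split(), n))
    # Clip each hypothesis n-gram count by its count in the reference.
    overlap = sum(min(count, ref_counts[gram])
                  for gram, count in hyp_counts.items())
    total = sum(hyp_counts.values())
    return overlap / total if total else 0.0


hypothesis = "The brown fox jumps over the dog 笑"
reference = "The quick brown fox jumps over the lazy dog 笑"

unigram_precision = modified_precision(hypothesis, reference, 1)
bigram_precision = modified_precision(hypothesis, reference, 2)
print(unigram_precision, bigram_precision)
```

Every word of the hypothesis appears in the reference, so the unigram precision is perfect, while the missing words "quick" and "lazy" break two of the seven bigrams.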
