Textpipe: clean and extract metadata from text

Mar 30, 2019

textpipe

textpipe is a Python package for converting raw text into clean, readable text and extracting metadata from that text. Its functionality includes removing HTML tags from raw text and extracting metadata such as the number of words and the named entities in the text.

Vision: the zen of textpipe

Designed for use in production pipelines without adult supervision.
Rechargeable batteries included: provide sane defaults and clear examples to adapt.
A uniform interface with thin wrappers around state-of-the-art NLP packages.
As language-agnostic as possible.
Bring your own models.

Features

Clean raw text by removing HTML and other unreadable constructs
Identify the language of text
Extract the number of words, the number of sentences, and the named entities from a text
Calculate the complexity of a text
Obtain text metadata by specifying a pipeline containing all desired elements
Obtain sentiment (polarity and a subjectivity score)
Generate word counts
Compute minhash for cheap similarity estimation of documents
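
The minhash feature in the list above can be illustrated with a small standard-library sketch. This is not textpipe's implementation: the function names (`shingles`, `minhash_signature`, `estimate_jaccard`), the 3-word shingles, and the 64 seeded SHA-1 hashes are all assumptions chosen for illustration. The idea is that the fraction of positions on which two signatures agree approximates the Jaccard similarity of the underlying shingle sets.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of overlapping k-word shingles in a text."""
    words = text.lower().split()
    if len(words) <= k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(item, seed):
    """Deterministic 64-bit hash of an item, varied by seed."""
    digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items, num_hashes=64):
    """Keep the minimum hash value of the set under each seeded hash."""
    return [min(_hash(item, seed) for item in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which the two sets agree."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


doc_a = "the quick brown fox jumps over the lazy dog near the river bank"
doc_b = "the quick brown fox jumps over the lazy cat near the river bank"
doc_c = "completely unrelated words appear in this other short sentence here"

sig_a, sig_b, sig_c = (minhash_signature(shingles(d)) for d in (doc_a, doc_b, doc_c))
print(estimate_jaccard(sig_a, sig_b))  # near-duplicates share most shingles
print(estimate_jaccard(sig_a, sig_c))  # these two texts share no shingles
```

Signatures have a fixed length regardless of document size, which is what makes the comparison cheap. More hash functions give a tighter estimate at the cost of more work per document.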

Usage example

>>> from textpipe import doc, pipeline
>>> sample_text = 'Sample text! <!DOCTYPE>'
>>> document = doc.Doc(sample_text)
>>> print(document.clean)
Sample text!
>>> print(document.language)
en
>>> print(document.nwords)
2

>>> pipe = pipeline.Pipeline(['CleanText', 'NWords'])
>>> print(pipe(sample_text))
{'CleanText': 'Sample text!', 'NWords': 2}
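
To make the two operations in this example concrete, here is a rough standard-library approximation of what a cleaning step and a word-count step compute. The functions `clean_text` and `count_words` are hypothetical stand-ins, not textpipe code: textpipe wraps dedicated NLP packages, whereas this sketch uses a crude regular expression and whitespace splitting, so results will differ on real-world input.

```python
import re


def clean_text(raw):
    """Drop anything that looks like a tag, then normalize whitespace."""
    without_tags = re.sub(r"<[^>]*>", " ", raw)
    return " ".join(without_tags.split())


def count_words(text):
    """Count whitespace-separated tokens (punctuation stays attached)."""
    return len(text.split())


sample_text = 'Sample text! <!DOCTYPE>'
cleaned = clean_text(sample_text)
result = {'CleanText': cleaned, 'NWords': count_words(cleaned)}
print(result)
```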

To extend the existing textpipe operations with your own custom operations:

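from textpipe import pipeline

test_pipe = pipeline.Pipeline(['CleanText', 'NWords'])

def custom_op(doc, context=None, settings=None, **kwargs):
    # A custom operation takes the document plus optional context and settings.
    return 1

# Settings supplied alongside the custom step.
custom_argument = {'argument': 1}
test_pipe.register_operation('CUSTOM_STEP', custom_op)
test_pipe.steps.append(('CUSTOM_STEP', custom_argument))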
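
The registration pattern above can be pictured with a small stand-in class. `MiniPipeline` below is hypothetical and is not textpipe's `Pipeline`. It only assumes, based on the snippet above, that operations are looked up by name, that a step may be a `(name, settings)` tuple, and that each operation is called with the document plus `context` and `settings` keyword arguments. How textpipe actually invokes operations may differ.

```python
class MiniPipeline:
    """Hypothetical stand-in illustrating name-based operation dispatch."""

    def __init__(self, steps=None, context=None):
        self.steps = list(steps or [])
        self.context = context
        self._operations = {}

    def register_operation(self, name, func):
        """Make func available to steps under the given name."""
        self._operations[name] = func

    def __call__(self, doc):
        results = {}
        for step in self.steps:
            # A step is either a bare name or a (name, settings) tuple.
            name, settings = step if isinstance(step, tuple) else (step, {})
            operation = self._operations[name]
            results[name] = operation(doc, context=self.context, settings=settings)
        return results


def custom_op(doc, context=None, settings=None, **kwargs):
    """Toy operation: word count multiplied by settings['argument']."""
    return len(doc.split()) * settings['argument']


pipe = MiniPipeline()
pipe.register_operation('CUSTOM_STEP', custom_op)
pipe.steps.append(('CUSTOM_STEP', {'argument': 1}))
print(pipe('Sample text!'))
```

In this sketch the document is a plain string; the settings dictionary attached to the step is how per-step arguments reach the operation.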
