Multilingual text (NLP) processing toolkit
Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE)
An easy-to-use Python module that helps you to extract the BERT embeddings for a large text dataset (Bengali/English) efficiently
An NLP-based solution that removes email signatures from the rest of the text
Python scripts for extracting linguistic features from Filipino texts
Code for classifying international patents based on the text of their titles/abstracts
Imposing Relation Structure in Language-Model Embeddings Using Contrastive Learning
Beyond Paragraphs: NLP for Long Sequences
An NLP program: tokenization and PoS tagging with deep learning
Grading tools for Advanced NLP (11-711)
Data loaders and abstractions for text and NLP
PyTorch implementations of various Deep NLP models in cs-224n
An IPython Notebook tutorial on deep learning for natural language processing, including structure prediction
Wrapper library for text generation / language models at char and word level with RNN in TensorFlow
Tokenization is a necessary first step in many natural language processing tasks
EMNLP'2021: Simple Entity-centric Questions Challenge Dense Retrievers
fastText is a library for efficient learning of word representations and sentence classification.
A Multilingual Latent Dirichlet Allocation (LDA) Pipeline with Stop Words Removal, n-gram features, and Inverse Stemming, in Python
A2T: Towards Improving Adversarial Training of NLP Models (EMNLP 2021 Findings)
This repo is the implementation of the EMNLP'21 paper "SPARQLing Database Queries from Intermediate Question Decompositions" by Irina Saparina, Anton Osok
PySwip is a Python - SWI-Prolog bridge that lets you query SWI-Prolog from your Python programs.
GreynirCorrect: Spelling and grammar correction for Icelandic
Finding the right information in the digital era may be the real challenge nowadays, because the volume of information is growing exponentially day by day!
IndoBERTweet is the first large-scale pretrained model for Indonesian Twitter