
Tiny experimental NLP deep learning library for text classification and NER


Aida

Build simple conversational assistants, chatbots, and more.
Simple deep learning language models that can train and predict from the browser, Node.js, and Python.

Getting started

Aida helps you prototype chatbots, fast.

This is an experimental library for building natural language processing models. It can help you build simple chatbots and assistants. It's like building a smart regular expression for detecting the intention of a sentence and extracting its key entities, but it's much better because it uses neural networks and pre-trained character bigram embeddings, so it has some general language knowledge.

  • It's easy to use: Getting started is simple. With Chatito you can create a large dataset in minutes and start training from the browser without any setup.
  • Low memory consumption: Small file size and low memory consumption are essential for predicting from the browser and on mobile devices. Instead of a full word dictionary, the model uses pre-trained character bigram embeddings provided by fastText. This gives up some information and performance, so additional training examples are required to get good predictive performance. That is why a generative scripting language (the Chatito DSL) is provided.
  • Accurate: Although the model throws away some information by composing words from bigrams instead of using full words, it can reach good prediction rates given more data to learn from.
  • Universal: The trained models should be able to run in multiple environments, so there are two mirror implementations: one in TensorFlow.js, to train and run from the browser or Node.js, and one in Keras with the TensorFlow backend for Python.
  • Offline support: It should be able to train and make predictions without connectivity. No server-side API is needed, although the trained models can also run server-side behind an API if desired. (TODO: add an example running as an AWS Lambda function)
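
To make the bigram idea concrete, here is a minimal sketch of how a word can be broken into character bigrams and mapped to a vector. The function names, the toy 2-dimensional table, and the choice to average the bigram vectors are all assumptions made for illustration; this is not Aida's actual implementation, which may combine bigrams differently.

```python
from typing import Dict, List


def char_bigrams(word: str) -> List[str]:
    """Split a word into overlapping character bigrams.

    A single-character word is returned as one token; an empty string gives [].
    """
    word = word.lower()
    if len(word) < 2:
        return [word] if word else []
    return [word[i:i + 2] for i in range(len(word) - 1)]


def embed_word(word: str, table: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of the bigrams found in `table` (illustrative choice).

    Bigrams missing from the table are skipped; if none are found,
    a zero vector of length `dim` is returned.
    """
    vectors = [table[b] for b in char_bigrams(word) if b in table]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Hypothetical toy table; real pre-trained embeddings have many more dimensions.
TOY_TABLE = {
    "he": [1.0, 0.0],
    "el": [0.0, 1.0],
    "ll": [1.0, 1.0],
    "lo": [0.0, 0.0],
}

print(char_bigrams("hello"))                  # ['he', 'el', 'll', 'lo']
print(embed_word("hello", TOY_TABLE, dim=2))  # [0.5, 0.5]
```

The table only needs one entry per bigram rather than one per word, which is where the small file size comes from, at the cost of the lost word-level information described above.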

Check the demo

It's a chatbot running in the browser with TensorFlow.js, using the Web Speech API for speech-to-text and text-to-speech.

Train online

You can train from the browser using JavaScript and TensorFlow.js (using your local GPU resources), or from the browser using Python and TensorFlow with Keras thanks to Google Colaboratory's free TPUs. There is no need to set up a local environment, and the trained models can be saved for later use.

Local NPM package setup

  • Install the npm package:
yarn add aida-nlp
  • Create your Chatito definition files. Here you define your intents and the possible sentence models in multiple .chatito files, and save them to a directory, e.g. `./chatito`.

  • Create a config file like aida_config.json where you define the path to your chatito definition files, the chatito dataset output path and the output path for the trained NLP models:

{
  "chatito": {
    "inputPath": "./chatito",
    "outputPath": "./dataset"
  },
  "aida": {
    "outputPath": "./model",
    "language": "en"
  }
}
  • Generate and encode the dataset for training: npx aida-nlp aida_config.json --action dataset. The dataset will be available at the configured output path.

  • Start training: npx aida-nlp aida_config.json --action train. The models will be saved at the configured output path.

  • Run npx aida-nlp aida_config.json --action test to try the models against the generated testing dataset.
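
The steps above can be scripted. The following is a small sketch that checks a config for the keys shown in the example and builds the three commands in order. The helper names are hypothetical, and treating all four keys as required is an assumption based only on the example config, not on documented behaviour of the package.

```python
import json
from typing import Dict, List

# Keys taken from the example aida_config.json; assumed (not confirmed) to be required.
REQUIRED_KEYS = {
    "chatito": ["inputPath", "outputPath"],
    "aida": ["outputPath", "language"],
}


def missing_keys(config: Dict) -> List[str]:
    """Return dotted names of expected keys that are absent from the config."""
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        for key in keys:
            if key not in config.get(section, {}):
                missing.append(f"{section}.{key}")
    return missing


def build_commands(config_path: str) -> List[List[str]]:
    """Build the dataset, train and test commands from the steps above, in order."""
    return [
        ["npx", "aida-nlp", config_path, "--action", action]
        for action in ("dataset", "train", "test")
    ]


EXAMPLE_CONFIG = json.loads("""
{
  "chatito": {"inputPath": "./chatito", "outputPath": "./dataset"},
  "aida": {"outputPath": "./model", "language": "en"}
}
""")

print(missing_keys(EXAMPLE_CONFIG))  # []
for command in build_commands("aida_config.json"):
    print(" ".join(command))
```

In a project where the package is installed, each command list could be passed to `subprocess.run(command, check=True)` so that a failing step stops the pipeline.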

Local setup cloning the project

As an alternative to training online or using the npm package, you can set up the project locally. Clone the GitHub project and install the dependencies for Node.js and Python (assuming Node.js with yarn and Python 3 are installed):

  • Run yarn install from the ./typescript directory
  • Run pip3 install -r requirements.txt from the ./python directory

Create a dataset

Edit or create the chatito files inside ./typescript/examples/en/intents to customize the dataset generation as you need. You can read more about Chatito.

Then, from the ./typescript directory, run npm run dataset:en:process. This generates several files in the ./typescript/public/models directory: the dataset, the dataset parameters, the testing parameters, and the embeddings dictionary. (Note: Aida also supports Spanish. If you need another language, you can add it after downloading the fastText embeddings for that language.)
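
For readers adding another language, it may help to see what a fastText text-format vector file looks like: a header line with the number of tokens and the vector dimension, followed by one token and its values per line. The sketch below reads such data and keeps only tokens of at most two characters. The function name and the length filter are assumptions for illustration; how Aida actually builds its embeddings dictionary is not described here.

```python
import io
from typing import Dict, List, TextIO


def load_short_tokens(vec_file: TextIO, max_len: int = 2) -> Dict[str, List[float]]:
    """Read fastText text-format vectors, keeping tokens of at most `max_len` characters.

    The first line is a header: "<token count> <dimension>".
    Lines whose value count does not match the dimension are skipped.
    """
    header = vec_file.readline().split()
    dim = int(header[1])
    table: Dict[str, List[float]] = {}
    for line in vec_file:
        parts = line.rstrip().split(" ")
        token = parts[0]
        values = [p for p in parts[1:] if p]
        if len(token) <= max_len and len(values) == dim:
            table[token] = [float(v) for v in values]
    return table


# Tiny in-memory sample standing in for a downloaded .vec file.
SAMPLE = io.StringIO("3 2\nhe 0.1 0.2\nhello 0.3 0.4\nlo 0.5 0.6\n")
TABLE = load_short_tokens(SAMPLE)
print(sorted(TABLE))  # ['he', 'lo']
```

With a real file, the same function can be called on `open(path, encoding="utf-8")`, where `path` points at the downloaded vectors.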

Training

You can train from three local environments:

  - For Python: open `./python/main.ipynb` with Jupyter Notebook or JupyterLab. Python will load the custom settings generated in the dataset step above and save the models in a TensorFlow.js compatible format at the output directory.

  - For web browsers: from `./typescript` run `npm run web:start`. Then navigate to `http://localhost:8000/train` for the training web UI. After training, download the model to the `./typescript/public/pretrained/web` directory (NOTE: this will also generate and download a new dataset).

  - For Node.js: from `./typescript` run `npm run node:start`. This will load the previously generated dataset files from `./typescript/public/models`.
