Accelerated Text helps you automatically generate natural language descriptions

accelerated-text

Accelerated Text helps you automatically generate natural language descriptions of your data varying in wording and structure.

A picture is worth a thousand words. Or is it? Tables, charts, and pictures are all useful in understanding our data, but often we need a description: a story to tell us what we are looking at. Accelerated Text is a natural language generation tool that lets you define data descriptions and then generates multiple versions of those descriptions, varying in wording and structure.

About

Accelerated Text can work with all sorts of data:

  • descriptions of business metrics,
  • customer interaction data,
  • product attributes,
  • financial metrics.

With Accelerated Text you can use such data and generate text for your business reports, your e-commerce platform, or your customer support system.

Accelerated Text provides a web-based Document Plan builder, where

  • the logical structure of the document is defined,
  • communication goals are expressed,
  • data usage within a text is defined.

Document Plans and the connected data are used by Accelerated Text's Natural Language Generation engine
to produce multiple variations of the text, each expressing exactly what was intended to be communicated to the readers.

Philosophy

Whereof one cannot speak, thereof one must be silent

-- Wittgenstein

Natural language generation is a broad domain with applications in chat-bots, story generation, and data descriptions, to name a few.
Accelerated Text focuses on applying NLG technology to solve your data-to-text needs.

Data descriptions require precision.
For example, a text describing weather conditions cannot invent things beyond what it was provided with: Temperature: -1C, Humidity: 40%, Wind: 10km/h.
The generated text can only state those facts. The expression of an individual fact, such as the temperature, can vary.
It could come out as "it is cold", or "it is just below freezing", or "-1C", but the fact will be stated because it is in the data.
Nor is a data-to-text system the place to elaborate on the story by adding something about the serenity of the freezing lake: again, that was not in the supplied data.
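
The constraint can be illustrated with a small Python sketch. It is not how Accelerated Text is implemented: the function name `temperature_variants` and the thresholds for "cold" and "just below freezing" are assumptions made up for this illustration. The point is only that every wording is derived from the supplied value and nothing outside the record is mentioned.

```python
def temperature_variants(temp_c):
    """Return alternative wordings of one fact: the temperature in Celsius.

    Hypothetical illustration; the thresholds below are assumptions.
    """
    # The literal value is always a valid way to state the fact.
    variants = [f"{temp_c:g}C"]
    if temp_c < 0:
        variants.append("it is cold")
        if temp_c >= -2:
            variants.append("it is just below freezing")
    return variants


# The record from the weather example; only its values may be expressed.
record = {"Temperature": -1, "Humidity": 40, "Wind": 10}
print(temperature_variants(record["Temperature"]))
# prints ['-1C', 'it is cold', 'it is just below freezing']
```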

Accelerated Text follows this principle of strict adherence to data-bound text generation.
Via its user interface it provides instruments to define how the data should be translated into a descriptive text.
This description, a document plan, is executed by the natural language generation engine to produce texts that vary in structure and wording but are always and only about the data provided.

Key Features

  • Document plan editor to define what needs to be said about the data.
  • Data samples can be uploaded as CSV files to be used when building Document Plans.
  • Text structure variations to provide a richer reading experience that goes beyond rigid template-generated text.
  • Vocabulary control to match the language style of each of your reader groups.
  • Built-in rule engine to control what is said based on the different values of the data points.
  • Live preview to see variations of generated text.

Getting Started

Running

If you want to start tinkering and run it from the latest code in the repository, first make sure that you have make installed.

Then clone the project and run:

make run-dev-env

After running this command the front-end will be available at http://localhost:8080

The generation back-end API is at http://localhost:3001

For full deployment information, see the Deployment section.

Usage

Create Document Plan

Follow the step-by-step guide below to create a very simple document plan which
generates book authorship sentences.

  • First, a new document plan has to be created. The application starts with a Create Plan button in its workspace.
  • You get an initial empty plan.
  • Select a CSV file to provide data for the natural language generation. Here, select the books.csv file.
  • The central part of the plan is the Abstract Meaning Representation (AMR) element, which defines the message to be communicated. Select Author from the AMR section.
  • Next, select which field of the book store data supplies the Author.
  • Do the same for the Title field.
  • That's it. The plan is ready and should look like the picture to the right.
  • The Text Analysis section shows the text variations generated by the natural language generation engine.

GraphQL API use

In the previous section we created a simple document plan. Here we will use the Accelerated Text GraphQL API to fetch the text for the book items. To get the generated text, two pieces of information need to be passed to the NLG back end: the document plan identifier, and the identifier of the data item for which the text will be generated.

If Accelerated Text was started as described in the Running section, the GraphQL endpoint is accessible at http://localhost:3001/_graphql. CURL is used to illustrate the calls to the back end.
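
For readers who would rather call the back end from code than from the command line, here is a minimal Python sketch using only the standard library. The helper names `json_post_request` and `send` are hypothetical, the base URL assumes the local setup from the Running section, and the query is a shortened form of the document plan query used in the first CURL call of this section.

```python
import json
import urllib.request

# Assumed base URL: the back-end address given in the Running section.
BASE_URL = "http://localhost:3001"

# Shortened version of the document plan query used in this section.
QUERY = (
    "{documentPlans(offset:0 limit:10)"
    "{items{id name dataSampleId} totalCount}}"
)


def json_post_request(path, payload):
    """Build (but do not send) a JSON POST request for the back end."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


request = json_post_request("/_graphql", {"query": QUERY})
print(request.get_method(), request.full_url)
# prints POST http://localhost:3001/_graphql
```

Calling `send(request)` performs the actual HTTP call and needs a running back end; building the request is kept separate so it can be inspected without one.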

First, let's get the registered document plans:

curl -X POST -H "Content-Type: application/json" \
  --data '{ "query": "{documentPlans(offset:0 limit:10){items{id uid name dataSampleId dataSampleRow createdAt updatedAt updateCount} offset limit totalCount}}" }' \
  http://localhost:3001/_graphql

This will return a list of document plans:

  {:documentPlans
   {:limit 10,
    :offset 0,
    :totalCount 1,
    :items
    [{:updatedAt 1570951531,
      :uid "a7c31454-7f1b-4653-b14c-e2685793c110",
      :name "Book Store",
      :createdAt 1570951486,
      :dataSampleId "example-user/books.csv",
      :id "0ecdbada-dbbf-4b12-b1cb-cd6571181248",
      :dataSampleRow 0,
      :updateCount 8}]}}

The id field gives the document plan id, and the dataSampleId field specifies which data to use. Together they form the body of the generation request.
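
In client code, that body can be derived from one item of the document plan list. The sketch below assumes the item has already been parsed into a Python dict with the `id` and `dataSampleId` keys requested in the query; the function name `nlg_request_body` is hypothetical.

```python
import json


def nlg_request_body(plan_item, reader_flag_values=None):
    """Map one document plan item onto the body of the generation request."""
    return {
        "documentPlanId": plan_item["id"],
        "readerFlagValues": reader_flag_values or {},
        "dataId": plan_item["dataSampleId"],
    }


# Values taken from the document plan listed above.
plan_item = {
    "id": "0ecdbada-dbbf-4b12-b1cb-cd6571181248",
    "name": "Book Store",
    "dataSampleId": "example-user/books.csv",
}
print(json.dumps(nlg_request_body(plan_item)))
```

For the plan listed above, this yields the following body: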

{"documentPlanId":"0ecdbada-dbbf-4b12-b1cb-cd6571181248",
 "readerFlagValues":{},
 "dataId":"example-user/books.csv"}

With this, a second call is made to obtain a result identifier, which is then used to poll for the generated sentences. Polling is used because text is not generated right away: the NLG process for more complicated plans can take some time.

curl -XPOST -H "Content-Type: application/json" \
  http://localhost:3001/nlg -d '{"documentPlanId":"0ecdbada-dbbf-4b12-b1cb-cd6571181248","readerFlagValues":{},"dataId":"example-user/books.csv"}'

A result id is returned:

{"resultId" : "6f26099d-429d-41e9-9800-83ab58c59ddd"}

With this, a final request can be made to fetch the results. It can be repeated cheaply, since no text generation happens at this stage.

curl -XGET -H "Content-Type: application/json" http://localhost:3001/nlg/6f26099d-429d-41e9-9800-83ab58c59ddd
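
Since the text may not be ready on the first attempt, a client would typically repeat this request in a loop. The Python sketch below shows one way to do that. It assumes that the `ready` field of the response is false while generation is still running; the function name `wait_for_result`, the polling interval, and the attempt limit are all hypothetical. The HTTP call is passed in as a function so the loop can be tried without a server.

```python
import time


def wait_for_result(fetch, result_id, interval_s=0.5, max_attempts=20):
    """Call fetch(result_id) until the response reports ready, then return it.

    `fetch` is any callable returning the decoded JSON of GET /nlg/<result_id>.
    Assumption: "ready" is false while generation is still running.
    """
    for attempt in range(max_attempts):
        result = fetch(result_id)
        if result.get("ready"):
            return result
        if attempt < max_attempts - 1:
            time.sleep(interval_s)  # wait before asking again
    raise TimeoutError(
        f"result {result_id} not ready after {max_attempts} attempts"
    )


# Stand-in for the HTTP call so the sketch runs without a back end.
responses = iter([{"ready": False}, {"ready": True, "variants": []}])
result = wait_for_result(
    lambda _id: next(responses),
    "6f26099d-429d-41e9-9800-83ab58c59ddd",
    interval_s=0,
)
print(result["ready"])
# prints True
```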

You should get generated text with annotations (data is truncated):

{
   "offset":0,
   "totalCount":5,
   "ready":true,
   "variants":[
      {
         "type":"ANNOTATED_TEXT",
         "id":"ae9a1d60-4aa6-49da-9738-480243a5095b",
         "children":[
            {
               "type":"PARAGRAPH",
               "id":"ab8b650a-8774-4992-8a56-fe8d01f74097",
               "children":[
                  {
                     "type":"SENTENCE",
                     "id":"5cda3e9f-8fad-4b69-a0a3-f9f5e9a19465",
                     "children":[
                        {
                           "type":"WORD",
                           "id":"db5c71ec-f893-4406-8a3f-e91ca6aa08dc",
                           "text":"Building"
                        },
                        {
                           "type":"WORD",
                           "id":"84e164c7-1fd0-4f18-b120-8a4f7563741b",
                           "text":"Search"
                        },
                        {
                           "type":"WORD",
                           "id":"2e54b4b1-ed52-4689-8f5f-0a06ec8a35b5",
                           "text":"Applications"
                        }
                        ...
                        {
                           "type":"PUNCTUATION",
                           "id":"5710b009-4ad2-4b6d-b738-945cb576c1fc",
                           "text":"."
                        }
                     ]
                  }
               ]
            }
         ]
      },
     ...
}
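
The annotated structure is convenient for a UI, but many clients only need the plain sentence. The Python sketch below walks one variant and joins its tokens. It assumes, based on the response above, that leaf nodes carry a `text` field and that `PUNCTUATION` tokens attach to the preceding token without a space; the function names are hypothetical, and the sample contains only the tokens visible in the truncated response, with ids omitted.

```python
def leaf_tokens(node):
    """Yield (type, text) for every node that carries text, in document order."""
    if "text" in node:
        yield node["type"], node["text"]
    for child in node.get("children", []):
        yield from leaf_tokens(child)


def variant_to_text(variant):
    """Join tokens with spaces; punctuation attaches to the previous token."""
    text = ""
    for token_type, token in leaf_tokens(variant):
        if text and token_type != "PUNCTUATION":
            text += " "
        text += token
    return text


# Only the tokens visible in the truncated response above.
variant = {
    "type": "ANNOTATED_TEXT",
    "children": [{
        "type": "PARAGRAPH",
        "children": [{
            "type": "SENTENCE",
            "children": [
                {"type": "WORD", "text": "Building"},
                {"type": "WORD", "text": "Search"},
                {"type": "WORD", "text": "Applications"},
                {"type": "PUNCTUATION", "text": "."},
            ],
        }],
    }],
}
print(variant_to_text(variant))
# prints Building Search Applications.
```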

Clojure API use

The Accelerated Text UI helps with creating a document plan and testing it with sample data.
Accelerated Text's generation functionality can also be used directly from Clojure code.

Let's say you have book data limited to the author and the book title:

title                        author
Frankenstein                 M. W. Shelley
Dracula                      Bram Stoker
The Island of Doctor Moreau  H.G. Wells

When working via the UI, this data needs to be uploaded as a CSV file. To use it in code, we represent it as a vector of Clojure maps, one map per row.

(def data
  [{:title "Frankenstein"
    :author "M. W. Shelley"}
   {:title "Dracula"
    :author "Bram Stoker"}
   {:title "The Island of Doctor Moreau"
    :author "H.G. Wells"}])

The second component needed for generation is the plan itself. In the UI it is represented as visual blocks, and it is persisted in a structure like this:

(def document-plan
  {:type "Document-plan"
   :segments
   [{:type "Segment"
     :textType "description"
     :children
     [{:type "AMR"
       :conceptId "author"
       :dictionaryItem {:itemId "written"
                        :name "written"
                        :type "Dictionary-item"}
       :roles [{:name "Agent"
                :children [{:type "Thematic-role"
                            :children [{:type "Cell"
                                        :name "author"}]}]}
               {:name "co-Agent"
                :children [{:type "Thematic-role"
                            :children [{:type "Cell"
                                        :name "title"}]}]}]}]}]})

With those two in place we can generate the text:

(api.nlg.generator.planner-ng/render-dp document-plan data {})
=>
("Frankenstein is done by M. W. Shelley."
 "Dracula is done by Bram Stoker."
 "The Island of Doctor Moreau is written by H.G. Wells.")
