Medical natural language parsing and utility library

Python versions: 3.7, 3.8, 3.9

A natural language medical domain parsing library. This library:

  • Provides an interface to the UTS (UMLS Terminology Services) RESTful
    service with data caching (NIH login needed).
  • Wraps the MedCAT library by parsing medical and clinical text into
    first-class Python objects reflecting the structure of the natural language,
    complete with UMLS entity linking with CUIs and other domain specific
    features.
  • Combines non-medical (such as POS and NER tags) and medical features (such as
    CUIs) in one API and resulting data structure and/or as a Pandas data
    frame.
  • Provides cui2vec as a word embedding model for either fast indexing and
    access or to use directly as features in a Zensols Deep NLP embedding layer.
  • Provides access to cTAKES as a dictionary-like Stash abstraction.
  • Includes a command line program to access all of these features without
    having to write any code.
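
The cached UTS access and the dictionary-like access to cTAKES both rely on the same idea: look a value up by key, and only do the expensive work on a miss. The following is a minimal sketch of that pattern in plain Python. The class `CachingStash` and the function `fake_lookup` are hypothetical names made up for illustration; they are not this library's classes or API.

```python
from typing import Any, Callable, Dict, List


class CachingStash:
    """Hypothetical dictionary-like cache (not a class from this library)."""

    def __init__(self, loader: Callable[[str], Any]):
        self._loader = loader              # invoked only on a cache miss
        self._cache: Dict[str, Any] = {}

    def __getitem__(self, key: str) -> Any:
        # Compute and store the value the first time a key is requested.
        if key not in self._cache:
            self._cache[key] = self._loader(key)
        return self._cache[key]

    def __contains__(self, key: str) -> bool:
        return key in self._cache

    def __len__(self) -> int:
        return len(self._cache)


calls: List[str] = []


def fake_lookup(term: str) -> dict:
    """Stand-in for a slow remote request or an external parser run."""
    calls.append(term)
    return {'term': term, 'length': len(term)}


stash = CachingStash(fake_lookup)
first = stash['kidney failure']    # miss: fake_lookup is called
second = stash['kidney failure']   # hit: served from the cache
```

In this sketch the second access returns the stored object without calling the loader again, which is the behavior a cache in front of a rate-limited or login-protected service needs.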


See the full documentation.
The API reference is also available.


The easiest way to install the command line program is via the pip installer:

pip3 install zensols.mednlp

Binaries are also available on PyPI.

If the cui2vec functionality is used, the Zensols Deep NLP library
is also needed, which is installed with pip install zensols.deepnlp.
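
Because the Deep NLP library is only needed for cui2vec, a script may want to check for it before using that feature. The sketch below uses only the standard library; the helper name `is_installed` is an assumption for illustration and is not part of either package.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if the named module can be located for import."""
    try:
        return find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised for dotted names when the parent package is missing.
        return False


# Module names taken from the pip package names above.
has_mednlp = is_installed('zensols.mednlp')
has_deepnlp = is_installed('zensols.deepnlp')
```

A caller could then skip or disable the cui2vec code path when `has_deepnlp` is false instead of failing on import.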


This API utilizes the following frameworks:

  • MedCAT: used to extract information from Electronic Health Records (EHRs)
    and link it to biomedical ontologies like SNOMED-CT and UMLS.
  • cTAKES: a natural language processing system for extraction of information
    from electronic medical record clinical free-text.
  • cui2vec: a new set of (like word) embeddings for medical concepts learned
    using an extremely large collection of multimodal medical data.
  • Zensols Deep NLP library: a deep learning utility library for natural
    language processing that aids in feature engineering and embedding layers.
  • ctakes-parser: parses cTAKES output into a Pandas data frame.
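
To make the "like word" embeddings idea concrete, the sketch below compares concept vectors by cosine similarity, which is a common way to use such embeddings. It is plain Python with toy data: the keys `CUI-A`, `CUI-B` and `CUI-C` are placeholders rather than real CUIs, the three-dimensional vectors are invented, and the helper names are assumptions. It does not load cui2vec or use this library's API.

```python
import math
from typing import Dict, List, Tuple

# Toy vectors keyed by placeholder concept IDs (not real CUIs or real data).
embeddings: Dict[str, List[float]] = {
    'CUI-A': [1.0, 0.0, 0.0],
    'CUI-B': [0.9, 0.1, 0.0],
    'CUI-C': [0.0, 0.0, 1.0],
}


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(key: str, table: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return the other concept whose vector is closest to that of `key`."""
    query = table[key]
    scores = [(other, cosine(query, vec))
              for other, vec in table.items() if other != key]
    return max(scores, key=lambda pair: pair[1])


nearest, score = most_similar('CUI-A', embeddings)
```

With these toy vectors, `CUI-B` points in nearly the same direction as `CUI-A`, so it is returned as the nearest concept, while `CUI-C` is orthogonal and scores zero.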


If you use this project in your research please use the following BibTeX entry:

  title={DeepZensols: Deep Natural Language Processing Framework},
  note={arXiv: 2109.03383},
  journal={arXiv:2109.03383 [cs]},
  author={Landes, Paul and Di Eugenio, Barbara and Caragea, Cornelia},


Please star the project and let me know how and where you use this API.
Contributions as pull requests, feedback and any input are welcome.


An extensive changelog is available here.


MIT License

Copyright (c) 2021 – 2022 Paul Landes

