Quantifiers-and-Negations-in-RE-Documents

This project was part of my work for a seminar at the Technical University of Munich (TUM) during my bachelor's studies in 2019. The Python project finds
quantifiers and negations in documents and searches them for problematic findings.
Problematic findings are, for example, sentences that combine quantifiers and negations in a way that is ambiguous.
This means there are multiple valid interpretations of the sentence. For instance, "All users are not administrators" can mean that no user is an administrator or that not every user is one.
The tool extracts such sentences and reports them.

Motivation:

Ambiguous sentences should be avoided because they can cause problems
that are hard to find and possibly hard to fix. This is especially true for
technical specifications and similar documents.
This project compares two approaches to finding ambiguous sentences:

  1. String-based search
  2. NLP-based search

We want to find out whether the computational overhead of NLP yields better results
than standard string-based search methods.
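
The string-based approach can be illustrated with a minimal sketch. It flags every sentence that contains at least one quantifier and at least one negation. The word lists, the function names and the regex-based sentence splitting are assumptions made for this example and are not taken from the project's code.

```python
import re

# Hypothetical word lists for illustration; the project's actual lists may differ.
QUANTIFIERS = {"all", "every", "each", "any", "some"}
NEGATIONS = {"not", "never", "cannot"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def find_candidates(text):
    """Return sentences that contain both a quantifier and a negation."""
    findings = []
    for sentence in split_sentences(text):
        # Lowercased word tokens; apostrophes are kept so "don't" stays one token.
        tokens = re.findall(r"[a-z']+", sentence.lower())
        has_quantifier = any(t in QUANTIFIERS for t in tokens)
        has_negation = any(t in NEGATIONS or t.endswith("n't") for t in tokens)
        if has_quantifier and has_negation:
            findings.append(sentence)
    return findings


if __name__ == "__main__":
    sample = (
        "All users are not allowed to delete records. "
        "The system logs every request. "
        "Some sensors don't report errors."
    )
    for hit in find_candidates(sample):
        print(hit)
```

A search like this is cheap but cannot tell whether the negation actually interacts with the quantifier, which is the kind of question a parser-based NLP search is meant to answer.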

Features:

  1. Detect quantifiers and negations in .xml or .txt documents
  2. Search either with a string-based search or with an NLP-based search
    (using Stanford's CoreNLP library [1])
  3. Extract possibly ambiguous sentences
  4. Compare string search results with NLP search results
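
As a rough sketch of the comparison feature, the findings of both searches can be treated as sets of sentences and split into the sentences found by both, only by the string search, or only by the NLP search. The function name and the result keys are hypothetical and chosen for this example only.

```python
def compare_findings(string_hits, nlp_hits):
    """Group findings by which search reported them (hypothetical helper)."""
    string_set, nlp_set = set(string_hits), set(nlp_hits)
    return {
        "both": sorted(string_set & nlp_set),
        "string_only": sorted(string_set - nlp_set),
        "nlp_only": sorted(nlp_set - string_set),
    }


if __name__ == "__main__":
    # Made-up inputs that only demonstrate the call, not real results.
    report = compare_findings(
        ["All users are not admins.", "Every log is not stored."],
        ["All users are not admins."],
    )
    for group, sentences in report.items():
        print(group, len(sentences))
```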

Prerequisites:

  1. Java 8 or higher
  2. Python 3.6 or higher as project interpreter
  3. Stanford CoreNLP library: https://stanfordnlp.github.io/CoreNLP/download.html
  4. Environment variable “CORENLP_HOME” set to where the CoreNLP library is stored
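
The prerequisites can be checked before the first run with a small script like the following sketch. The function name is hypothetical, and the check for .jar files assumes that the unpacked CoreNLP download keeps its jars directly in the CORENLP_HOME directory. The script only checks that a java executable is on the PATH, not its version.

```python
import os
import shutil
import sys
from pathlib import Path


def check_prerequisites(env=None):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python 3.6 or higher is required")
    if shutil.which("java") is None:
        problems.append("no java executable found on PATH")
    home = env.get("CORENLP_HOME")
    if not home:
        problems.append("CORENLP_HOME is not set")
    elif not Path(home).is_dir():
        problems.append("CORENLP_HOME does not point to a directory")
    elif not any(Path(home).glob("*.jar")):
        # Assumption: the CoreNLP jars sit directly inside CORENLP_HOME.
        problems.append("CORENLP_HOME contains no .jar files")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```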

References:

[1] Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. The Stanford CoreNLP natural language processing toolkit. In Association for Computational Linguistics (ACL) System Demonstrations, pages 55–60, 2014.
