Sentence Similarity Calculator

This repo contains various ways to calculate the similarity between source and target sentences. You can choose which pre-trained model to use, such as ELMo, BERT, or Universal Sentence Encoder (USE).

You can also choose the method used to compute the similarity:

1. Cosine similarity
2. Manhattan distance
3. Euclidean distance
4. Angular distance
5. Inner product
6. TS-SS score
7. Pairwise-cosine similarity
8. Pairwise-cosine similarity + IDF

You can experiment with any of the (number of models) x (number of methods) combinations!
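As a rough illustration of methods 1 to 5 above, here is a minimal sketch using only the Python standard library. Plain lists stand in for sentence embeddings, and the function names are made up for this example; they are not taken from this repo. Angular distance is assumed to mean the angle between the two vectors divided by pi, which is one common definition, so the repo's own implementation may differ.

```python
import math


def inner_product(u, v):
    # Sum of element-wise products.
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    # Euclidean (L2) length of a vector.
    return math.sqrt(inner_product(u, u))


def cosine_similarity(u, v):
    # Inner product of the vectors after scaling both to unit length.
    return inner_product(u, v) / (norm(u) * norm(v))


def manhattan_distance(u, v):
    # Sum of absolute coordinate differences (L1 distance).
    return sum(abs(a - b) for a, b in zip(u, v))


def euclidean_distance(u, v):
    # Straight-line (L2) distance between the two points.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def angular_distance(u, v):
    # Assumed definition: angle between the vectors, scaled to [0, 1].
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos = max(-1.0, min(1.0, cosine_similarity(u, v)))
    return math.acos(cos) / math.pi


# Two toy 2-d "embeddings" that are orthogonal to each other.
u = [1.0, 0.0]
v = [0.0, 1.0]
print(cosine_similarity(u, v))
print(manhattan_distance(u, v))
print(euclidean_distance(u, v))
print(angular_distance(u, v))
print(inner_product(u, v))
```

Note that cosine similarity and inner product grow as sentences get more similar, while the three distances shrink, so the scores are not directly comparable across methods.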


Installation

  • After cloning this repository, install the dependencies listed in requirements.txt with pip install -r requirements.txt.
git clone https://github.com/Huffon/sentence-similarity.git
cd sentence-similarity
pip install -r requirements.txt

Usage

  • To test your own sentences, fill corpus.txt with sentences as below.
I ate an apple.
I went to the Apple.
I ate an orange.
...
  • Then, choose the model and method to be used to calculate the similarity between source and target sentences.
python sensim.py
    --model    MODEL_NAME
    --method   METHOD_NAME
    --verbose  LOG_OPTION (bool)
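To make the workflow above concrete, the sketch below computes a cosine-similarity matrix over the three sample sentences. It is not the code in sensim.py: `toy_embed` is a hypothetical bag-of-words stand-in for a real model such as ELMo, BERT, or USE, and `load_sentences` assumes the corpus file holds one sentence per line, as the sample suggests.

```python
import math
from collections import Counter


def load_sentences(path):
    # Assumption: one sentence per line; blank lines are skipped.
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def toy_embed(sentence):
    # Hypothetical stand-in for a pre-trained encoder: lowercase word counts.
    tokens = [w.strip(".,!?").lower() for w in sentence.split()]
    return Counter(t for t in tokens if t)


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(sentences):
    # Entry [i][j] is the similarity between sentence i and sentence j.
    vectors = [toy_embed(s) for s in sentences]
    return [[cosine(a, b) for b in vectors] for a in vectors]


sentences = ["I ate an apple.", "I went to the Apple.", "I ate an orange."]
matrix = similarity_matrix(sentences)
for sentence, row in zip(sentences, matrix):
    print(sentence, [round(score, 3) for score in row])
```

To run this on your own file, replace the hard-coded list with `load_sentences("corpus.txt")`. A word-count baseline like this one treats "apple" and "Apple" as the same token, which is the kind of case where comparing against contextual models can be informative.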

Examples

  • In the following section, you can see the result of sentence-similarity.
  • As you know, there is no silver bullet that calculates a perfect similarity between sentences, so you should run various experiments on your own dataset.
    • Caution: The TS-SS score might not suit short-sentence similarity tasks, since the method was originally devised to calculate the similarity between documents.
  • Result:

[Figure: result]

Requirements

  • Python 3.6 or higher is required.
  • Install PyTorch by following the official installation guide.
allennlp==0.9.0
bert-score==0.2.1
numpy==1.17.3
scikit-learn==0.21.3
scipy==1.3.1
seaborn==0.9.0
sentence-transformers==0.2.3
spacy==2.1.9
tensorflow==1.15.0
tensorflow-hub==0.7.0
torch==1.3.0

TODO

  • Upgrade TF to TF2.0 to use USE 3
  • Add pairwise cosine similarity method in use_elmo.
  • Add InferSent, Sent2Vec, plain GloVe as models.
