Contains various ways to calculate sentence vector similarity using NLP models

Sentence Similarity Calculator
This repo contains various ways to calculate the similarity between source and target sentences. You can choose which pre-trained model to use, such as ELMo, BERT, or Universal Sentence Encoder (USE).
You can also choose the method used to compute the similarity:
1. Cosine similarity
2. Manhattan distance
3. Euclidean distance
4. Angular distance
5. Inner product
6. TS-SS score
7. Pairwise-cosine similarity
8. Pairwise-cosine similarity + IDF
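To make methods 1 to 6 concrete, here is a minimal NumPy sketch that applies each one to two plain sentence vectors. It is not the repository's code, and the function names are hypothetical. It assumes angular distance is arccos(cosine) / π, and that TS-SS is the product of a triangle area and a sector area with a 10-degree angle offset, which is the common formulation. Keep the direction of each score in mind: cosine similarity and inner product grow as sentences become more alike, whereas the distances and TS-SS shrink.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine of the angle between the vectors (1.0 = same direction).
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def manhattan_distance(a: np.ndarray, b: np.ndarray) -> float:
    # Sum of absolute coordinate differences (L1 distance).
    return float(np.sum(np.abs(a - b)))


def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    # Straight-line (L2) distance.
    return float(np.linalg.norm(a - b))


def angular_distance(a: np.ndarray, b: np.ndarray) -> float:
    # Angle between the vectors scaled to [0, 1]; clip guards against
    # floating-point cosines falling slightly outside [-1, 1].
    cos = np.clip(cosine_similarity(a, b), -1.0, 1.0)
    return float(np.arccos(cos) / np.pi)


def inner_product(a: np.ndarray, b: np.ndarray) -> float:
    # Unnormalised dot product; sensitive to vector magnitudes.
    return float(np.dot(a, b))


def ts_ss(a: np.ndarray, b: np.ndarray) -> float:
    # Angle in degrees, widened by 10 degrees (assumed TS-SS convention).
    cos = np.clip(cosine_similarity(a, b), -1.0, 1.0)
    theta = np.degrees(np.arccos(cos)) + 10.0
    norm_a, norm_b = np.linalg.norm(a), np.linalg.norm(b)
    # Triangle Similarity: area of the triangle spanned by the two vectors.
    triangle = norm_a * norm_b * np.sin(np.radians(theta)) / 2.0
    # Sector Similarity: radius = Euclidean distance + magnitude difference.
    radius = euclidean_distance(a, b) + abs(norm_a - norm_b)
    sector = np.pi * radius ** 2 * theta / 360.0
    return float(triangle * sector)


if __name__ == "__main__":
    u = np.array([1.0, 2.0, 3.0])
    v = np.array([2.0, 2.0, 1.0])
    for fn in (cosine_similarity, manhattan_distance, euclidean_distance,
               angular_distance, inner_product, ts_ss):
        print(f"{fn.__name__}: {fn(u, v):.4f}")
```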
You can experiment with (number of models) × (number of methods) combinations!
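Methods 7 and 8 work differently from the others. Instead of comparing two sentence vectors, they compare token embeddings. The sketch below assumes they follow the greedy matching used by BERTScore. This is a guess based on bert-score appearing in the requirements, not something the repository states. Under that matching, each token is paired with its most similar token in the other sentence, and the matched cosine scores are averaged, optionally weighted by IDF. `pairwise_cosine_f1`, its parameters, and the F1 combination are illustrative choices.

```python
from typing import Optional

import numpy as np


def pairwise_cosine_f1(ref: np.ndarray, cand: np.ndarray,
                       ref_idf: Optional[np.ndarray] = None,
                       cand_idf: Optional[np.ndarray] = None) -> float:
    """Greedy-matching similarity between two sets of token embeddings.

    ref:  (m, d) token embeddings of the reference sentence
    cand: (n, d) token embeddings of the candidate sentence
    *_idf: optional per-token weights (method 8); uniform if omitted (method 7)
    """
    # L2-normalise each row so the matrix product yields cosine similarities.
    ref = ref / np.linalg.norm(ref, axis=1, keepdims=True)
    cand = cand / np.linalg.norm(cand, axis=1, keepdims=True)
    sim = ref @ cand.T  # (m, n) pairwise cosine matrix

    if ref_idf is None:
        ref_idf = np.ones(len(ref))
    if cand_idf is None:
        cand_idf = np.ones(len(cand))

    # Recall: each reference token keeps its best match among candidate tokens.
    recall = np.sum(sim.max(axis=1) * ref_idf) / np.sum(ref_idf)
    # Precision: each candidate token keeps its best match among reference tokens.
    precision = np.sum(sim.max(axis=0) * cand_idf) / np.sum(cand_idf)
    return float(2 * precision * recall / (precision + recall))
```

In practice, the IDF weights would be computed from a corpus so that frequent tokens such as "I" or "an" contribute less to the score than rarer ones.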
Installation
- After cloning this repository, you can install all the dependencies listed in requirements.txt with pip install -r requirements.txt.
git clone https://github.com/Huffon/sentence-similarity.git
cd sentence-similarity
pip install -r requirements.txt
Usage
- To test your own sentences, fill out corpus.txt with one sentence per line, as below:
I ate an apple.
I went to the Apple.
I ate an orange.
...
- Then, choose the model and method to be used to calculate the similarity between source and target sentences.
python sensim.py \
  --model MODEL_NAME \
  --method METHOD_NAME \
  --verbose LOG_OPTION  # LOG_OPTION is a bool
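At a high level, a run reads the sentences, embeds each one with the chosen model, and scores the source sentence against each target with the chosen method. The sketch below covers only the first and last of these steps and leaves the embedding step to whichever model you pick. It is a hypothetical outline, not how sensim.py is written. `load_corpus` and `rank_targets` are made-up names, the one-sentence-per-line format of corpus.txt is assumed, and the toy vectors stand in for real ELMo, BERT, or USE embeddings.

```python
from pathlib import Path
from typing import Callable, List, Tuple

import numpy as np


def load_corpus(path: str = "corpus.txt") -> List[str]:
    # Assumes one sentence per line; blank lines are skipped.
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def rank_targets(source_vec: np.ndarray,
                 target_vecs: List[np.ndarray],
                 targets: List[str],
                 metric: Callable[[np.ndarray, np.ndarray], float],
                 higher_is_closer: bool = True) -> List[Tuple[str, float]]:
    # Score each target against the source, then sort so the closest comes
    # first: descending for similarities, ascending for distances.
    scored = [(text, metric(source_vec, vec))
              for text, vec in zip(targets, target_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=higher_is_closer)


if __name__ == "__main__":
    # Toy 2-d vectors standing in for real sentence embeddings.
    ranking = rank_targets(np.array([1.0, 0.0]),
                           [np.array([0.2, 1.0]), np.array([1.0, 0.3])],
                           ["target A", "target B"],
                           cosine_similarity)
    for text, score in ranking:
        print(f"{score:.3f}  {text}")
```

For a distance such as Euclidean or Manhattan, pass higher_is_closer=False so that the smallest distance ranks first.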
Examples
- In the following section, you can see the results of sentence-similarity.
- As you know, there is no silver bullet that calculates perfect similarity between sentences, so you should run various experiments on your own dataset.
- Caution: TS-SS score might not suit short-sentence similarity tasks, since this method was originally devised to calculate the similarity between documents.
- Result:
Requirements
- Python version should be higher than 3.6.x
- You should install PyTorch by following the official installation guide.
allennlp==0.9.0
bert-score==0.2.1
numpy==1.17.3
scikit-learn==0.21.3
scipy==1.3.1
seaborn==0.9.0
sentence-transformers==0.2.3
spacy==2.1.9
tensorflow==1.15.0
tensorflow-hub==0.7.0
torch==1.3.0
TODO
- Upgrade TF to TF 2.0 to use USE 3.
- Add pairwise cosine similarity method in use_elmo.
- Add InferSent, Sent2Vec, and plain GloVe as models.