State of the art Semantic Sentence Embeddings.
Contrastive Tension (CT) is a fully self-supervised algorithm for re-tuning already pre-trained transformer language models, and achieves state-of-the-art (SOTA) sentence embeddings for Semantic Textual Similarity (STS). All that is required is a pre-trained model and a modestly large text corpus. The results presented in the paper used text sampled from Wikipedia.
This repository contains:
- TensorFlow 2 implementation of the CT algorithm
- State-of-the-art pre-trained STS models
- TensorFlow 2 inference code
- PyTorch inference code
While other versions may work equally well, we have worked with the following:
- Python = 3.6.9
- Transformers = 4.1.1
All the models and tokenizers are available via the Huggingface interface, and can be loaded for both Tensorflow and PyTorch:
import transformers
tokenizer = transformers.AutoTokenizer.from_pretrained('Contrastive-Tension/RoBerta-Large-CT-STSb')
TF_model = transformers.TFAutoModel.from_pretrained('Contrastive-Tension/RoBerta-Large-CT-STSb')
PT_model = transformers.AutoModel.from_pretrained('Contrastive-Tension/RoBerta-Large-CT-STSb')
To perform inference with the pre-trained models (or other Huggingface models), please see the script ExampleBatchInference.py.
The most important thing to remember when running inference is to apply the attention mask to the batch output vectors before mean pooling, so that padding tokens do not contribute to the sentence embedding. This is done in the example script.
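As a minimal sketch of what masked mean pooling looks like, the NumPy snippet below averages token vectors while ignoring padded positions. It is not the code from ExampleBatchInference.py; the function name `masked_mean_pool` and the toy arrays are hypothetical, and it assumes the token-level model output has shape (batch, seq_len, hidden) and the attention mask has shape (batch, seq_len).

```python
import numpy as np

def masked_mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sentence, ignoring padded positions.

    token_embeddings: float array of shape (batch, seq_len, hidden)
    attention_mask:   0/1 array of shape (batch, seq_len)
    """
    # Add a trailing axis so the mask broadcasts over the hidden dimension.
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    # Zero out padded positions, then sum over the sequence axis.
    summed = (token_embeddings * mask).sum(axis=1)
    # Divide by the number of real tokens (clipped to avoid division by zero).
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts

# Toy batch: the first sentence has two real tokens and one padding token.
tokens = np.array([
    [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],  # last position is padding
    [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],
])
mask = np.array([[1, 1, 0], [1, 1, 1]])
pooled = masked_mean_pool(tokens, mask)
print(pooled)
```

Without the mask, the padding vector in the first sentence would dominate the average. With the mask, only the two real tokens are averaged.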
To run CT on your own models and text data, see ExampleTraining.py for a comprehensive example. This file currently creates a dummy corpus of random text; simply replace it with whatever corpus you like.
Note that these models are not trained with exactly the same hyperparameters as those disclosed in the original CT paper. Rather, the parameters are from a short follow-up paper currently under review, which once again pushes the SOTA.
All evaluation is done using the SentEval framework, and the tables show (Pearson / Spearman) correlations.
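For readers unfamiliar with the two metrics, the sketch below shows one way such scores can be computed. It is not SentEval's code. It assumes the predicted score for a sentence pair is the cosine similarity of the two sentence embeddings, and all function names and toy values are hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    """Row-wise cosine similarity between two (n, dim) arrays."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return (a * b).sum(axis=1)

def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    return float(np.corrcoef(x, y)[0, 1])

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy example: predicted similarities versus made-up gold similarity scores.
predicted = np.array([0.10, 0.40, 0.35, 0.80])
gold = np.array([0.0, 2.0, 3.0, 5.0])
print(pearson(predicted, gold), spearman(predicted, gold))
```

Pearson rewards predictions that are linearly related to the gold scores, while Spearman only considers whether the pairs are ranked in the same order.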
Unsupervised / Zero-Shot
As both the training of BERT and CT itself are fully self-supervised, the models tuned only with CT require no labeled data whatsoever.
The NLI models, however, are first fine-tuned towards a natural language inference task, which requires labeled data.
| Model | Avg Unsupervised STS | STS-b | #Parameters |
|---|---|---|---|
| BERT-Distil-CT | 75.12 / 75.04 | 78.63 / 77.91 | 66 M |
| BERT-Base-CT | 73.55 / 73.36 | 75.49 / 73.31 | 108 M |
| BERT-Large-CT | 77.12 / 76.93 | 80.75 / 79.82 | 334 M |
| Using NLI Data | | | |
| BERT-Distil-NLI-CT | 76.65 / 76.63 | 79.74 / 81.01 | 66 M |
| BERT-Base-NLI-CT | 76.05 / 76.28 | 79.98 / 81.47 | 108 M |
| BERT-Large-NLI-CT | 77.42 / 77.41 | 80.92 / 81.66 | 334 M |
The models below are fine-tuned directly with STS data, using a modified version of the supervised training objective proposed by S-BERT.
To our knowledge, our RoBerta-Large-CT-STSb is the current SOTA model for STS via sentence embeddings.
| Model | Pearson / Spearman | #Parameters |
|---|---|---|
| BERT-Distil-CT-STSb | 84.85 / 85.46 | 66 M |
| BERT-Base-CT-STSb | 85.31 / 85.76 | 108 M |
| BERT-Large-CT-STSb | 85.86 / 86.47 | 334 M |
| RoBerta-Large-CT-STSb | 87.56 / 88.42 | 334 M |
Distributed under the MIT License. See LICENSE for more information.
If you have questions regarding the paper, please consider creating a comment via the official OpenReview submission.
If you have questions regarding the code, or anything else related to this GitHub page, please open an issue.
For other purposes, feel free to contact me directly at: [email protected]