This repository contains data and code for our EMNLP 2021 paper "Models and Datasets for Cross-Lingual Summarisation". Please contact me by email with any questions.

Please cite this paper if you use our code or data.

@InProceedings{perez-beltrachini-lapata-2021-models,
  author =      "Laura Perez-Beltrachini and Mirella Lapata",
  title =       "Models and Datasets for Cross-Lingual Summarisation",
  booktitle =   "Proceedings of The 2021 Conference on Empirical Methods in Natural Language Processing",
  year =        "2021",
  address =     "Punta Cana, Dominican Republic",
}

The XWikis Corpus

The original XWikis corpus is available at XWikis. Alternatively, you can re-create the corpus, or extract data for other languages, by following the instructions available here.

Cross-lingual Summarisation Code

Our code is based on Fairseq and mBART/mBART50. Our clone of Fairseq, together with the extensions that implement our models, is available here, along with instructions to pre-process the data and to train and evaluate our models.

Models’ Outputs

