AIO2 TF-IDF Baseline

This is a very simple question answering (QA) system, developed as a lightweight baseline for the AIO2 competition.

In the training stage, the model builds a sparse matrix of TF-IDF features from the questions in the training dataset. In the inference stage, the model answers an unseen question by finding the training question most similar to it, scoring similarity as the dot product of their TF-IDF feature vectors, and returning that training question's answer.

Therefore, in principle, the model cannot predict any answer that does not appear in the training data.
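A minimal sketch of the train/inference idea described above, in plain Python. It assumes questions are already tokenized into lists of terms (the actual baseline tokenizes Japanese text and filters by part of speech) and uses L2-normalized vectors with scikit-learn-style smoothed IDF; these weighting details are assumptions, not necessarily what `train.py` does. The function names are hypothetical. Because `predict` can only return an entry of `train_answers`, the sketch also shows why answers unseen in training cannot be produced.

```python
import math
from collections import Counter


def vectorize(tokens, idf):
    """Map a token list to an L2-normalized sparse TF-IDF vector (dict)."""
    tf = Counter(t for t in tokens if t in idf)  # out-of-vocabulary terms are ignored
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def build_tfidf(docs):
    """'Training': compute IDF weights and one sparse vector per training question."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count each term once per doc
    # Smoothed IDF (scikit-learn's default formula); an assumption here
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [vectorize(tokens, idf) for tokens in docs], idf


def dot(a, b):
    """Dot product of two sparse vectors, iterating over the smaller one."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def predict(question_tokens, train_vectors, train_answers, idf):
    """'Inference': return the answer of the highest-scoring training question."""
    q = vectorize(question_tokens, idf)
    scores = [dot(q, v) for v in train_vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return train_answers[best]


if __name__ == "__main__":
    train_questions = [["富士山", "高さ"], ["日本", "首都"]]
    train_answers = ["3776メートル", "東京"]
    vectors, idf = build_tfidf(train_questions)
    print(predict(["日本", "首都", "どこ"], vectors, train_answers, idf))  # → 東京
```

Storing vectors as dicts keeps the sketch dependency-free; a real implementation would more likely use a sparse matrix so that scoring against all training questions is a single matrix-vector product.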

Steps to experiment with the model

Install requirements

$ pip install -r requirements.txt

Train

$ python train.py \
--train_file <data dir>/aio_02_train.jsonl \
--output_dir model \
--pos_list 名詞 \
--stop_words でしょ う \
--max_features 10000
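The `--pos_list`, `--stop_words`, and `--max_features` options shape the TF-IDF vocabulary. The sketch below shows one plausible reading of them, assuming tokens arrive as `(surface, pos)` pairs from a Japanese morphological analyzer, stop words are matched on the surface form, and `max_features` keeps the most frequent terms across the corpus (as scikit-learn's `TfidfVectorizer` does). `select_vocabulary` is a hypothetical helper, not a function from this repository.

```python
from collections import Counter


def select_vocabulary(tagged_docs, pos_list, stop_words, max_features):
    """Hypothetical helper: choose TF-IDF vocabulary from (surface, pos) tokens."""
    counts = Counter()
    for doc in tagged_docs:
        for surface, pos in doc:
            # keep only the requested parts of speech (e.g. 名詞 = noun), minus stop words
            if pos in pos_list and surface not in stop_words:
                counts[surface] += 1
    # keep the max_features most frequent terms; ties broken by the term itself
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:max_features]]


if __name__ == "__main__":
    docs = [
        [("日本", "名詞"), ("の", "助詞"), ("首都", "名詞"), ("でしょ", "助動詞"), ("う", "助動詞")],
        [("日本", "名詞"), ("の", "助詞"), ("最高峰", "名詞")],
    ]
    print(select_vocabulary(docs, {"名詞"}, {"でしょ", "う"}, max_features=1))  # → ['日本']
```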

Predict

$ python predict.py \
--model_dir model \
--test_file <data dir>/aio_02_dev_unlabeled_v1.0.jsonl \
--prediction_file <output dir>/predictions.jsonl
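Both the test file and the prediction file are JSON Lines (one JSON object per line). A minimal sketch of the read-predict-write loop, assuming each input record carries `"qid"` and `"question"` fields and each output record has the form `{"qid": ..., "prediction": ...}`; check these field names against the official AIO2 format. `write_predictions` and `predict_fn` are hypothetical names.

```python
import io
import json


def write_predictions(test_lines, predict_fn, out):
    """Read JSONL question records and write one JSONL prediction per record."""
    for line in test_lines:
        line = line.strip()
        if not line:  # tolerate blank lines, e.g. a trailing newline
            continue
        record = json.loads(line)
        answer = predict_fn(record["question"])
        # ensure_ascii=False keeps Japanese answers human-readable in the output
        out.write(json.dumps({"qid": record["qid"], "prediction": answer}, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    test_file = io.StringIO('{"qid": "q1", "question": "日本の首都は?"}\n')
    out = io.StringIO()
    write_predictions(test_file, lambda question: "東京", out)
    print(out.getvalue(), end="")  # → {"qid": "q1", "prediction": "東京"}
```

With real files, pass handles opened with `open(path, encoding="utf-8")` in place of the `io.StringIO` objects.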

Building Docker image

$ docker build -t aio2-tfidf-baseline .

Test locally:

$ docker run --rm -v "<data dir absolute path>:/app/input" -v "<output dir absolute path>:/app/output" aio2-tfidf-baseline bash ./submission.sh input/aio_02_dev_unlabeled_v1.0.jsonl output/predictions.jsonl

Save the Docker image to a file:

$ docker save aio2-tfidf-baseline | gzip > aio2-tfidf-baseline.tar.gz

License

The code in this repository is open-sourced under the MIT License.

TFIDF-based QA system for AIO2 competition

John

Userscript qutebrowser for downloading audio / video from youtube using aria2

Automatically draw a KiCad schematic for a circuit prototyped on a breadboard

AIO2 TF-IDF Baseline

Steps to experiment with the model

Install requirements

Train

Predict

Building Docker image

License

Userscript qutebrowser for downloading audio / video from youtube using aria2

Automatically draw a KiCad schematic for a circuit prototyped on a breadboard

You might also like...