Team Hash Brown Science4Cast Submission

This code reproduces Team Hash Brown’s (@princengoc, @Xieyangxinyu) best submission (ee5a) for the competition.

Our team came second with a score of 0.92738 (0.01 below the winner).

Authors: Ngoc Tran and Yangxinyu Xie


The easiest way to get started is to clone the repository.

git clone
cd s4s-final

Getting the data.

  1. Download data files from the competition’s organizers:
  2. Unzip and put the data files in the subfolder data/raw/

Run the following to install all the required packages.

pip install -r requirement.txt

Optional: rerun HOPREC Embedding

You can generate a new HOPREC embedding by running the following shell commands.

git submodule add
cd smore
cd ..
python --year 2017 --t_min 0.5 --t_max 0.9
python --year 2017 --t_min 0.5 --t_max 1

Since HOPREC is randomized, rerunning it may produce different embeddings, and therefore possibly different cosine scores from the ones we obtained.

Our particular HOPREC embedding is already included under data/HOPREC/2017_raw_count/

Scripts to check the differences in cosine similarities between two different HOPREC embeddings are in HOPREC/
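As an illustration of what such a comparison involves, here is a minimal sketch rather than the scripts in HOPREC/. It assumes each embedding is available as a NumPy array of shape (n_nodes, dim) and that the node pairs of interest are given as an integer array of shape (n_pairs, 2). The names cosine_scores and compare_embeddings are hypothetical, and the demo uses synthetic data.

```python
import numpy as np


def cosine_scores(emb, pairs):
    """Cosine similarity of emb[i] and emb[j] for each row (i, j) of pairs."""
    a = emb[pairs[:, 0]]
    b = emb[pairs[:, 1]]
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    # Guard against division by zero for all-zero vectors.
    return np.sum(a * b, axis=1) / np.maximum(norms, 1e-12)


def compare_embeddings(emb_a, emb_b, pairs):
    """Summarize how far the cosine scores of two embeddings differ on the same pairs."""
    s_a = cosine_scores(emb_a, pairs)
    s_b = cosine_scores(emb_b, pairs)
    diff = np.abs(s_a - s_b)
    return {
        "mean_abs_diff": float(diff.mean()),
        "max_abs_diff": float(diff.max()),
        "correlation": float(np.corrcoef(s_a, s_b)[0, 1]),
    }


# Synthetic demo: the second embedding is a slightly perturbed copy of the first.
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(100, 16))
emb_b = emb_a + 0.01 * rng.normal(size=emb_a.shape)
pairs = rng.integers(0, 100, size=(500, 2))
stats = compare_embeddings(emb_a, emb_b, pairs)
print(stats)
```

A small mean absolute difference and a correlation close to 1 would suggest that a rerun embedding yields cosine features similar to the included one.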

Reproduce the submission

To reproduce the submission file, run

cd MLP\ code

This automatically creates a JSON file for submission (named after the current git commit hash). The JSON file and the MLP model parameters are saved under model_outputs. The code also provides the function reproducibility_check, which takes two JSON submissions and prints statistics on their agreements and differences. We used this to verify that what we submitted and what the code above produces agree on 99.999% of the 1 million entries in the test set.
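The kind of check described above can be sketched as follows. This is not the repository's reproducibility_check; it is a simplified stand-in that assumes each submission is a JSON file containing a flat list with one entry per test pair, and that agreement is measured position by position. The function name agreement_stats and the file names are hypothetical.

```python
import json
import os
import tempfile


def agreement_stats(path_a, path_b):
    """Compare two JSON submissions entry by entry (assumes each is a flat list)."""
    with open(path_a) as f:
        sub_a = json.load(f)
    with open(path_b) as f:
        sub_b = json.load(f)
    if len(sub_a) != len(sub_b):
        raise ValueError("submissions have different lengths")
    n = len(sub_a)
    same = sum(1 for x, y in zip(sub_a, sub_b) if x == y)
    return {
        "entries": n,
        "agree": same,
        "differ": n - same,
        "agreement_rate": same / n if n else 1.0,
    }


# Toy demo with two tiny submissions written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "a.json")
    path_b = os.path.join(tmp, "b.json")
    with open(path_a, "w") as f:
        json.dump([3, 1, 2, 0], f)
    with open(path_b, "w") as f:
        json.dump([3, 1, 0, 2], f)
    stats = agreement_stats(path_a, path_b)

print(stats)
```

On real submissions, an agreement_rate of 0.99999 over 1 million entries would correspond to about 10 differing entries.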

