This repository contains the source code of our ICML 2021 paper "How could Neural Networks understand Programs?".
Run the following commands to build a Docker image for the environment:
cd docker
sudo docker build -t oscar:latest .
You can then launch a container with:
sudo nvidia-docker run -it --mount type=bind,source="$(pwd)",target=/oscar oscar:latest
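The launch command bind-mounts the current directory (`$(pwd)`) at `/oscar`, and the later steps expect `/oscar/bin` and `/oscar/model` inside the container. It therefore has to be run from the repository root, not from the `docker` directory entered for the build. The following is a minimal sketch of a guard for this, assuming the root can be recognized by `docker/Dockerfile` and `model/scripts/pretrain.sh`. The function names `is_oscar_root` and `launch_oscar` are hypothetical helpers, not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the repository).
# The bind mount maps "$(pwd)" to /oscar, so the container must be launched
# from the repository root.

# Returns 0 if DIR (default: current directory) contains files this README
# relies on: docker/Dockerfile (used by `docker build .`) and
# model/scripts/pretrain.sh (used for pretraining).
is_oscar_root() {
    local dir="${1:-.}"
    [ -f "$dir/docker/Dockerfile" ] && [ -f "$dir/model/scripts/pretrain.sh" ]
}

# Launch the container only when invoked from the repository root.
launch_oscar() {
    if ! is_oscar_root "$(pwd)"; then
        echo "error: run this from the repository root (cd .. after building)" >&2
        return 1
    fi
    sudo nvidia-docker run -it \
        --mount type=bind,source="$(pwd)",target=/oscar \
        oscar:latest
}
```

For example, after building the image in `docker`, run `cd ..` and then `launch_oscar`.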
To compile the binaries for processing the data:
cd /oscar/bin
make
Then the OSCAR LLVM analyzer pass (located in analyzer), the IR Lexer (located in irlexer), and FastBPE (located in fastBPE) will be compiled.
Processing the data
First, please visit https://1drv.ms/u/s!AjYwgux2zLgMiAhYpoCU3jLu20Z6?e=XR52y9 to download the data for pretraining and downstream tasks. Extract the downloaded tarballs to the
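Extracting several tarballs into one location can be scripted. The following is a minimal sketch: the destination directory is passed as an argument rather than hard-coded, so choose it according to the repository's layout. The function name `extract_tarballs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository).
# Usage: extract_tarballs DEST TARBALL...
# Extracts each TARBALL into DEST, creating DEST if needed.
extract_tarballs() {
    local dest="$1"
    shift
    mkdir -p "$dest" || return 1
    local tarball
    for tarball in "$@"; do
        echo "extracting $tarball -> $dest"
        # -x: extract, -f: read the archive from a file, -C: change to DEST first.
        # GNU tar and bsdtar detect gzip/bzip2/xz compression on their own
        # when reading from a file, so no -z/-j/-J flag is given.
        tar -xf "$tarball" -C "$dest" || return 1
    done
}
```

For instance, `extract_tarballs <destination> ~/Downloads/*.tar.gz` extracts every downloaded `.tar.gz` file into `<destination>`.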
To process the data for pretraining and the downstream tasks, enter the corresponding directories and execute ./process.sh. Raw data must be placed in the data-raw directory. Processed data will be placed in the directory
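When there are several task directories, running `./process.sh` in each one can be automated. The following is a minimal sketch under two assumptions that the text above does not state: that every task lives in its own subdirectory of a common root, and that each task's `data-raw` directory sits inside that subdirectory. The function name `process_all` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository).
# Usage: process_all ROOT
# ASSUMPTION: each task is a subdirectory ROOT/<task>/ that contains
# process.sh and a data-raw/ directory with the raw data.
process_all() {
    local root="$1"
    local script dir
    for script in "$root"/*/process.sh; do
        [ -f "$script" ] || continue   # the glob matched nothing
        dir="$(dirname "$script")"
        if [ ! -d "$dir/data-raw" ]; then
            echo "skipping $dir: no data-raw directory" >&2
            continue
        fi
        echo "processing $dir"
        # Run inside the task directory, as the README instructs,
        # using a subshell so the caller's working directory is unchanged.
        (cd "$dir" && ./process.sh) || return 1
    done
}
```

Tasks without raw data are skipped with a warning instead of failing, and the loop stops at the first `process.sh` that exits with an error.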
Training the model
Use the following commands to pretrain the model:
cd /oscar/model
./scripts/pretrain.sh
For the downstream tasks, the procedure is similar.