This repo contains an example submission for the CSE447 project.
For this project, you will develop a program that takes in a string of characters and tries to predict the next character.
For illustration purposes, this repo contains a dummy program that simply generates 3 random guesses of the next character.

Input format

example/input.txt contains an example of what the input to your program will look like.
Each line in this file corresponds to a string, for which you must guess what the next character should be.

Output format

example/pred.txt contains an example of what the output of your program must look like.
Each line in this file corresponds to the program's guesses of what the next character should be.
In other words, line i of example/pred.txt contains the characters the program thinks could come after the string on line i of example/input.txt.
In this case, for each string, the program produces 3 guesses.
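
The two file formats can be handled with a few lines of Python. The sketch below is illustrative only: the helper names are hypothetical, and it assumes that the guesses for a line are written back to back with no separator, which you should confirm against example/pred.txt.

```python
def parse_inputs(text):
    """Split the contents of an input file into one string per line.

    Only the newline is removed. Trailing spaces are kept because they are
    part of the context used to predict the next character.
    """
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a final newline does not start another input
    return lines


def format_predictions(predictions):
    """Render one line per input, each holding that input's guesses."""
    # Assumption: guesses are concatenated with no separator between them.
    return "".join("".join(guesses) + "\n" for guesses in predictions)


def load_inputs(path):
    """Read an input file such as example/input.txt."""
    with open(path, encoding="utf-8") as f:
        return parse_inputs(f.read())


def save_predictions(path, predictions):
    """Write a predictions file such as pred.txt."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(format_predictions(predictions))
```

Keeping the parsing and formatting separate from the file access makes it easy to check the line-by-line alignment between inputs and predictions without touching the disk.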

Implementing your program

src/myprogram.py contains the example program, along with a simple command-line interface that allows for training and testing.
This particular model only performs random guessing, hence the training step doesn’t do anything, and is only shown for illustration purposes.
During training, you may want to perform the following steps with your program:

  1. Load training data
  2. Train model
  3. Save trained model checkpoint
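
As a rough sketch of how these three training steps might fit together, the code below trains a character bigram counter. This is a stand-in technique chosen for illustration, not what src/myprogram.py does. The function names, the training file layout (one string per line), and the JSON checkpoint format are all assumptions; only the file name model.checkpoint follows the example program.

```python
import json
import os
from collections import Counter, defaultdict


def load_training_data(path):
    """Step 1: read one training string per line (hypothetical layout)."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def train_model(lines):
    """Step 2: count how often each character follows each other character."""
    counts = defaultdict(Counter)
    for line in lines:
        for prev_char, next_char in zip(line, line[1:]):
            counts[prev_char][next_char] += 1
    # Convert to plain dicts so the model can be serialized as JSON.
    return {prev: dict(nexts) for prev, nexts in counts.items()}


def save_checkpoint(model, work_dir):
    """Step 3: write the model to <work_dir>/model.checkpoint as JSON."""
    os.makedirs(work_dir, exist_ok=True)
    path = os.path.join(work_dir, "model.checkpoint")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(model, f, ensure_ascii=False)
    return path
```

A real model will likely need more context than a single preceding character, but the same load, train, save structure applies.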

During testing, we will give your program new (e.g. heldout, unreleased) test data, which has the same format as (but different content from) example/input.txt.
In this case, your program must

  1. Load test data
  2. Load trained model checkpoint
  3. Predict answers for test data
  4. Save predictions
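
The four test-time steps could look roughly like the sketch below. It assumes a hypothetical checkpoint stored as JSON that maps each character to counts of the characters seen after it; that format, the function names, and the fallback characters are illustrative assumptions, not part of the example program.

```python
import json
import os


def load_test_data(path):
    """Step 1: read one test string per line, keeping trailing spaces."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def load_checkpoint(work_dir):
    """Step 2: read the (assumed JSON) model from <work_dir>/model.checkpoint."""
    with open(os.path.join(work_dir, "model.checkpoint"), encoding="utf-8") as f:
        return json.load(f)


def predict(model, context, n_guesses=3, fallback=" ea"):
    """Step 3: return up to n_guesses distinct characters, most frequent first."""
    followers = model.get(context[-1:], {})
    ranked = sorted(followers, key=followers.get, reverse=True)
    # Pad with fallback characters when the context is empty or unseen.
    # The fallback needs at least n_guesses characters to guarantee a full line.
    for ch in fallback:
        if ch not in ranked:
            ranked.append(ch)
    return ranked[:n_guesses]


def save_predictions(path, predictions):
    """Step 4: write one line of guesses per test string."""
    with open(path, "w", encoding="utf-8") as f:
        for guesses in predictions:
            f.write("".join(guesses) + "\n")
```

For example, `save_predictions("pred.txt", [predict(model, s) for s in load_test_data("example/input.txt")])` would chain the steps together once a checkpoint has been loaded into `model`.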

example/pred.txt contains an example predictions file that is generated by src/myprogram.py for example/input.txt.

Let’s walk through how to use the example program. First we will train the model, telling the program to save intermediate results in the directory work:

python src/myprogram.py train --work_dir work

Because this model doesn’t actually require training, we simply saved a fake checkpoint to work/model.checkpoint.
Next, we will generate predictions for the example data in example/input.txt and save them in pred.txt:

python src/myprogram.py test --work_dir work --test_data example/input.txt --test_output pred.txt

Evaluating your predictions

We will evaluate your predictions by checking them against a gold answer key.
The gold answer key for the example data can be found in example/answer.txt.
Essentially, your guess is correct if the correct next character is one of your guesses.
For simplicity, we will check caseless matches (e.g. it doesn’t matter if you guess uppercase or lowercase).
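
The matching rule can be sketched as follows. This is not the code in grader/grade.py, only an illustration of the rule as described here. The function names are hypothetical, and the sketch assumes each prediction line is a string of guessed characters and each answer line is a single character.

```python
def is_correct(guesses, gold):
    """True if the gold character is among the guesses, ignoring case."""
    return any(guess.lower() == gold.lower() for guess in guesses)


def success_rate(pred_lines, gold_lines):
    """Fraction of lines whose guesses contain the gold next character."""
    if len(pred_lines) != len(gold_lines):
        raise ValueError("predictions and answers must have the same number of lines")
    if not gold_lines:
        return 0.0
    correct = sum(is_correct(p, g) for p, g in zip(pred_lines, gold_lines))
    return correct / len(gold_lines)
```

Under this rule, a line of guesses such as `xEq` counts as correct for the answer `e`.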

Let’s see what the random guessing program gets:

python grader/grade.py example/pred.txt example/answer.txt --verbose

You should see a detailed answer, as well as a success rate.

Submitting your project

To facilitate reproducibility, we will rely on containerization via Docker.
Essentially, we will use Docker to guarantee that we can run your program.
If you do not have Docker installed, please install it by following these instructions.
We will not be doing anything very fancy with Docker.
In fact, we provide you with a starter Dockerfile.
Use this Dockerfile to install any dependencies you require to perform prediction.

When you submit the project, you must submit a zip file submit.zip to Canvas that contains the following:

src  # source code directory for your program
work  # checkpoint directory for your program, which contains any intermediate result you require to perform prediction, such as model parameters
Dockerfile  # your Dockerfile
team.txt  # your team information
pred.txt  # your predictions for the example data `example/input.txt`
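
Before uploading, it may help to confirm that the archive contains everything in this list. The sketch below is one way to do that; the function names are hypothetical, and it assumes the five entries sit at the root of the zip file and not inside a wrapping folder.

```python
import zipfile

# Entries taken from the list above.
REQUIRED_ENTRIES = ["src", "work", "Dockerfile", "team.txt", "pred.txt"]


def missing_entries(names, required=REQUIRED_ENTRIES):
    """Return the required top-level entries absent from the archive member names."""
    top_level = {name.split("/", 1)[0] for name in names}
    return [entry for entry in required if entry not in top_level]


def check_submission(zip_path="submit.zip"):
    """List the required entries that are missing from the given zip file."""
    with zipfile.ZipFile(zip_path) as archive:
        return missing_entries(archive.namelist())
```

An empty list from `check_submission()` means all five entries were found. It does not check that their contents are valid.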

In team.txt, please put your names and NetIDs in the following format:


Your src directory must contain a predict.sh file, which, when executed as bash src/predict.sh <path_to_test_data> <path_to_predictions> must write predictions to the file <path_to_predictions>.
For reference, the predict.sh file for this example project is in src/predict.sh.
Although we encourage you to follow conventions in the example project, it is not necessary to do so.
What is necessary is that predict.sh generates predictions according to spec.

On our end, we will extract this zip file and run the following command inside the unzipped directory.
You should make sure that this works on your end.
For example, you may consider going to a new directory, unzipping your submission zip file, and running the same command using the example data (e.g. set <path_to_test_data> to $PWD/example).

mkdir -p output
docker build -t cse447-proj/demo -f Dockerfile .
docker run --rm -v $PWD/src:/job/src -v $PWD/work:/job/work -v <path_to_test_data>:/job/data -v $PWD/output:/job/output cse447-proj/demo bash /job/src/predict.sh /job/data/input.txt /job/output/pred.txt

If you are curious what these flags in docker run mean:

  • --rm: remove the container after it finishes running.
  • -v a:b: mount path a on the host machine (i.e. a path on your machine outside Docker) at path b in the container.

Running this command will produce output/pred.txt, which we take to be your predictions on the heldout test data.
We will then evaluate your success rate against the heldout answer key using grader/grade.py.

Your performance will be measured by two metrics obtained on the heldout test data: the success rate and the run time.
Your run time is calculated as the time it takes us to run the docker run command.

For reference, the script submit.sh will package this example project for submission.
For convenience, we have put the command we will likely run to grade your assignment in grader/grade.sh.
You should be able to emulate the grading using the example data by running bash grader/grade.sh example, which will write your results to the output directory.


As more questions come in, we will create a FAQ here.
In the meantime, please post your questions to edStem with the tag Projects -> 447.

Can we have 2 or 4 instead of 3 members?

We strongly prefer that you have 3 members. If you would really like to have a group of 4, please instead do two groups of 2.

How do we sign up groups?

We will send you a Google Form link. Please have one member in the group fill out the form.

Will we get help in selecting datasets? Could we also get tips on finding good corpora?

Not at this time. Choosing and preprocessing your data is a key component of practising NLP in the real world and we would like you to experience it first hand.

Is there a max processing time limit?

Yes, tentatively 30 minutes to run the test data.

Is there a maximum size limit for the program?

Yes, tentatively 1 MB for src and 1 GB for your checkpoint.
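
A quick way to check your directories against these tentative limits is sketched below. The names are hypothetical, and the sketch assumes 1 MB means 2**20 bytes and 1 GB means 2**30 bytes, which this FAQ does not specify.

```python
import os

MAX_SRC_BYTES = 1024 ** 2   # assumption: 1 MB read as 2**20 bytes
MAX_WORK_BYTES = 1024 ** 3  # assumption: 1 GB read as 2**30 bytes


def directory_size(path):
    """Total size in bytes of all regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def within_limits(src_bytes, work_bytes):
    """True if both directory sizes fit the tentative limits."""
    return src_bytes <= MAX_SRC_BYTES and work_bytes <= MAX_WORK_BYTES
```

For example, `within_limits(directory_size("src"), directory_size("work"))` reports whether the current src and work directories fit.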

What does it mean that the astronaut “speaks at least one human language”? Will the system need to support mixed languages, such as a mix of English and a different language?

Your program may receive input that is not English. The input may be a mix of English and other languages.

Can test sentences be virtually anything? For example, will they intentionally test robustness?

Test sentences will not be adversarial. They will be reasonable dialogue utterances.

Can the final report have an unlimited appendix? For example, a 1-page report with a 10-page clarification?

No. There will be no appendix.
