This repo contains an example submission for the CSE447 project.
For this project, you will develop a program that takes in a string of characters and tries to predict the next character.
For illustration purposes, this repo contains a dummy program that simply generates 3 random guesses of the next character.
example/input.txt contains an example of what the input to your program will look like.
Each line in this file corresponds to a string, for which you must guess what the next character should be.
example/pred.txt contains an example of what the output of your program must look like.
Each line in this file corresponds to the program's guesses of what the next character should be.
In other words, line i of example/pred.txt corresponds to what character the program thinks should come after the string in line i of example/input.txt.
In this case, for each string, the program produces 3 guesses.
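To make the input/output relationship concrete, here is a minimal sketch in the spirit of the dummy program (the function name and the random-lowercase strategy are illustrative, not the actual code in this repo):

```python
import random
import string

def predict_next_chars(text, n_guesses=3):
    # Dummy predictor: for any input string, return n_guesses random
    # lowercase letters as candidate next characters.
    return "".join(random.choice(string.ascii_lowercase) for _ in range(n_guesses))

inputs = ["Happy new yea", "That's one small ste"]
preds = [predict_next_chars(s) for s in inputs]
# One prediction line per input line; each line holds 3 guessed characters.
```

Each element of `preds` would become one line of pred.txt, aligned with the corresponding line of input.txt.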
Implementing your program
src/myprogram.py contains the example program, along with a simple commandline interface that allows for training and testing.
This particular model only performs random guessing, so the training step doesn't do anything; it is included only for illustration purposes.
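The command-line interface is roughly of the following shape (a sketch based on the commands shown in this README; the exact implementation in src/myprogram.py may differ):

```python
import argparse

# Sketch of a train/test CLI with the flags used in this README.
parser = argparse.ArgumentParser()
parser.add_argument("mode", choices=("train", "test"))
parser.add_argument("--work_dir", default="work")
parser.add_argument("--test_data")
parser.add_argument("--test_output")

# Example invocation, equivalent to: python src/myprogram.py train --work_dir work
args = parser.parse_args(["train", "--work_dir", "work"])
```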
During training, you may want your program to perform the following steps:
- Load training data
- Train model
- Save trained model checkpoint
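The training steps above can be sketched as follows (a hypothetical unigram character-count "model" and a model.json checkpoint name, chosen purely for illustration):

```python
import json
import os
import tempfile

def train(work_dir, training_text):
    # Load training data (here, just a string passed in directly),
    # "train" a unigram character-count model,
    # and save the trained model checkpoint to work_dir.
    counts = {}
    for ch in training_text:
        counts[ch] = counts.get(ch, 0) + 1
    os.makedirs(work_dir, exist_ok=True)
    with open(os.path.join(work_dir, "model.json"), "w") as f:
        json.dump(counts, f)
    return counts

work_dir = tempfile.mkdtemp()
counts = train(work_dir, "hello world")
```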
During testing, we will give your program new (e.g. heldout, unreleased) test data, which has the same format as (but different content from) example/input.txt.
In this case, your program must
- Load test data
- Load trained model checkpoint
- Predict answers for test data
- Save predictions
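The testing steps above can be sketched like this (continuing the hypothetical unigram model and model.json checkpoint name from before; a real model would condition on each input string rather than making the same guesses every time):

```python
import json
import os
import tempfile

def predict(work_dir, test_lines, n_guesses=3):
    # Load the trained model checkpoint, then predict the globally most
    # frequent characters as the guesses for every test line.
    with open(os.path.join(work_dir, "model.json")) as f:
        counts = json.load(f)
    top = sorted(counts, key=counts.get, reverse=True)[:n_guesses]
    return ["".join(top) for _ in test_lines]

# Set up a fake checkpoint so the sketch is self-contained.
work_dir = tempfile.mkdtemp()
with open(os.path.join(work_dir, "model.json"), "w") as f:
    json.dump({"e": 5, "t": 4, "a": 3, "z": 1}, f)

preds = predict(work_dir, ["Happy new yea", "That's one small ste"])
```

Saving the predictions then just means writing one element of `preds` per line of the output file.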
example/pred.txt contains an example predictions file generated by the example program.
Let’s walk through how to use the example program. First, we will train the model, telling the program to save intermediate results in the directory work:
python src/myprogram.py train --work_dir work
Because this model doesn’t actually require training, we simply save a fake checkpoint to the work directory.
Next, we will generate predictions for the example data in example/input.txt and save them in pred.txt:
python src/myprogram.py test --work_dir work --test_data example/input.txt --test_output pred.txt
Evaluating your predictions
We will evaluate your predictions by checking them against a gold answer key.
The gold answer key for the example data can be found in example/answer.txt.
Essentially, a prediction is correct if the true next character is one of your guesses.
For simplicity, we will check caseless matches (e.g. it doesn’t matter if you guess uppercase or lowercase).
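The caseless matching rule can be sketched as follows (a simplified stand-in for the logic in grader/grade.py, not its actual code):

```python
def is_correct(gold_char, guesses):
    # A line of guesses is counted correct if the gold character appears
    # among the guesses, compared case-insensitively.
    return gold_char.lower() in guesses.lower()

# e.g. "Rts" matches gold "r" (case does not matter), "xyz" misses gold "a".
results = [is_correct("r", "Rts"), is_correct("a", "xyz")]
success_rate = sum(results) / len(results)
```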
Let’s see what the random guessing program gets:
python grader/grade.py example/pred.txt example/answer.txt --verbose
You should see detailed per-line results as well as an overall success rate.
Submitting your project
To facilitate reproducibility, we will rely on containerization via Docker.
Essentially, we will use Docker to guarantee that we can run your program.
If you do not have Docker installed, please install it by following the official installation instructions.
We will not be doing anything very fancy with Docker.
In fact, we provide you with a starter Dockerfile. You should install in this Dockerfile any dependencies you require to perform prediction.
When you submit the project, you must submit to Canvas a zip file submit.zip that contains the following:
src         # source code directory for your program
work        # checkpoint directory for your program, which contains any
            # intermediate results you require to perform prediction,
            # such as model parameters
Dockerfile  # your Dockerfile
team.txt    # your team information
pred.txt    # your predictions for the example data example/input.txt
In team.txt, please put your names and NetIDs in the following format:
Name1,NetID1
Name2,NetID2
Name3,NetID3
The src directory must contain a predict.sh file which, when executed as bash src/predict.sh <path_to_test_data> <path_to_predictions>, must write predictions to the file at <path_to_predictions>.
For reference, the predict.sh file for this example project is in src/predict.sh.
Although we encourage you to follow conventions in the example project, it is not necessary to do so.
What is necessary is that predict.sh generates predictions according to spec.
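If your program follows the example project's command-line interface, a minimal predict.sh could look like the sketch below (the script name myprogram.py and the /job mount points match the docker run command later in this README; adjust both to your own program):

```shell
#!/usr/bin/env bash
set -e
# $1: path to the test data; $2: path where predictions must be written.
# Inside the grading container, code is mounted at /job/src and the
# checkpoint directory at /job/work.
python /job/src/myprogram.py test --work_dir /job/work --test_data "$1" --test_output "$2"
```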
On our end, we will extract this zip file and run the following command inside the unzipped directory.
You should make sure that this works on your end.
For example, you may consider going to a new directory, unzipping your submission zip file, and running the same command using the example data (e.g. set <path_to_test_data> to the example directory).
mkdir -p output
docker build -t cse447-proj/demo -f Dockerfile .
docker run --rm -v $PWD/src:/job/src -v $PWD/work:/job/work -v <path_to_test_data>:/job/data -v $PWD/output:/job/output cse447-proj/demo bash /job/src/predict.sh /job/data/input.txt /job/output/pred.txt
If you are curious what these flags in docker run mean:
--rm: remove the container after it finishes running
-v: mount a path on the host machine (e.g. a path on your machine outside Docker) to the container (e.g. a path inside the Docker container)
Running this command will produce output/pred.txt, which we take to be your predictions on the heldout test data.
We will then evaluate your success rate against the heldout answer key using grader/grade.py.
Your performance will comprise two metrics obtained on the heldout test data: the success rate and the run time.
Your run time is calculated as the time it takes us to run the docker run command.
For reference, the script submit.sh will package this example project for submission.
For convenience, we have put the command we will likely run to grade your assignment in grader/grade.sh.
You should be able to emulate the grading using the example data by running bash grader/grade.sh example, which will write your results to the output directory.
As more questions come in, we will create a FAQ here.
In the meantime, please post your questions to edStem with the tag Projects -> 447.
Can we have 2 or 4 instead of 3 members?
We strongly prefer that you have 3 members. If you would really like to have a group of 4, please instead do two groups of 2.
How do we sign up groups?
We will send you a Google Form link. Please have one member of the group fill out the form.
Will we get help in selecting datasets? Could we also get tips on finding good corpora?
Not at this time. Choosing and preprocessing your data is a key component of practicing NLP in the real world, and we would like you to experience it firsthand.
Is there a max processing time limit?
Yes, tentatively 30 minutes to run the test data.
Is there a maximum size limit for the program?
Yes, tentatively 1 MB for src and 1 GB for your checkpoint directory.
What does it mean that the astronaut “speaks at least one human language”? Will the system need to support mixed-language input, such as a mix of English and a different language?
Your program may receive input that is not English. The input may be a mix of English and other languages.
Can test sentences be virtually anything, e.g. sentences that intentionally test robustness?
Test sentences will not be adversarial. They will be reasonable dialogue utterances.
Do we have unlimited appendix for final report? Such as a 1 page report with a 10 page clarification?
No. There will be no appendix.