For this project, you will train a network to generate captions for the VizWiz Image Captioning dataset. The images are taken by people who are blind and typically rely on human-based image captioning services. Your objective will be to beat a baseline score on the test set leaderboard.
Clone this repo to your directory on the SCC DS598 project space, e.g. `/projectnb/ds598/students/<userid>`.
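For example, from an SCC command prompt (the repository URL is not shown here, so `<repo-url>` below is a placeholder for this repo's URL):

```bash
# Clone the repo into your student directory on the project space
cd /projectnb/ds598/students/<userid>
git clone <repo-url>
```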
Once you have a training script set up, create a shell script, e.g. `train.sh`, that loads and activates a conda environment and then runs your training script. An example shell script is below.
#!/bin/bash -l
# Set SCC project
#$ -P ds598
# load and activate the academic-ml conda environment on SCC
module load miniconda
module load academic-ml/spring-2024
conda activate spring-2024-pyt
# Add the path to your source project directory to the python search path
# so that the local `import` commands will work.
export PYTHONPATH="/projectnb/ds598/students/<userid>/<yourdir>:$PYTHONPATH"
# Update this path to point to your training file
python path/to/train.py
# After updating the two paths above, run the command below from an SCC
# command prompt in the same directory as this file to submit this as a
# batch job.
### qsub -pe omp 4 -P ds598 -l gpus=1 train.sh
Note that train and test scripts already exist in each of the two model folders.
When you run the example scripts, make sure to add the path to the repo folder to the Python search path before running the script:

export PYTHONPATH="/projectnb/ds598/path/to/folder:$PYTHONPATH"

The example shell scripts already include this command.
Set the paths in `src/base/constants.py` to the correct paths on your system.
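As a purely illustrative sketch (the variable names below are assumptions; keep the names actually defined in `src/base/constants.py`), the constants might look like this once updated:

```python
# src/base/constants.py -- illustrative sketch only; keep the variable names
# already defined in the repo's file and just point them at your paths.
DATA_DIR = "/projectnb/ds598/materials/datasets/vizwiz/captions"  # shared dataset (already downloaded)
WORK_DIR = "/projectnb/ds598/students/<userid>/<yourdir>"         # your copy of the repo
CHECKPOINT_DIR = WORK_DIR + "/checkpoints"                        # where model weights are saved
RESULTS_DIR = WORK_DIR + "/results"                               # where predictions are written
```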
Follow the `.sh` files to run the code. As an example, to run the `cnnlstm_train.sh` script, run the following at the command prompt from the base of your local repo folder:
$ qsub -pe omp 4 -P ds598 -l gpus=1 cnnlstm_train.sh
Your job 5437870 ("cnnlstm_train.sh") has been submitted
As shown, you should get a notification that your job was submitted along with a job ID number.
You can check your job status by typing:
$ qstat -u <userid>
job-ID   prior     name        user      state  submit/start at       queue  slots  ja-task-ID
-----------------------------------------------------------------------------------------------
5437870  0.00000   cnnlstm_tr  tgardos   qw     03/14/2024 09:40:24
The above shows example output from user tgardos; the `qw` state indicates the job is queued and waiting to be scheduled.
The dataset is downloaded to `/projectnb/ds598/materials/datasets/vizwiz/captions`. There is no need to download the dataset again, and the path has already been defined in the accompanying code.
The VizWiz challenge evaluation refers to five different evaluation metrics, although CIDEr-D is used as the primary metric. They also reference the BLEU metric, but that metric has limitations, as described in [2] below.
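As a quick, minimal sketch of how BLEU can be computed (assuming the Hugging Face `evaluate` package is available in your environment; it is not part of the provided code):

```python
# Minimal sketch: score candidate captions against reference captions with BLEU.
import evaluate

bleu = evaluate.load("bleu")

predictions = ["a bottle of water sitting on a wooden table"]
references = [["a plastic water bottle on a table", "a bottle of water on a table"]]

results = bleu.compute(predictions=predictions, references=references)
print(results["bleu"])  # corpus-level BLEU score in [0, 1]
```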
Validation set results are reported in the CNN-LSTM example, and code for reporting validation results is in the demo model code.
As is typically the case, the test dataset labels are withheld, and so the only way to get test results is to produce predicted captions and then submit them to the VizWiz Image Captioning Evaluation Server. There are scripts in both model directories to create the test submission file, although the demo model test script will have to be updated with model information.
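For reference, below is a hypothetical sketch of writing predictions to a submission file; the exact field names and format are determined by the repo's test scripts and the evaluation server instructions, so treat the `image_id`/`caption` keys here as assumptions and follow those sources.

```python
# Hypothetical sketch: dump predicted captions to a JSON submission file.
# The field names ("image_id", "caption") are assumptions -- confirm them
# against the repo's test scripts and the evaluation server instructions.
import json

predictions = [
    {"image_id": 123, "caption": "a bottle of water on a wooden table"},
    {"image_id": 456, "caption": "a person holding a box of cereal"},
]

with open("test_submission.json", "w") as f:
    json.dump(predictions, f)
```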
Create an account on the Evaluation Server and submit your test predictions to get your result.
Step-by-step instructions will be added here shortly.
The state-of-the-art CIDEr-D score on VizWiz Image Captioning is ~125. We're asking that you achieve a minimum CIDEr-D test score of 50.
1. CIDEr: Consensus-based Image Description Evaluation
2. BLEU: A Misunderstood Metric from Another Age (Medium post)
3. BLEU Metric (Hugging Face space)