
OSIRRC Docker Image for Neural Vector Space Model (NVSM)

Nicola Ferro, Stefano Marchesin, Alberto Purpura and Gianmaria Silvello

This is the Docker image of our implementation of the Neural Vector Space Model (NVSM), conforming to the OSIRRC jig for the Open-Source IR Replicability Challenge (OSIRRC) at SIGIR 2019. The image is available on Docker Hub and has been tested with the jig at commit ca31987 (6/5/2019).

  • Supported test collections: robust04
  • Supported hooks: init, index, train, search

Quick Start

The following jig command can be used to index TREC disks 4/5 for robust04:

python run.py prepare \
  --repo albep/nvsm \
  --collections robust04=/path/to/disk45=trectext
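The `--collections` value packs three fields into one string: a collection name, a host path, and a format, separated by `=`. The sketch below, built around a hypothetical helper `parse_collection_spec`, only illustrates that layout as inferred from the example above; it is not the jig's own parsing code.

```python
from typing import NamedTuple


class CollectionSpec(NamedTuple):
    """The three fields of a name=path=format collection spec."""
    name: str
    path: str
    fmt: str


def parse_collection_spec(spec: str) -> CollectionSpec:
    """Hypothetical helper: split a spec such as robust04=/path=trectext.

    The name is taken up to the first '=' and the format after the last '=',
    so a path containing '=' still parses; a name or format containing '='
    would not.
    """
    name, sep1, rest = spec.partition("=")
    path, sep2, fmt = rest.rpartition("=")
    if not (sep1 and sep2 and name and path and fmt):
        raise ValueError(f"expected name=path=format, got {spec!r}")
    return CollectionSpec(name, path, fmt)


spec = parse_collection_spec("robust04=/path/to/disk45=trectext")
print(spec.name, spec.path, spec.fmt)  # robust04 /path/to/disk45 trectext
```

Splitting on the first and last `=` rather than on every `=` keeps the sketch tolerant of unusual host paths.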

The following jig command can be used to train the retrieval model on the robust04 collection:

python run.py train \
  --repo albep/nvsm \
  --model_folder path/model/directory \
  --topic topics/topics.robust04.txt \
  --test_split sample_training_validation_query_ids/robust04_test.txt \
  --validation_split sample_training_validation_query_ids/robust04_validation.txt \
  --qrels qrels/qrels.robust04.txt \
  --opts epochs=12 \
  --collection Robust04
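In a typical setup the test and validation splits passed to `train` should not share query IDs, otherwise model selection could see test topics. The sketch below checks for overlap, assuming (this is an assumption about the file format, not something stated here) that each split file lists one query ID per line; the helper names are hypothetical.

```python
import tempfile
from pathlib import Path


def read_query_ids(path: Path) -> set[str]:
    # Assumes one query ID per line; blank lines are skipped.
    return {line.strip() for line in path.read_text().splitlines() if line.strip()}


def overlapping_ids(test_path: Path, validation_path: Path) -> set[str]:
    # Query IDs present in both split files; ideally this set is empty.
    return read_query_ids(test_path) & read_query_ids(validation_path)


# Usage with throwaway files standing in for the real split files.
with tempfile.TemporaryDirectory() as tmp:
    test_file = Path(tmp) / "robust04_test.txt"
    validation_file = Path(tmp) / "robust04_validation.txt"
    test_file.write_text("301\n302\n303\n")
    validation_file.write_text("304\n305\n")
    shared = overlapping_ids(test_file, validation_file)

print(shared or "no overlap")  # no overlap
```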

The following jig command can be used to perform a retrieval run on the robust04 test collection:

python run.py search \
  --repo albep/nvsm \
  --output path/model/directory \
  --qrels qrels/qrels.robust04.txt \
  --topic topics/topics.robust04.txt \
  --test_split sample_training_validation_query_ids/robust04_test.txt \
  --collection robust04
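The search command is also given the qrels, presumably so the run can be evaluated, and the figure reported under Expected Results is mean average precision (MAP). For intuition, the sketch below is a simplified reimplementation of MAP, assuming standard TREC qrels lines (`topic iteration docid relevance`) and run lines (`topic Q0 docid rank score tag`). It is a stand-in for a standard evaluation tool such as trec_eval, not the evaluation code used to produce the reported numbers.

```python
from collections import defaultdict


def load_qrels(lines):
    # TREC qrels: "topic iteration docid relevance"; relevance > 0 is relevant.
    relevant = defaultdict(set)
    for line in lines:
        topic, _iteration, docid, rel = line.split()
        if int(rel) > 0:
            relevant[topic].add(docid)
    return relevant


def load_run(lines):
    # TREC run: "topic Q0 docid rank score tag"; rank by descending score
    # (sorted() is stable, so tied scores keep their input order).
    scored = defaultdict(list)
    for line in lines:
        topic, _q0, docid, _rank, score, _tag = line.split()
        scored[topic].append((float(score), docid))
    return {t: [d for _, d in sorted(docs, key=lambda x: -x[0])]
            for t, docs in scored.items()}


def average_precision(ranking, relevant):
    # Sum precision@k at every relevant hit, divide by all relevant docs.
    hits, total = 0, 0.0
    for k, docid in enumerate(ranking, start=1):
        if docid in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(run, qrels):
    # Average over qrels topics with at least one relevant document;
    # a topic missing from the run contributes AP = 0.
    topics = [t for t in qrels if qrels[t]]
    if not topics:
        return 0.0
    return sum(average_precision(run.get(t, []), qrels[t]) for t in topics) / len(topics)


qrels = load_qrels(["301 0 d1 1", "301 0 d3 1", "302 0 d9 1"])
run = load_run([
    "301 Q0 d1 1 9.0 nvsm",
    "301 Q0 d2 2 8.0 nvsm",
    "301 Q0 d3 3 7.0 nvsm",
    "302 Q0 d8 1 5.0 nvsm",
    "302 Q0 d9 2 4.0 nvsm",
])
print(round(mean_average_precision(run, qrels), 4))  # 0.6667
```

Here topic 301 scores AP = (1/1 + 2/3) / 2 = 5/6 and topic 302 scores 1/2, so MAP is 2/3.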

Expected Results

robust04

| Topics                     | NVSM CPU (MAP) | NVSM GPU (MAP) |
|----------------------------|----------------|----------------|
| Robust04 test split topics | 0.138          | 0.138*         |

* Results with the NVSM GPU image may vary slightly. TensorFlow relies on the Eigen library, which uses CUDA atomics to implement reduction operations such as tf.reduce_sum. These operations are non-deterministic, and each one can introduce small numerical variations. See this TensorFlow issue for more details.
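The underlying reason is that floating-point addition is not associative: when atomic updates land in a different order from one run to the next, the reduced sum can differ in its last bits. The plain-Python sketch below (no TensorFlow or GPU involved) illustrates the effect by accumulating the same three values in two different orders.

```python
import math


def ordered_sum(values):
    # Left-to-right accumulation, standing in for one particular
    # order in which atomic adds might happen to be applied.
    total = 0.0
    for v in values:
        total += v
    return total


values = [0.1, 0.2, 0.3]
print(ordered_sum(values))            # 0.6000000000000001
print(ordered_sum(reversed(values)))  # 0.6
print(math.fsum(values))              # 0.6 (correctly rounded reference)
```

Differences this small are usually harmless for a single sum, but over many training steps they can compound into slightly different model weights and therefore slightly different MAP.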

Notes

The model directory passed to train (--model_folder) and the one passed to search (--output), shown as path/model/directory in the commands above, must point to the same directory.
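One way to avoid mismatched paths is to derive both commands from a single variable. The sketch below assembles the train and search argument lists from one model directory, with flags copied from the Quick Start commands; `MODEL_DIR` and the helper names are hypothetical, and the script only prints the commands rather than running them.

```python
import shlex

MODEL_DIR = "path/model/directory"  # hypothetical; use one real path for both steps
TOPICS = "topics/topics.robust04.txt"
QRELS = "qrels/qrels.robust04.txt"
TEST_SPLIT = "sample_training_validation_query_ids/robust04_test.txt"
VALIDATION_SPLIT = "sample_training_validation_query_ids/robust04_validation.txt"


def train_args(model_dir: str) -> list[str]:
    # Flags copied from the train command in Quick Start.
    return [
        "python", "run.py", "train",
        "--repo", "albep/nvsm",
        "--model_folder", model_dir,
        "--topic", TOPICS,
        "--test_split", TEST_SPLIT,
        "--validation_split", VALIDATION_SPLIT,
        "--qrels", QRELS,
        "--opts", "epochs=12",
        "--collection", "Robust04",
    ]


def search_args(model_dir: str) -> list[str]:
    # Flags copied from the search command; --output reuses the same directory.
    return [
        "python", "run.py", "search",
        "--repo", "albep/nvsm",
        "--output", model_dir,
        "--qrels", QRELS,
        "--topic", TOPICS,
        "--test_split", TEST_SPLIT,
        "--collection", "robust04",
    ]


for args in (train_args(MODEL_DIR), search_args(MODEL_DIR)):
    # Printed for inspection; subprocess.run(args, check=True) would execute it.
    print(shlex.join(args))
```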

nvsm_gpu requires nvidia-docker (https://github.com/NVIDIA/nvidia-docker) to be installed on the host machine.