This is the code repository for the paper: "What is Learned in Visually Grounded Neural Syntax Acquisition", Noriyuki Kojima, Hadar Averbuch-Elor, Alexander Rush and Yoav Artzi (ACL 2020, Short Paper).
Visual features are a promising signal for learning bootstrap textual models. However, blackbox learning models make it difficult to isolate the specific contribution of visual components. In this analysis, we consider the case study of the Visually Grounded Neural Syntax Learner (Shi et al., 2019), a recent approach for learning syntax from a visual training signal. By constructing simplified versions of the model, we isolate the core factors that yield the model’s strong performance. Contrary to what the model might be capable of learning, we find significantly less expressive versions produce similar predictions and perform just as well, or even better. We also find that a simple lexical signal of noun concreteness plays the main role in the model’s predictions as opposed to more complex syntactic reasoning.
- Requirements: Software
- Requirements: Data
- Test trained models
- Train your own models
Python Virtual Env Setup: All code is implemented in Python. We recommend using a virtual environment to install the required Python packages.
VERT_ENV=vgnsl_analysis
# With virtualenv
pip install virtualenv
virtualenv $VERT_ENV
source $VERT_ENV/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
# With Anaconda virtual environment
conda update --all
conda create --name $VERT_ENV python=3.5
conda activate $VERT_ENV
pip install --upgrade pip
pip install -r requirements.txt
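To sanity-check the environment, the short script below tries to import a few packages we expect requirements.txt to install. The module list is an assumption for illustration; adjust it to match your requirements file.

```python
# check_env.py -- hypothetical helper, not part of the repository.
# Verifies that a few assumed dependencies import cleanly.
import importlib
import sys

print("Python %s" % sys.version.split()[0])

# Assumed dependencies; edit this list to match requirements.txt.
for name in ["numpy", "torch", "nltk"]:
    try:
        module = importlib.import_module(name)
        print("%s %s" % (name, getattr(module, "__version__", "unknown")))
    except ImportError:
        print("%s is NOT installed" % name)
```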
Follow the instructions in the Data Preparation section of https://github.com/ExplorerFreda/VGNSL to download all the MSCOCO data under the data/mscoco directory.
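As an optional sanity check after the download, a script along these lines can confirm the files landed in the right place. The file names below are illustrative placeholders only; the exact layout is determined by the VGNSL data preparation instructions.

```python
# check_data.py -- hypothetical sanity check; file names are illustrative only.
import os

DATA_DIR = os.path.join("data", "mscoco")

# Example per-split files (names may differ from the actual VGNSL layout).
expected = ["train_caps.txt", "train_ims.npy", "dev_caps.txt", "dev_ims.npy",
            "test_caps.txt", "test_ims.npy"]

for fname in expected:
    path = os.path.join(DATA_DIR, fname)
    status = "found" if os.path.exists(path) else "MISSING"
    print("%-20s %s" % (fname, status))
```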
Please refer to outputs/README.md to download the trained models.
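If you want to peek inside a downloaded checkpoint before evaluating it, something like the following works, assuming the checkpoints are ordinary files written with torch.save; the stored keys may differ from what is printed.

```python
# inspect_checkpoint.py -- hypothetical helper, assuming standard torch.save checkpoints.
import sys
import torch

# Load on CPU so no GPU is needed just to inspect the file.
checkpoint = torch.load(sys.argv[1], map_location="cpu")

if isinstance(checkpoint, dict):
    # Print the top-level entries (e.g. weights, options, epoch), whatever they are.
    for key in checkpoint:
        print(key, type(checkpoint[key]))
else:
    print(type(checkpoint))
```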
cd src
# calculate F1 score
python test.py --candidate path_to_checkpoint --splits test
# calculate F1 score and output prediction to a text file
python test.py --candidate path_to_checkpoint --splits test --record_trees
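For reference, the unlabeled bracketing F1 reported by test.py compares predicted constituent spans against gold spans. A minimal, self-contained version of that computation is sketched below; it is an illustration, not the exact code in src/test.py, which may differ in details such as the handling of trivial spans.

```python
# f1_sketch.py -- illustrative unlabeled bracketing F1 over word-index spans.

def bracket_f1(predicted_spans, gold_spans):
    """Both arguments are collections of (start, end) word-index spans."""
    predicted, gold = set(predicted_spans), set(gold_spans)
    if not predicted or not gold:
        return 0.0
    overlap = len(predicted & gold)
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: gold tree (a (cute dog)) vs. predicted tree ((a cute) dog)
print(bracket_f1({(0, 2), (0, 3)}, {(1, 3), (0, 3)}))  # 0.5
```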
Please download the category annotations from the link and put them under data/mscoco.
# calculate F1 score and category-wise recalls
python test.py --candidate path_to_checkpoint --splits test --ctg_eval
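The --ctg_eval flag reports recall broken down by constituent category (e.g., NP, VP, PP, ADJP). Conceptually, this is recall restricted to gold spans of each label, roughly as in the sketch below; the actual evaluation and the category annotations it relies on are handled in src/test.py.

```python
# category_recall_sketch.py -- illustrative per-category recall, not the code in src/test.py.
from collections import defaultdict

def category_recall(predicted_spans, labeled_gold_spans):
    """predicted_spans: set of (start, end); labeled_gold_spans: list of (label, start, end)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for label, start, end in labeled_gold_spans:
        totals[label] += 1
        if (start, end) in predicted_spans:
            hits[label] += 1
    return {label: hits[label] / totals[label] for label in totals}

gold = [("NP", 0, 2), ("PP", 3, 6), ("NP", 4, 6)]
print(category_recall({(0, 2), (4, 6)}, gold))  # NP -> 1.0, PP -> 0.0
```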
# train 1D embeddings with WS score function and Mean combine function
python train.py --log_step 20 --bottleneck_dim 1 --logger_name ../outputs/1-ws-mean --score_fn ws --combine_fn mean
# train 2D embeddings with WS score function and Mean combine function (+HI)
python train.py --log_step 20 --bottleneck_dim 2 --logger_name ../outputs/2-ws-mean --score_fn ws --combine_fn mean --lambda_hi 20
# train 2D embeddings with WS score function and Mean combine function (+HI+FastText)
python train.py --log_step 20 --bottleneck_dim 2 --logger_name ../outputs/hi-fasttext-2-ws-mean --score_fn ws --combine_fn mean --lambda_hi 20 --init_embeddings_key fasttext --init_embeddings_type partial-fixed
# train 1D embeddings with Mean Hi score function and Max combine function (+HI+FastText-IN)
python train.py --log_step 20 --bottleneck_dim 1 --logger_name ../outputs/hi-fasttext-noimgnorm-1-meanhi-max --score_fn mean_hi --combine_fn max --lambda_hi 20 --init_embeddings_key fasttext --no_imgnorm
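The --combine_fn flag controls how the embeddings of two merged constituents are composed into a single vector; mean and max correspond conceptually to element-wise averaging and element-wise maximum. The NumPy sketch below only illustrates that idea and is not the model code in src/, which operates on PyTorch tensors and may apply normalization.

```python
# combine_fn_sketch.py -- conceptual illustration of the mean and max combine
# functions over two child constituent embeddings (not the training code).
import numpy as np

def combine_mean(left, right):
    return (left + right) / 2.0

def combine_max(left, right):
    return np.maximum(left, right)

left = np.array([0.2, -0.5])
right = np.array([0.4, 0.1])
print(combine_mean(left, right))  # [ 0.3 -0.2]
print(combine_max(left, right))   # [0.4  0.1]
```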
This codebase is released under the MIT license.
If you find this codebase and models useful in your research, please consider citing the following paper:
@InProceedings{Kojima2020:vgnsl,
  title     = "What is Learned in Visually Grounded Neural Syntax Acquisition",
  author    = "Noriyuki Kojima and Hadar Averbuch-Elor and Alexander Rush and Yoav Artzi",
  booktitle = "Proceedings of the Annual Meeting of the Association for Computational Linguistics",
  month     = "July",
  year      = "2020",
  publisher = "Association for Computational Linguistics",
}
We would like to thank Freda for making the original VGNSL code public (the code in this repository is largely borrowed from that implementation) and for responding promptly to our inquiries about Visually Grounded Neural Syntax Acquisition (Shi et al., ACL 2019).