This repository contains the data and code in the following paper:
Towards Robustness of Text-to-SQL Models Against Natural and Realistic Adversarial Table Perturbation
Xinyu Pi*, Bing Wang*, Yan Gao, Jiaqi Guo, Zhoujun Li, Jian-Guang Lou
ACL 2022 Long Papers
This repository is the official implementation of our paper Towards Robustness of Text-to-SQL Models Against Natural and Realistic Adversarial Table Perturbation. In this paper, we curate ADVETA, the first robustness evaluation benchmark featuring natural and realistic adversarial table perturbation. To defend against this perturbation, we build CTA, a systematic framework for generating adversarial training examples that is tailored for better contextualization of tabular data.
We manually curate the ADVErsarial Table perturbAtion (ADVETA) benchmark based on three mainstream Text-to-SQL datasets: Spider, WikiSQL and WTQ. For each table from the original development set, we conduct RPL (replacement) and ADD (addition) annotation separately, perturbing only table columns. We release our data in the adveta_1.0.zip file.
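To make the two perturbation types concrete, here is a toy sketch on a list of column names. It assumes our reading of the paper: RPL swaps a column name for an equivalent wording, while ADD keeps the original columns and appends related but distinct distractor columns. The SYNONYMS and DISTRACTORS tables, and the rpl/add helpers, are hypothetical illustrations; the real ADVETA perturbations were annotated manually, not generated like this.

```python
import random
from typing import Dict, List

# Hypothetical RPL candidates: equivalent wordings of an original column name.
SYNONYMS: Dict[str, List[str]] = {
    "name": ["full name", "title"],
    "age": ["years old"],
    "country": ["nation"],
}

# Hypothetical ADD candidates: related but non-equivalent distractor columns.
DISTRACTORS: Dict[str, List[str]] = {
    "name": ["nickname"],
    "age": ["birth year"],
}


def rpl(columns: List[str], rng: random.Random) -> List[str]:
    """Replace every column that has a candidate with a randomly chosen synonym."""
    return [rng.choice(SYNONYMS[c]) if c in SYNONYMS else c for c in columns]


def add(columns: List[str], rng: random.Random) -> List[str]:
    """Keep all original columns and append one distractor per eligible column."""
    extra = [rng.choice(DISTRACTORS[c]) for c in columns if c in DISTRACTORS]
    # Drop distractors that would duplicate an existing column name.
    return columns + [e for e in extra if e not in columns]


if __name__ == "__main__":
    cols = ["name", "age", "city"]
    print("RPL:", rpl(cols, random.Random(0)))
    print("ADD:", add(cols, random.Random(0)))
```

Because RPL changes the surface form of columns the question refers to, while ADD leaves them intact but introduces confusable neighbors, the two stress different failure modes of schema linking, which is one reason to annotate them separately.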
- python: 3.8
- cuda: 10.1
- torch: 1.7.1
Install dependencies:
conda create -n cta python=3.8 -y
conda activate cta
conda install pytorch==1.7.1 cudatoolkit=10.1 -c pytorch -y
pip install -r requirements.txt
python -m spacy download en_core_web_sm
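After installation, a short Python check can confirm that the environment matches the versions listed above. This is a sketch of our own, not part of the repository: installed_version and check_env are hypothetical helper names, and the check only reads installed package metadata, so it does not test whether CUDA is actually usable.

```python
import importlib.util
import sys
from importlib import metadata
from typing import Dict, Optional

# Versions listed in this README; torch may report a suffix such as "1.7.1+cu101".
EXPECTED = {"python": "3.8", "torch": "1.7.1"}


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def check_env() -> Dict[str, bool]:
    """Report whether Python, torch and the spaCy model match the README setup."""
    py = f"{sys.version_info.major}.{sys.version_info.minor}"
    torch_version = installed_version("torch")
    return {
        "python_ok": py == EXPECTED["python"],
        "torch_ok": torch_version is not None
        and torch_version.startswith(EXPECTED["torch"]),
        # spaCy pipelines install as ordinary packages, so find_spec detects them.
        "spacy_model_ok": importlib.util.find_spec("en_core_web_sm") is not None,
    }


if __name__ == "__main__":
    for key, ok in check_env().items():
        print(f"{key}: {'OK' if ok else 'MISMATCH'}")
```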
We propose the Contextualized Table Augmentation (CTA) framework as an adversarial training example generation approach tailored for tabular data. Before you run pipeline.ipynb, you should download the data files and checkpoints from Google Drive.
Notes:
- We download the ConceptNet Numberbatch word embeddings from here and save them as ./data/nb_emb.txt.
- We pre-compute dense representations of the processed WDC tables using TAPAS dense retrieval models and store the outputs in ./wdc/wdc_dense_A.txt and ./wdc/wdc_dense_B.txt (the TAPAS retriever has two encoders, hence two files).
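The two precomputed artifacts in these notes can be sketched in a few lines of Python. Assuming nb_emb.txt follows the word2vec-style text layout of the Numberbatch release (a "count dim" header, then one "word v1 ... vd" row per term), load_text_embeddings parses it and cosine compares two terms, for example two column names. top_k ranks table vectors against a query vector by inner product, which is how dual-encoder dense retrieval typically scores candidates. We do not know the exact layout of wdc_dense_A.txt and wdc_dense_B.txt, so top_k takes in-memory vectors. All function names here are hypothetical and not part of the repository.

```python
import io
import math
from typing import Dict, Iterable, List, Tuple

Vector = List[float]


def load_text_embeddings(lines: Iterable[str]) -> Dict[str, Vector]:
    """Parse word2vec-style text: a 'count dim' header, then 'word v1 ... vd' rows."""
    it = iter(lines)
    _, dim = map(int, next(it).split())
    vectors: Dict[str, Vector] = {}
    for line in it:
        parts = line.split()
        if len(parts) != dim + 1:
            continue  # skip blank or malformed rows
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query: Vector, candidates: Dict[str, Vector], k: int) -> List[Tuple[str, float]]:
    """Rank candidate vectors by inner product with the query vector."""
    scored = [(key, sum(a * b for a, b in zip(query, vec))) for key, vec in candidates.items()]
    scored.sort(key=lambda kv: kv[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Tiny in-memory stand-in for nb_emb.txt (real Numberbatch vectors are 300-d).
    emb = load_text_embeddings(io.StringIO("3 2\nnation 1.0 0.0\ncountry 0.9 0.1\ncity 0.0 1.0\n"))
    print(cosine(emb["country"], emb["nation"]), cosine(emb["country"], emb["city"]))
    print(top_k([1.0, 0.0], {"t1": [0.2, 0.1], "t2": [0.9, 0.3]}, k=1))
```

Plain Python lists keep the sketch dependency-free; for the full Numberbatch file, a NumPy matrix would be far more memory- and time-efficient.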
Then just run pipeline.ipynb and have fun.
@inproceedings{pi-etal-2022-towards,
title = "Towards Robustness of Text-to-{SQL} Models Against Natural and Realistic Adversarial Table Perturbation",
author = "Pi, Xinyu and Wang, Bing and Gao, Yan and Guo, Jiaqi and Li, Zhoujun and Lou, Jian-Guang",
month = may,
year = "2022",
address = "Dublin, Ireland",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.acl-long.142",
pages = "2007--2022"
}