This paper introduces UniSAr, which extends existing autoregressive language models with three non-invasive extensions that make them structure-aware: (1) structure marks that encode the database schema, the conversation context, and their relationships; (2) constrained decoding that produces well-structured SQL for a given database schema; and (3) SQL completion that fills in potentially missing JOIN relationships in the SQL based on the database schema.
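To make the third extension more concrete, the following is a minimal sketch of how a missing JOIN could be recovered from foreign keys. It is an illustration under assumptions, not the code in this repository: the toy `FOREIGN_KEYS` list and the helpers `build_graph`, `find_join_path`, and `complete_from_clause` are hypothetical names, and the approach shown (breadth-first search over a foreign-key graph) is just one plausible way to do it.

```python
from collections import deque

# Hypothetical toy schema: each pair links two columns through a foreign key.
FOREIGN_KEYS = [
    ("student.id", "enrollment.student_id"),
    ("enrollment.course_id", "course.id"),
]


def build_graph(foreign_keys):
    """Build an undirected table graph whose edges carry the join condition."""
    graph = {}
    for left, right in foreign_keys:
        left_table = left.split(".")[0]
        right_table = right.split(".")[0]
        condition = f"{left} = {right}"
        graph.setdefault(left_table, []).append((right_table, condition))
        graph.setdefault(right_table, []).append((left_table, condition))
    return graph


def find_join_path(graph, source, target):
    """Breadth-first search for the shortest chain of joins from source to target."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        table, path = queue.popleft()
        if table == target:
            return path
        for neighbor, condition in graph.get(table, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(neighbor, condition)]))
    return None  # the two tables are not connected by foreign keys


def complete_from_clause(source, target, foreign_keys=FOREIGN_KEYS):
    """Render a FROM clause that joins source to target through any bridge tables."""
    path = find_join_path(build_graph(foreign_keys), source, target)
    if path is None:
        raise ValueError(f"no foreign-key path from {source} to {target}")
    clause = f"FROM {source}"
    for table, condition in path:
        clause += f" JOIN {table} ON {condition}"
    return clause
```

If a predicted query mentions only `student` and `course`, calling `complete_from_clause("student", "course")` adds the bridge table `enrollment` that the two tables need in order to be joined.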
Spider -> ./data/spider
Fine-tuned BART model -> ./models/spider_sl
(Please download this model with git-lfs to avoid the issue.)
sudo apt-get install git-lfs
git lfs install
git clone https://huggingface.co/dreamerdeo/mark-bart
- Python version >= 3.6
- PyTorch version >= 1.5.0
pip install -r requirements.txt
- fairseq is going through changes without backward compatibility. Install fairseq from source and use this commit for reproducibility. See here for the current PR that should fix fairseq/master.
Step 1: Preprocess by adding schema-linking and value-linking tags.
python step1_schema_linking.py
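As a rough picture of what schema linking and value linking mean, the sketch below tags question tokens that exactly match a table name, a column name, or a cell value. It is only an assumption-laden illustration: `link_question` is a hypothetical helper, the tag names are made up, and the actual script may use different matching rules such as n-gram or partial matches.

```python
def link_question(question, tables, columns, cell_values):
    """Tag each question token that exactly matches a schema item or a cell value."""
    table_names = {t.lower() for t in tables}
    column_names = {c.lower() for c in columns}
    value_names = {v.lower() for v in cell_values}

    tagged = []
    for token in question.lower().rstrip("?").split():
        if token in table_names:
            tagged.append((token, "table"))
        elif token in column_names:
            tagged.append((token, "column"))
        elif token in value_names:
            tagged.append((token, "value"))
        else:
            tagged.append((token, None))  # no link found for this token
    return tagged


# Toy example with a hypothetical one-table schema.
example = link_question(
    "Show the age of student Alice",
    tables=["student"],
    columns=["name", "age"],
    cell_values=["Alice"],
)
```

Here `example` pairs every token with its tag, so `age` is marked as a column, `student` as a table, and `alice` as a value, while the remaining tokens carry `None`.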
Step 2: Build the input and output for BART.
python step2_serialization.py
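Serialization here means flattening the question and the schema into a single source string that a sequence-to-sequence model can read. The sketch below shows one way this might look. The `serialize` helper, the bracketed marks, and the `|` and `:` separators are all hypothetical choices made for illustration, not the format the script actually emits.

```python
def serialize(tagged_question, schema):
    """Flatten a tagged question and a schema into one source string.

    tagged_question: list of (token, tag) pairs, where tag is None for unlinked tokens.
    schema: dict mapping each table name to its list of column names.
    """
    question_part = " ".join(
        f"{token} [{tag}]" if tag else token for token, tag in tagged_question
    )
    schema_part = " | ".join(
        f"{table} : {' , '.join(cols)}" for table, cols in schema.items()
    )
    return f"{question_part} | {schema_part}"


# Toy input with hypothetical tags and a hypothetical one-table schema.
source = serialize(
    [("age", "column"), ("of", None), ("student", "table")],
    {"student": ["name", "age"]},
)
```

Keeping the marks inline, next to the tokens they describe, leaves the model architecture untouched: only the input text changes.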
Step 3: Evaluate with or without constrained decoding.
python step3_evaluate.py --constrain
Prediction: 69.34
Prediction with Constrained Decoding: 70.02
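The idea behind constrained decoding can be sketched with a prefix tree over the valid schema items: at each step, the decoder may only choose tokens that keep the output inside the tree. The code below is a simplified, word-level illustration under assumptions. The `Trie` class and `constrained_argmax` helper are hypothetical, the scores are made up, and the repository's own implementation works on model tokens and may differ in detail.

```python
class Trie:
    """Prefix tree over token sequences."""

    def __init__(self, sequences=()):
        self.root = {}
        for sequence in sequences:
            self.add(sequence)

    def add(self, sequence):
        node = self.root
        for token in sequence:
            node = node.setdefault(token, {})
        node[None] = {}  # None marks the end of a complete sequence

    def next_tokens(self, prefix):
        """Return the tokens allowed after prefix, or [] if prefix is invalid."""
        node = self.root
        for token in prefix:
            if token not in node:
                return []
            node = node[token]
        return [token for token in node if token is not None]


def constrained_argmax(scores, allowed):
    """Pick the highest-scoring token among those the trie allows."""
    candidates = {t: s for t, s in scores.items() if t in allowed}
    if not candidates:
        raise ValueError("no allowed token has a score")
    return max(candidates, key=candidates.get)


# Hypothetical schema items, tokenized by whitespace.
trie = Trie([["student", "name"], ["student", "age"], ["course", "title"]])

# Made-up model scores for the token that follows "student".
scores = {"name": 0.2, "title": 0.7, "age": 0.1}
choice = constrained_argmax(scores, trie.next_tokens(["student"]))
```

Without the constraint, the highest-scoring token would be `title`, which does not form a valid item after `student`. With the constraint, `choice` is `name`.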
python interactive.py --logdir ./models/spider-sl --db_id student_1 --db-path ./data/spider/database --schema-path ./data/spider/tables.json
https://github.com/ryanzhumich/editsql
https://github.com/benbogin/spider-schema-gnn-global
https://github.com/ElementAI/duorat
https://github.com/facebookresearch/GENRE