python 3.11, virtualenv
You can use pyenv to set up Python 3.11:
pyenv local 3.11
On the Alliance clusters, you can run the following command to load Python 3.11:
module load StdEnv/2023
Then create a virtual environment, activate it, and install the dependencies:
python3.11 -m venv .venv
source .venv/bin/activate
pip3 install -r requirements.txt
Download the zip file below and extract it to the data directory:
wget -O data.zip "https://www.dropbox.com/scl/fi/vtraf79vfi1x105veaflk/data.zip?rlkey=7yq6d46aer6h45pdihrc9rht1&st=zdac3rqx&dl=0"
unzip data.zip
Your data directory should look like this:
data/
├── databases/
├── 1_input.json
.
.
.
Create your environment file from the example:
cp .env.example .env
The only required variable is OPENAI_API_KEY.
By default, the pipeline uses OpenRouter, so set this variable to your OpenRouter API key.
You may also change the LIMIT variable to set the number of entries read from the dataset.
START specifies the index at which reading from the dataset begins.
For instance, set LIMIT=10 to run the pipeline on 10 entries.
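One plausible reading of START and LIMIT is a simple slice over the dataset entries. The sketch below illustrates that reading only; `select_entries` is a hypothetical helper, not MaskSQL's actual code, and it assumes START is a zero-based index and that an unset LIMIT means "read to the end".

```python
import os


def select_entries(entries, start=0, limit=None):
    """Return at most `limit` entries beginning at index `start`.

    Hypothetical helper: assumes START is zero-based and that an
    unset LIMIT means "read to the end of the dataset".
    """
    end = None if limit is None else start + limit
    return entries[start:end]


# Read the two variables from the environment, as set in .env.
start = int(os.environ.get("START", "0"))
limit_raw = os.environ.get("LIMIT")
limit = int(limit_raw) if limit_raw else None

dataset = list(range(100))  # stand-in for the real dataset entries
subset = select_entries(dataset, start, limit)
```

Under these assumptions, START=5 and LIMIT=10 would select the entries at indices 5 through 14.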
SLM_MODEL and LLM_MODEL specify the IDs of the small and large language models used in the pipeline.
These IDs must follow the naming scheme of the LM provider being used.
For instance, since we are using OpenRouter, model identifiers take the OpenRouter form, e.g.,
openai/gpt-4.1 for GPT-4.1.
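To catch configuration mistakes before a long run, the variables described above can be checked up front. The following is a minimal sketch, not part of MaskSQL: `read_settings` and `looks_like_openrouter_id` are hypothetical helpers, and the vendor/model check reflects only the ID shape shown in the example above.

```python
import os

# Variable names taken from the description above.
REQUIRED = ("OPENAI_API_KEY",)
OPTIONAL = ("START", "LIMIT", "SLM_MODEL", "LLM_MODEL")


def read_settings(env):
    """Collect the pipeline variables from a mapping such as os.environ."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required variables: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED}
    for name in OPTIONAL:
        if env.get(name):
            settings[name] = env[name]
    return settings


def looks_like_openrouter_id(model_id):
    """Check for the vendor/model shape, e.g. openai/gpt-4.1 (assumed form)."""
    vendor, sep, model = model_id.partition("/")
    return bool(vendor and sep and model)
```

Calling `read_settings(os.environ)` after exporting the variables raises a `RuntimeError` if OPENAI_API_KEY is missing, and `looks_like_openrouter_id` can be applied to SLM_MODEL and LLM_MODEL when OpenRouter is the provider.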
To run MaskSQL, first filter the schema items
using RESDSQL.
Follow these instructions to run RESDSQL
and generate the file needed for the MaskSQL pipeline.
Then run MaskSQL with the --resd option.
To run MaskSQL, first activate the venv and set the environment variables:
source .venv/bin/activate
export $(cat .env | xargs)
export PYTHONPATH=.
Then you can run the MaskSQL pipeline as follows:
python main.py --resd
MaskSQL saves intermediate results to files for later use. To run the pipeline from scratch, clean the data directory first:
./clean.sh data
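The `export $(cat .env | xargs)` step used earlier relies on shell word splitting, so it can misbehave if .env contains comment lines or values with spaces. As an alternative, here is a minimal sketch of loading the file from Python; `load_env_file` is a hypothetical helper, not part of MaskSQL, and it handles only plain KEY=VALUE lines.

```python
import os


def load_env_file(path=".env", environ=os.environ):
    """Copy KEY=VALUE pairs from `path` into `environ`.

    Skips blank lines, lines starting with '#', and lines without '='.
    Strips one pair of matching surrounding quotes from each value.
    """
    loaded = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key, value = key.strip(), value.strip()
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
                value = value[1:-1]
            environ[key] = value
            loaded[key] = value
    return loaded
```

The function returns the pairs it loaded, which makes it easy to confirm that OPENAI_API_KEY was picked up before starting the pipeline.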