EZSwitch is a novel framework for generating code-switched text, which blends two languages within a single sentence or discourse. It combines Equivalence Constraint Theory (ECT) with large language models (LLMs) to produce syntactically valid, natural-sounding code-switched sentences. This repository accompanies the paper "Linguistics Theory Meets LLM: Code-Switched Text Generation via Equivalence Constrained Large Language Models" (arXiv:2410.22660).
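To give a feel for the constraint, here is a minimal sketch that approximates ECT with a word alignment instead of parse trees. A switch point is accepted only if no alignment link crosses it, so the tokens before the switch in one language correspond exactly to the tokens before it in the other. This simplification is our assumption for illustration and is not EZSwitch's implementation. The functions valid_switch_points and splice are hypothetical, and the English-Indonesian alignment is hand-written.

```python
from typing import List, Set, Tuple

# Hypothetical helpers: an alignment-based approximation of the Equivalence
# Constraint, not the implementation used in this repository.
Alignment = Set[Tuple[int, int]]  # (source index, target index), 0-based


def valid_switch_points(n_src: int, n_tgt: int, alignment: Alignment) -> List[Tuple[int, int]]:
    """Return boundaries (k, m) where switching is allowed.

    (k, m) means: keep source tokens [0, k), then continue with target tokens [m, n_tgt).
    The boundary is accepted only if no alignment link crosses it, i.e. for every
    link (i, j), i < k holds exactly when j < m. Unaligned tokens impose no constraint.
    """
    points = []
    for k in range(1, n_src):
        for m in range(1, n_tgt):
            if all((i < k) == (j < m) for i, j in alignment):
                points.append((k, m))
    return points


def splice(src: List[str], tgt: List[str], k: int, m: int) -> List[str]:
    """Build a code-switched sentence: source prefix, then target suffix."""
    return src[:k] + tgt[m:]


if __name__ == "__main__":
    src = "I saw the red car".split()
    tgt = "Saya melihat mobil merah itu".split()
    # Hand-written alignment: the noun phrase is reordered in Indonesian.
    alignment = {(0, 0), (1, 1), (2, 4), (3, 3), (4, 2)}
    points = valid_switch_points(len(src), len(tgt), alignment)
    print(points)  # [(1, 1), (2, 2)]
    k, m = points[-1]
    print(" ".join(splice(src, tgt, k, m)))  # I saw mobil merah itu
```

Switching inside "the red car" is rejected because Indonesian places the adjective and determiner after the noun, which is the kind of word-order clash ECT is meant to rule out.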
EZSwitch requires Python 3.10 or later. To install the required libraries, run:
pip install -r requirements.txt
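If you manage several interpreters, a quick check before installing can catch a version mismatch early. This is a minimal sketch; python_is_supported is a hypothetical helper, and the only fact it relies on is the Python 3.10 minimum stated above.

```python
import sys

# Minimum version stated in this README.
MIN_VERSION = (3, 10)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Hypothetical helper: True if (major, minor) meets the stated minimum."""
    return tuple(version_info[:2]) >= MIN_VERSION


if __name__ == "__main__":
    if not python_is_supported():
        sys.exit(f"EZSwitch needs Python {MIN_VERSION[0]}.{MIN_VERSION[1]}+, found {sys.version.split()[0]}")
    print("Python version OK")
```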
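We use GIZA to compute word alignments between parallel sentences. The required aligner repositories are included as Git submodules under the alignment/ directory.
To install Giza-py, make sure the submodules are checked out (for example with git submodule update --init --recursive), then install its pip dependencies: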
cd alignment/giza-py
pip install -r requirements.txt
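Downstream code typically needs the alignments as index pairs. The sketch below assumes they are stored one sentence pair per line in the common Pharaoh "i-j" format (0-based source index, hyphen, 0-based target index, links separated by spaces); check what Giza-py actually writes before relying on it. parse_pharaoh and read_alignments are hypothetical names.

```python
from typing import List, Set, Tuple


def parse_pharaoh(line: str) -> Set[Tuple[int, int]]:
    """Parse one line such as '0-0 1-2 2-1' into {(0, 0), (1, 2), (2, 1)}.

    Assumes Pharaoh-style 'src-tgt' links; an empty line yields an empty set.
    """
    pairs = set()
    for token in line.split():
        src, tgt = token.split("-")
        pairs.add((int(src), int(tgt)))
    return pairs


def read_alignments(path: str) -> List[Set[Tuple[int, int]]]:
    """Read an alignment file, one set of links per sentence pair."""
    with open(path, encoding="utf-8") as f:
        return [parse_pharaoh(line) for line in f]
```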
In order to install MGIZA++ on Linux/macOS, follow these steps:
- Download the Boost C++ library and unzip it.
- Build Boost:
cd <boost_dir>
./bootstrap.sh --prefix=./build --with-libraries=thread,system
./b2 install
- Build MGIZA++ (CMake is required):
cd alignment/mgizapp
cmake -DBOOST_ROOT=<boost_dir>/build -DBoost_USE_STATIC_LIBS=ON -DCMAKE_INSTALL_PREFIX=<giza-py_dir>/.bin .
make
make install
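After make install, it can help to confirm that the aligner binaries landed where you expect before running Giza-py. This is a hypothetical helper: the names in EXPECTED_BINARIES are our assumption about what MGIZA++ installs, and the directory you pass should be wherever the binaries actually ended up under <giza-py_dir>/.bin.

```python
import os
import sys
from pathlib import Path
from typing import Iterable, List

# Assumed binary names; confirm against what `make install` actually produced.
EXPECTED_BINARIES = ("mgiza", "mkcls", "plain2snt", "snt2cooc")


def missing_binaries(bin_dir: str, names: Iterable[str] = EXPECTED_BINARIES) -> List[str]:
    """Return the names that are not present as executable files in bin_dir."""
    base = Path(bin_dir)
    missing = []
    for name in names:
        path = base / name
        if not (path.is_file() and os.access(path, os.X_OK)):
            missing.append(name)
    return missing


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else ".bin"
    absent = missing_binaries(target)
    print("missing: " + ", ".join(absent) if absent else "all expected binaries found")
```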
If you use EZSwitch in your work, please cite:
@misc{kuwanto2024linguisticstheorymeetsllm,
title={Linguistics Theory Meets LLM: Code-Switched Text Generation via Equivalence Constrained Large Language Models},
author={Garry Kuwanto and Chaitanya Agarwal and Genta Indra Winata and Derry Tanti Wijaya},
year={2024},
eprint={2410.22660},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2410.22660},
}