
Boosting Zero-Shot Crosslingual Performance using LLM-Based Augmentations with Effective Data Selection

This repository can be used to reproduce the results of the paper "Boosting Zero-Shot Crosslingual Performance using LLM-Based Augmentations with Effective Data Selection".

Data

Data related to all tasks needs to be stored in the data/ directory of the main repository. The datasets generated for all tasks are available here: https://drive.google.com/file/d/1t2aQAeKQMDK5GP8kg1oMtsZZDs11dpdz/view?usp=sharing
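For convenience, here is a minimal sketch of one way to fetch and unpack the datasets from that link. It assumes the Drive file is a single zip archive and uses the third-party gdown package, neither of which is prescribed by this repository:

```python
# Sketch: fetch the released datasets and unpack them into data/.
# Assumes the Google Drive link points to a zip archive; adjust if not.
import zipfile
from pathlib import Path

import gdown  # pip install gdown

FILE_ID = "1t2aQAeKQMDK5GP8kg1oMtsZZDs11dpdz"  # ID from the share link above
archive = gdown.download(id=FILE_ID, output="datasets.zip", quiet=False)

data_dir = Path("data")
data_dir.mkdir(exist_ok=True)
with zipfile.ZipFile(archive) as zf:
    zf.extractall(data_dir)  # task datasets should end up under data/
```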

LLaMA-2 generations and fine-tuning scripts

The fine-tuning code for LLaMA-2 is in the train_llama2 directory, along with the code to generate data for the different tasks using LLaMA-2.
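As an illustration only (the scripts in train_llama2 are the reference), a hedged sketch of generating task data from a LLaMA-2 checkpoint through the Hugging Face transformers API; the checkpoint name and prompt are assumptions, not taken from this repository:

```python
# Sketch: data generation with a LLaMA-2 chat checkpoint via transformers.
# The model name and prompt below are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-2-7b-chat-hf"  # requires an accepted license on the Hub

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16, device_map="auto"  # needs accelerate
)

prompt = "Generate a short English movie review with positive sentiment:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```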

Translation

IndicTrans2 can be installed by following the instructions at https://github.com/AI4Bharat/IndicTrans2#installation.
Clone the IndicTrans2 repository in the home directory and create a separate conda environment for translating with IndicTrans2.
Download the en-indic model from https://indictrans2-public.objectstore.e2enetworks.net/it2_preprint_ckpts/en-indic-preprint.zip and store the weights in the IndicTrans2/translations/en-indic-preprint folder (a download sketch follows these steps).
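A minimal Python sketch of the checkpoint download, assuming the zip's contents should land directly under the folder named above (verify whether the archive adds its own top-level directory):

```python
# Sketch: fetch the en-indic preprint checkpoint and place it where the
# translation steps above expect it.
import urllib.request
import zipfile
from pathlib import Path

URL = ("https://indictrans2-public.objectstore.e2enetworks.net/"
       "it2_preprint_ckpts/en-indic-preprint.zip")
target = Path("IndicTrans2/translations/en-indic-preprint")
target.mkdir(parents=True, exist_ok=True)

zip_path, _ = urllib.request.urlretrieve(URL, "en-indic-preprint.zip")
with zipfile.ZipFile(zip_path) as zf:
    zf.extractall(target)  # inspect the result: the zip may nest a folder
```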

For any queries regarding the code base, reach out to: [email protected]

If you use this repo, please cite the paper.
