Kaggle Google QA Labeling Competition

This is the 41st place solution to the Kaggle Google QA Labeling competition.

Install the package
pip install -e .
Download the NER model from Kaggle:
https://www.kaggle.com/alexeykarnachev/google-qa-ner
Prepare the dataset
python scripts/prepare_dataset.py \
    --seed=228 \
    --train_df_file=data/google-quest-challenge/train.csv \
    --test_df_file=data/google-quest-challenge/test.csv \
    --tokenizer_cls=RobertaTokenizer \
    --tokenizer_path=roberta-large \
    --n_splits=7 \
    --datasets_root=data/datasets/ \
    --crop_strategies=both \
    --dataset_cls=BiDataset \
    --process_math \
    --ner_model_dir=data/ner/code/bert_base_cased

--ner_model_dir is the path to the NER model downloaded in the previous step.
Run the training experiment
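The --n_splits=7 and --seed=228 flags suggest that prepare_dataset.py builds a seeded K-fold split. The sketch below only illustrates that idea with a plain shuffled fold assignment; the function name assign_folds is hypothetical, and the real script may split differently (for example by grouping rows that share the same question).

```python
import random
from typing import List


def assign_folds(n_rows: int, n_splits: int = 7, seed: int = 228) -> List[int]:
    """Hypothetical helper: give each row a fold id in [0, n_splits) after a seeded shuffle."""
    indices = list(range(n_rows))
    # A dedicated Random instance makes the split reproducible for a given seed.
    random.Random(seed).shuffle(indices)
    folds = [0] * n_rows
    for position, row in enumerate(indices):
        # Round-robin over the shuffled order keeps fold sizes within 1 of each other.
        folds[row] = position % n_splits
    return folds


folds = assign_folds(20)
for k in range(7):
    valid_rows = [i for i, f in enumerate(folds) if f == k]
    print(f"fold {k}: validation rows {valid_rows}")
```

Each fold serves once as the validation set while the remaining folds are used for training, so seven splits yield seven models per experiment under this assumption.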
cd scripts
python run_encoder_experiment.py --config_path=../configs/base_config.yaml
Wait for the training process to finish.
Archive the experiment directory
cd experiments
tar zcvf <EXPERIMENT_DIR>.tar.gz <EXPERIMENT_DIR>
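If the tar command is not available (for example on Windows), the same archive can be produced with Python's standard tarfile module. This is a minimal sketch; archive_experiment is a hypothetical helper, and it writes the archive next to the experiment directory, as the tar command above does when run from experiments/.

```python
import tarfile
from pathlib import Path


def archive_experiment(experiment_dir: str) -> Path:
    """Hypothetical helper: pack experiment_dir into <experiment_dir>.tar.gz beside it."""
    src = Path(experiment_dir).resolve()
    out = src.with_name(src.name + ".tar.gz")
    with tarfile.open(str(out), "w:gz") as tar:
        # arcname keeps the top-level directory name inside the archive,
        # matching `tar zcvf NAME.tar.gz NAME` run from the parent directory.
        tar.add(str(src), arcname=src.name)
    return out
```

Usage: archive_experiment("<EXPERIMENT_DIR>") returns the path of the created .tar.gz file.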
Upload the archive to your Kaggle Datasets storage.
Now you can run inference with the model in a Kaggle kernel:
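One way to upload is the official Kaggle CLI, which reads a dataset-metadata.json file from the folder being uploaded (kaggle datasets init -p <folder> generates a template). The sketch below writes that file from Python; the helper name, username, slug, and title are placeholders you would replace, and CC0-1.0 is just the CLI template's default license.

```python
import json
from pathlib import Path


def write_dataset_metadata(folder: str, username: str, slug: str, title: str) -> Path:
    """Hypothetical helper: write the dataset-metadata.json the Kaggle CLI expects in `folder`."""
    meta = {
        "title": title,
        "id": f"{username}/{slug}",  # placeholder owner/slug pair
        "licenses": [{"name": "CC0-1.0"}],
    }
    path = Path(folder) / "dataset-metadata.json"
    path.write_text(json.dumps(meta, indent=2))
    return path
```

After placing the .tar.gz and the metadata file in one folder, kaggle datasets create -p <folder> creates the dataset, and kaggle datasets version -p <folder> -m "<message>" uploads a new version of an existing one.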
https://www.kaggle.com/alexeykarnachev/kernel1864bcfc13
For this, attach the following datasets to the kernel:
https://www.kaggle.com/alexeykarnachev/kaggle-google-qa-labeling (this package)
https://www.kaggle.com/alexeykarnachev/google-qa-ner (NER model)
https://www.kaggle.com/alexeykarnachev/transformersdependencies (transformers lib and dependencies)

Also attach the dataset containing your trained experiment to the kernel.

Uncomment all lines in the kernel and replace the EXPERIMENT_NAME placeholder with your experiment name.
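Kaggle mounts each attached dataset read-only under /kaggle/input/<dataset-slug>. The sketch below checks that the datasets listed above are attached and makes the package importable; it assumes the mount directories match the slugs in the URLs above, and the helper name and the experiment directory layout are hypothetical, not taken from the linked kernel.

```python
import sys
from pathlib import Path
from typing import Dict


def resolve_kernel_paths(experiment_name: str, input_root: str = "/kaggle/input") -> Dict[str, Path]:
    """Hypothetical helper: map each attached dataset to its assumed mount point."""
    root = Path(input_root)
    paths = {
        "package": root / "kaggle-google-qa-labeling",
        "ner_model": root / "google-qa-ner",
        "dependencies": root / "transformersdependencies",
        "experiment": root / experiment_name,  # assumed: dataset slug equals EXPERIMENT_NAME
    }
    missing = [name for name, p in paths.items() if not p.exists()]
    if missing:
        raise FileNotFoundError(f"Datasets not attached (or mounted elsewhere): {missing}")
    # Put the attached package first on sys.path so it can be imported without pip.
    if str(paths["package"]) not in sys.path:
        sys.path.insert(0, str(paths["package"]))
    return paths
```

Failing early with the list of missing datasets is cheaper than discovering a missing attachment halfway through inference.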
