alexeykarnachev/kaggle_google_qa_labeling

Kaggle Google QA Labeling Competition

This is the 41st-place solution for the Kaggle Google QA Labeling competition.

  1. Install the package
    pip install -e .

  2. Download the NER model from Kaggle
    https://www.kaggle.com/alexeykarnachev/google-qa-ner

  3. Prepare dataset
    python scripts/prepare_dataset.py \
        --seed=228 \
        --train_df_file=data/google-quest-challenge/train.csv \
        --test_df_file=data/google-quest-challenge/test.csv \
        --tokenizer_cls=RobertaTokenizer \
        --tokenizer_path=roberta-large \
        --n_splits=7 \
        --datasets_root=data/datasets/ \
        --crop_strategies=both \
        --dataset_cls=BiDataset \
        --process_math \
        --ner_model_dir=data/ner/code/bert_base_cased

    --ner_model_dir is the path to the NER model downloaded in the previous step.

  4. Run experiment training
    cd scripts
    python run_encoder_experiment.py --config_path=../configs/base_config.yaml

  5. Wait for the training process to finish

  6. Archive the experiment directory
    cd experiments
    tar zcvf <EXPERIMENT_DIR>.tar.gz <EXPERIMENT_DIR>

  7. Upload the archive to your Kaggle datasets storage

  8. Run inference with the trained model in a Kaggle kernel
    https://www.kaggle.com/alexeykarnachev/kernel1864bcfc13
    To do this, attach the following datasets to the kernel:
    https://www.kaggle.com/alexeykarnachev/kaggle-google-qa-labeling (this package)
    https://www.kaggle.com/alexeykarnachev/google-qa-ner (NER model)
    https://www.kaggle.com/alexeykarnachev/transformersdependencies (transformers lib and dependencies)

    Also attach the trained experiment that you uploaded in step 7 to the kernel.

    Then uncomment all lines in the kernel and replace the EXPERIMENT_NAME placeholder with the name of your experiment.
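
The dataset preparation command in step 3 takes many flags, so it can help to assemble it programmatically when trying different seeds or fold counts. The sketch below only builds and prints the command; `build_prepare_dataset_cmd` is a hypothetical helper that is not part of this repository, and the flag names and default values are copied verbatim from step 3.

```python
import shlex
import sys
from typing import List


def build_prepare_dataset_cmd(
    ner_model_dir: str,
    data_root: str = "data/google-quest-challenge",
    datasets_root: str = "data/datasets/",
    seed: int = 228,
    n_splits: int = 7,
) -> List[str]:
    """Return the step 3 command as an argument list.

    Hypothetical helper: it only mirrors the flags shown in step 3 and
    makes no assumption about what the script does with them.
    """
    return [
        sys.executable,
        "scripts/prepare_dataset.py",
        f"--seed={seed}",
        f"--train_df_file={data_root}/train.csv",
        f"--test_df_file={data_root}/test.csv",
        "--tokenizer_cls=RobertaTokenizer",
        "--tokenizer_path=roberta-large",
        f"--n_splits={n_splits}",
        f"--datasets_root={datasets_root}",
        "--crop_strategies=both",
        "--dataset_cls=BiDataset",
        "--process_math",
        f"--ner_model_dir={ner_model_dir}",
    ]


if __name__ == "__main__":
    cmd = build_prepare_dataset_cmd("data/ner/code/bert_base_cased")
    # Print the command for inspection. To actually run it, call
    # subprocess.run(cmd, check=True) from the repository root.
    print(shlex.join(cmd))
```

Keeping the command as a list avoids shell quoting problems if a path contains spaces.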

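Steps 6 and 7 can also be scripted. The following is a minimal sketch, not the author's tooling: `archive_experiment` reproduces the `tar zcvf` call from step 6 with the standard `tarfile` module, and `write_dataset_metadata` writes the `dataset-metadata.json` file that the official Kaggle CLI expects next to the files being uploaded. Both function names, the owner, slug, and title arguments, and the default license value are placeholders chosen for illustration.

```python
import json
import tarfile
from pathlib import Path


def archive_experiment(experiment_dir: str) -> Path:
    """Create <EXPERIMENT_DIR>.tar.gz next to the directory (like `tar zcvf`)."""
    src = Path(experiment_dir)
    if not src.is_dir():
        raise NotADirectoryError(f"Not a directory: {src}")
    archive_path = src.with_name(src.name + ".tar.gz")
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores paths relative to the experiment directory name,
        # so the archive unpacks into a single top-level folder.
        tar.add(src, arcname=src.name)
    return archive_path


def write_dataset_metadata(
    upload_dir: str,
    owner: str,
    slug: str,
    title: str,
    license_name: str = "CC0-1.0",  # placeholder, pick the license you need
) -> Path:
    """Write dataset-metadata.json into the folder that will be uploaded."""
    metadata = {
        "title": title,
        "id": f"{owner}/{slug}",
        "licenses": [{"name": license_name}],
    }
    path = Path(upload_dir) / "dataset-metadata.json"
    path.write_text(json.dumps(metadata, indent=2))
    return path
```

Assuming the Kaggle CLI is installed and authenticated, a folder that contains the archive and the metadata file can then be uploaded with `kaggle datasets create -p <folder>`.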