Source code for CoNLL 2021 paper by Huebner et al. 2021

upunaprosk/BabyBERTa

 
 

BabyLM: Training BPE Tokenizer

This repository contains code for training a BPE tokenizer on the BabyLM 10M corpus. To train the tokenizer, clone this repository, install the requirements, and run the following command:

python scripts/train_bbpe.py

The code is based on BabyBERTa.
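
For readers unfamiliar with BPE training, the core idea can be sketched in plain Python. This is not the code in scripts/train_bbpe.py; it is a minimal illustration that assumes "bbpe" refers to byte-level BPE. The function names `train_byte_level_bpe` and `encode` are hypothetical, and the sketch splits on whitespace only, which is simpler than what production tokenizers do.

```python
from collections import Counter


def train_byte_level_bpe(texts, num_merges):
    """Learn up to `num_merges` BPE merges over UTF-8 bytes (illustrative only)."""
    # Each whitespace-separated word becomes a tuple of single-byte tokens.
    word_counts = Counter(
        tuple(bytes([b]) for b in word.encode("utf-8"))
        for text in texts
        for word in text.split()
    )
    merges = []
    for _ in range(num_merges):
        # Count adjacent token pairs, weighted by word frequency.
        pair_counts = Counter()
        for word, count in word_counts.items():
            for pair in zip(word, word[1:]):
                pair_counts[pair] += count
        if not pair_counts:
            break  # every word is already a single token
        # Pick the most frequent pair; ties are broken deterministically.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the best pair.
        new_counts = Counter()
        for word, count in word_counts.items():
            merged, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1])
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            new_counts[tuple(merged)] += count
        word_counts = new_counts
    return merges


def encode(word, merges):
    """Apply the learned merges, in training order, to a single word."""
    tokens = [bytes([b]) for b in word.encode("utf-8")]
    for left, right in merges:
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
                out.append(left + right)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
    return tokens
```

Calling `train_byte_level_bpe(lines, num_merges)` on an iterable of corpus lines returns the ordered merge list, and `encode(word, merges)` then splits a word into byte-string tokens. Because the base vocabulary is raw bytes, any UTF-8 input can be encoded without an unknown-token fallback.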

