Commit

changed tokenizers to tokenizer_options due to naming conflict
mmoffatt2 committed Dec 9, 2024
1 parent 82ebac1 commit 20656cb
Showing 4 changed files with 4 additions and 349 deletions.
2 changes: 2 additions & 0 deletions data/create_new_dataset.sh
@@ -7,6 +7,8 @@ pushd "$new_dataset"
 # Use softlinks so we can use template/prepare.py for development
 ln -s ../template/prepare.py prepare.py
 ln -s ../template/utils ./utils
+ln -s ../template/tests.py tests.py
+ln -s ../template/tokenizer_options.py tokenizer_options.py
 
 # Different datasets may have different phoneme sets
 cp ../template/get_dataset.sh get_dataset.sh
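
The hunk above adds two relative symlinks into each new dataset directory. As a rough illustration of what that setup relies on, the sketch below recreates the layout in a temporary directory and reports any link whose target is missing from the template directory. The helper names (`link_template_files`, `dangling`, `demo_missing_target`) and the `new_dataset` directory name are hypothetical and not part of the repository; only the three file names come from the diff.

```python
import tempfile
from pathlib import Path

# File names taken from the diff; the directory layout is recreated in a temp dir.
TEMPLATE_LINKS = ["prepare.py", "tests.py", "tokenizer_options.py"]


def link_template_files(dataset_dir, names):
    """Create relative symlinks dataset_dir/<name> -> ../template/<name>."""
    links = []
    for name in names:
        link = dataset_dir / name
        # A relative target is resolved against the link's own directory,
        # which mirrors `ln -s ../template/<name> <name>` run inside it.
        link.symlink_to(Path("..") / "template" / name)
        links.append(link)
    return links


def dangling(links):
    """Return the names of links whose target does not exist."""
    # Path.exists() follows symlinks, so it is False for a dangling link.
    return [link.name for link in links if not link.exists()]


def demo_missing_target():
    """Build a template that lacks tokenizer_options.py and list broken links."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        template = root / "template"
        dataset = root / "new_dataset"
        template.mkdir()
        dataset.mkdir()
        # Deliberately leave tokenizer_options.py out of the template.
        (template / "prepare.py").write_text("")
        (template / "tests.py").write_text("")
        return dangling(link_template_files(dataset, TEMPLATE_LINKS))


print(demo_missing_target())
```

A check like this is only a convenience for development; the shell script itself creates the links unconditionally.
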
2 changes: 1 addition & 1 deletion data/template/prepare.py
@@ -2,7 +2,7 @@
 import pickle
 import argparse
 import numpy as np
-from tokenizers import (
+from tokenizer_options import (
     NumericRangeTokenizer,
     SentencePieceTokenizer,
     TiktokenTokenizer,
2 changes: 1 addition & 1 deletion data/template/tests.py
@@ -3,7 +3,7 @@
 import unittest
 import os
 import sys # Import sys to exit with error codes
-from tokenizers import (
+from tokenizer_options import (
     NumericRangeTokenizer,
     SentencePieceTokenizer,
     TiktokenTokenizer,
347 changes: 0 additions & 347 deletions data/template/tokenizers.py

This file was deleted.
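
The commit message gives a naming conflict as the reason for the rename but does not say what `tokenizers` conflicted with. One plausible reading, stated here as an assumption, is that a local `tokenizers.py` next to the script shares its name with an installed package, and Python resolves the name to whichever location comes first on `sys.path`. The sketch below demonstrates that mechanism with a made-up module name, `demo_tokenizers`, and two temporary directories standing in for the script directory and the installed-packages directory; `resolve_first` and `demo_shadowing` are hypothetical helpers.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def resolve_first(module_name, search_dirs):
    """Return the file Python would load for module_name when
    search_dirs are placed at the front of sys.path, in order."""
    saved_path = list(sys.path)
    saved_module = sys.modules.pop(module_name, None)
    try:
        sys.path[:0] = [str(d) for d in search_dirs]
        importlib.invalidate_caches()
        spec = importlib.util.find_spec(module_name)
        return None if spec is None else Path(spec.origin)
    finally:
        # Restore global import state so the demo has no side effects.
        sys.path[:] = saved_path
        if saved_module is not None:
            sys.modules[module_name] = saved_module


def demo_shadowing():
    """Two modules share one name; the earlier sys.path entry wins."""
    with tempfile.TemporaryDirectory() as tmp:
        script_dir = Path(tmp) / "script_dir"    # stands in for data/template
        installed = Path(tmp) / "installed"      # stands in for site-packages
        for directory in (script_dir, installed):
            directory.mkdir()
            (directory / "demo_tokenizers.py").write_text("")
        local_first = resolve_first("demo_tokenizers", [script_dir, installed])
        installed_first = resolve_first("demo_tokenizers", [installed, script_dir])
        return local_first.parent.name, installed_first.parent.name


print(demo_shadowing())
```

Because the directory of the running script normally precedes the installed-packages directories on `sys.path`, a local file can hide an installed package of the same name. Giving the local module a distinct name, as this commit does with `tokenizer_options`, avoids the ambiguity whichever way the search order falls.
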
