Commit

fix: do padding ourself
samsja committed Sep 25, 2024
1 parent c0eb41a commit a90d07e
Showing 1 changed file with 1 addition and 1 deletion.
src/zeroband/data.py
```diff
@@ -69,7 +69,7 @@ def get_dataloader(
     ds = load_dataset("allenai/c4", "en", streaming=True)
 
     def tokenize_function(data):
-        outputs = tokenizer(data["text"], truncation=True, max_length=seq_length, padding="max_length")
+        outputs = tokenizer(data["text"], truncation=True, max_length=seq_length)
         return outputs
 
     tokenized_datasets = ds.map(
```
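With `padding="max_length"` removed, `tokenize_function` returns sequences of varying length (truncated to at most `seq_length` tokens), so padding has to happen at a later stage, for example when examples are collated into a batch. The commit does not show where or how zeroband does this. The following is a minimal sketch, assuming each example is a dict with an `input_ids` list and that the caller supplies the pad id. `pad_batch` and `pad_token_id` are hypothetical names, not part of the zeroband code.

```python
from typing import Dict, List


def pad_batch(
    examples: List[Dict[str, List[int]]],
    seq_length: int,
    pad_token_id: int,
) -> Dict[str, List[List[int]]]:
    """Right-pad each example's input_ids to seq_length and build an attention mask.

    Hypothetical helper: the commit only drops padding="max_length" from the
    tokenizer call; where zeroband pads afterwards is not shown.
    """
    input_ids: List[List[int]] = []
    attention_mask: List[List[int]] = []
    for example in examples:
        # The tokenizer already truncates, but guard against longer inputs anyway.
        ids = example["input_ids"][:seq_length]
        n_pad = seq_length - len(ids)
        input_ids.append(ids + [pad_token_id] * n_pad)
        # 1 marks real tokens, 0 marks padding positions.
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = pad_batch(
    [{"input_ids": [5, 6, 7]}, {"input_ids": [8]}],
    seq_length=4,
    pad_token_id=0,
)
print(batch["input_ids"])       # [[5, 6, 7, 0], [8, 0, 0, 0]]
print(batch["attention_mask"])  # [[1, 1, 1, 0], [1, 0, 0, 0]]
```

Taking `pad_token_id` as an explicit argument avoids relying on the tokenizer's own pad token, which some tokenizers (GPT-2's, for instance) do not define by default.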