Commit

Remove Unnecessary handling of special characters
mokeddembillel committed Dec 18, 2024
1 parent 92e41ec commit 8f481d9
Showing 1 changed file with 0 additions and 9 deletions.
9 changes: 0 additions & 9 deletions convert_hf_to_gguf.py
@@ -525,15 +525,6 @@ def get_vocab_base(self) -> tuple[list[str], list[int], str]:
             else:
                 token: str = reverse_vocab[i]
                 if token in added_vocab:
-                    # We need to manually encode and decode the added tokens in case special characters
-                    # used for `\n` / `\t` have been manually added in the added tokens
-                    # To avoid unexpected issues - we make sure to encode single-char tokens
-                    if len(token) == 1:
-                        previous_token = token
-                        token = tokenizer.decode(tokenizer.encode(token, add_special_tokens=False))
-                        if previous_token != token:
-                            logger.info(f"{repr(previous_token)} is encoded and decoded back to {repr(token)} using AutoTokenizer")
-
                     if tokenizer.added_tokens_decoder[i].special or self.does_token_look_special(token):
                         toktypes.append(gguf.TokenType.CONTROL)
                     else:
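
For context, the deleted block passed every single-character added token through an encode/decode round trip and logged any difference. The sketch below illustrates that idea only. `ToyTokenizer` and `normalize_added_token` are hypothetical names, not part of `convert_hf_to_gguf.py` or of any tokenizer library, and the placeholder mapping (U+010A for `\n`, U+0109 for `\t`) assumes a GPT-2 style byte-level vocabulary.

```python
class ToyTokenizer:
    """Hypothetical stand-in for a tokenizer; it only mimics encode/decode."""

    # Assumption: byte-level placeholders for "\n" and "\t" (GPT-2 style).
    _PLACEHOLDERS = {"\u010a": "\n", "\u0109": "\t"}

    def __init__(self, vocab):
        self.vocab = dict(vocab)
        self.reverse = {idx: tok for tok, idx in self.vocab.items()}

    def encode(self, text, add_special_tokens=False):
        # Toy behaviour: the whole string must be one known token.
        return [self.vocab[text]]

    def decode(self, ids):
        text = "".join(self.reverse[i] for i in ids)
        # Map placeholder characters back to the characters they stand for.
        return "".join(self._PLACEHOLDERS.get(ch, ch) for ch in text)


def normalize_added_token(token, tokenizer):
    """Round-trip single-character tokens, as the removed block did."""
    if len(token) != 1:
        return token
    return tokenizer.decode(tokenizer.encode(token, add_special_tokens=False))


tok = ToyTokenizer({"\u010a": 0, "a": 1, "<pad>": 2})
for added in ("\u010a", "a", "<pad>"):
    result = normalize_added_token(added, tok)
    if result != added:
        print(f"{added!r} is encoded and decoded back to {result!r}")
```

With this toy vocabulary only the placeholder token changes; multi-character tokens such as `<pad>` skip the round trip entirely. After this commit the script keeps the token text from `reverse_vocab` unchanged.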