Commit
Merge pull request #52 from krai/more_info_detokenised
Detokenised log now preserves qsl_idx, seq_id and token_count fields
Akshat-Tripathi committed Aug 6, 2024
2 parents cccfc97 + 8037235 commit a0f7cb6
Showing 1 changed file with 6 additions and 1 deletion.
7 changes: 6 additions & 1 deletion base_llama2_loadgen_experiment/code_axs.py
@@ -44,7 +44,12 @@ def detokenise(
         tokens = [
             int.from_bytes(bytes.fromhex(tok), byteorder="little") for tok in hex_tokens
         ]
-        output_log.append(tokeniser.decode(tokens))
+        output_log.append({
+            "seq_id" : item["seq_id"],
+            "qsl_idx" : item["qsl_idx"],
+            "data": tokeniser.decode(tokens),
+            "token_count" : item["token_count"]
+        })

     with open(output_log_path, "w") as f:
         json.dump(output_log, f, indent=2)
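
The effect of this change on the log format can be illustrated with a minimal, self-contained sketch. It is not the repository's code: `StubTokeniser`, `detokenise_items`, the `hex_tokens` field name, and the 4-byte token width are all assumptions made for illustration, since the diff does not show how `hex_tokens` is obtained from each `item` or which tokeniser is used. Only the shape of the appended record follows the commit.

```python
import json


class StubTokeniser:
    """Hypothetical stand-in for the real tokeniser (not part of the commit)."""

    vocab = {1: "hello", 2: "world"}

    def decode(self, tokens):
        # Map each token id to a word; unknown ids become "<unk>".
        return " ".join(self.vocab.get(t, "<unk>") for t in tokens)


def detokenise_items(items, tokeniser):
    """Build one log record per item, keeping the metadata next to the text."""
    output_log = []
    for item in items:
        # Assumption: each item carries its tokens as little-endian hex strings
        # under a "hex_tokens" key; the real code may derive them differently.
        hex_tokens = item["hex_tokens"]
        tokens = [
            int.from_bytes(bytes.fromhex(tok), byteorder="little")
            for tok in hex_tokens
        ]
        # Record shape introduced by the commit: a dict instead of a bare string.
        output_log.append({
            "seq_id": item["seq_id"],
            "qsl_idx": item["qsl_idx"],
            "data": tokeniser.decode(tokens),
            "token_count": item["token_count"],
        })
    return output_log


# Made-up sample input for illustration only.
items = [
    {
        "seq_id": 0,
        "qsl_idx": 7,
        "token_count": 2,
        "hex_tokens": ["01000000", "02000000"],
    }
]
records = detokenise_items(items, StubTokeniser())
print(json.dumps(records, indent=2))
```

Before the change, each log entry was only the decoded string, so a reader of the log could not tell which query sample or sequence a given text belonged to. With the dict form, `qsl_idx`, `seq_id` and `token_count` travel with the decoded text.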