This repository was archived by the owner on May 1, 2025. It is now read-only.

How to convert Tokens to ids correctly #140

@RichardMLuu

Description


I am using self.tokenizer.convert_tokens_to_ids to try to convert the model's text_feat output into input_ids, and then into text. The code is as follows:

text_output = self.text_encoder.bert(text.input_ids, attention_mask=text.attention_mask,
                                     return_dict=True, mode='text')

text_embeds = text_output.last_hidden_state
text_feat = F.normalize(self.text_proj(text_embeds[:, 0, :]), dim=-1)

input_ids = self.tokenizer.convert_tokens_to_ids(text_feat[0])
# Convert the `input_ids` back to text
decoded_text = self.tokenizer.decode(input_ids, skip_special_tokens=True)
print('decoded_text', decoded_text)

But the output is always wrong: I get either all [PAD] tokens or [100, 100]. I checked the token values and found that they are not the same, so I think the problem is in my code. What is the correct way to do this?
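
One likely explanation, sketched below with a toy tokenizer rather than the real one: convert_tokens_to_ids is a vocabulary lookup on token strings, while text_feat is a normalized, projected [CLS] embedding, that is, one continuous vector per sentence. Values that are not vocabulary keys fall back to the unknown-token id, which would account for the repeated 100 if the vocabulary follows the bert-base-uncased layout ([PAD] = 0, [UNK] = 100). ToyTokenizer, its small vocabulary, and the ids 5 and 6 are hypothetical and only imitate the lookup behaviour; they are not the library's implementation.

```python
# Toy stand-in for a BERT-style tokenizer. The class and its vocabulary are
# hypothetical; only the [PAD] = 0 and [UNK] = 100 ids follow bert-base-uncased.
PAD_ID, UNK_ID = 0, 100


class ToyTokenizer:
    def __init__(self):
        # The ids for "a" and "dog" are made up for this illustration.
        self.vocab = {"[PAD]": PAD_ID, "[UNK]": UNK_ID, "[CLS]": 101,
                      "[SEP]": 102, "a": 5, "dog": 6}
        self.ids_to_tokens = {i: t for t, i in self.vocab.items()}
        self.special = {"[PAD]", "[UNK]", "[CLS]", "[SEP]"}

    def convert_tokens_to_ids(self, tokens):
        # Dictionary lookup on token strings; anything else maps to [UNK].
        return [self.vocab.get(t, UNK_ID) for t in tokens]

    def decode(self, ids, skip_special_tokens=True):
        tokens = [self.ids_to_tokens.get(i, "[UNK]") for i in ids]
        if skip_special_tokens:
            tokens = [t for t in tokens if t not in self.special]
        return " ".join(tokens)


tokenizer = ToyTokenizer()
input_ids = [101, 5, 6, 102, 0, 0]  # "[CLS] a dog [SEP]" plus padding
text_feat = [0.12, -0.83, 0.54]     # stand-in for one feature vector

# Floats are not vocabulary keys, so every entry becomes the [UNK] id.
wrong_ids = tokenizer.convert_tokens_to_ids(text_feat)
print(wrong_ids)                          # [100, 100, 100]
print(repr(tokenizer.decode(wrong_ids)))  # ''

# Decoding the ids that were fed to the encoder recovers the text.
print(tokenizer.decode(input_ids))        # a dog
```

If that reading is right, the feature vector cannot be turned back into ids at all, because the projection and normalization discard the per-token information. In the snippet from the issue, the text would instead come from the ids that went into the encoder, for example self.tokenizer.decode(text.input_ids[0], skip_special_tokens=True), assuming text is the tokenizer output passed to self.text_encoder.bert.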
