How to convert Tokens to ids correctly

I am using `self.tokenizer.convert_tokens_to_ids` to try to convert the model's output `text_feat` into `input_ids` and then into text. The code is as follows:
```python
text_output = self.text_encoder.bert(text.input_ids, attention_mask=text.attention_mask,
                                     return_dict=True, mode='text')

text_embeds = text_output.last_hidden_state
text_feat = F.normalize(self.text_proj(text_embeds[:, 0, :]), dim=-1)

input_ids = self.tokenizer.convert_tokens_to_ids(text_feat[0])
# Convert the ids back to text
decoded_text = self.tokenizer.decode(input_ids, skip_special_tokens=True)
print('decoded_text', decoded_text)
```
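But the output is always wrong: I get either all `[PAD]` or `[100, 100]`. I checked the token values and found that they are not the same, so I think something is wrong with my code. What is the correct way to do this?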
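The `[100, 100]` output is a useful clue: in the bert-base-uncased vocabulary, id 100 is `[UNK]`. `convert_tokens_to_ids` expects token *strings* and looks each one up in the vocabulary, falling back to the unknown-token id when the lookup fails. `text_feat[0]` is a tensor of floats, so every element misses the lookup, and with `skip_special_tokens=True` the resulting `[UNK]` ids are then dropped during decoding. The pure-Python sketch below mimics that behaviour with a hypothetical toy vocabulary; `VOCAB`, `convert_tokens_to_ids` and `decode` here are illustrative stand-ins, not the transformers implementation, and the word ids are made up.

```python
# Toy stand-in for a BERT tokenizer (hypothetical, for illustration only).
# Special-token ids follow bert-base-uncased; the word ids are made up.
PAD, UNK, CLS, SEP = "[PAD]", "[UNK]", "[CLS]", "[SEP]"
VOCAB = {PAD: 0, UNK: 100, CLS: 101, SEP: 102, "a": 2000, "dog": 2001, "runs": 2002}
ID_TO_TOKEN = {i: t for t, i in VOCAB.items()}
SPECIAL_TOKENS = {PAD, UNK, CLS, SEP}


def convert_tokens_to_ids(tokens):
    # Plain dictionary lookup: anything that is not a known token string
    # (for example a float taken from a feature vector) falls back to [UNK].
    return [VOCAB.get(tok, VOCAB[UNK]) for tok in tokens]


def decode(ids, skip_special_tokens=False):
    # Simplified: real WordPiece decoding also merges "##" sub-words.
    tokens = [ID_TO_TOKEN.get(i, UNK) for i in ids]
    if skip_special_tokens:
        tokens = [t for t in tokens if t not in SPECIAL_TOKENS]
    return " ".join(tokens)


# Correct round trip: token strings -> ids -> text
ids = convert_tokens_to_ids([CLS, "a", "dog", "runs", SEP])
print(ids)                                    # [101, 2000, 2001, 2002, 102]
print(decode(ids, skip_special_tokens=True))  # a dog runs

# Passing a (shortened) normalized feature vector instead of tokens
feat = [0.12, -0.03, 0.48]
bad_ids = convert_tokens_to_ids(feat)
print(bad_ids)                                          # [100, 100, 100]
print(repr(decode(bad_ids, skip_special_tokens=True)))  # ''
```

In other words, the lookup never sees a real token, so the result cannot depend on the feature values at all.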

![image](https://github.com/salesforce/ALBEF/assets/103711400/732d85c9-1383-4e96-97a6-ef832faaaa30)

![image](https://github.com/salesforce/ALBEF/assets/103711400/c655adf4-7c1c-42de-a7b8-46ad8f2886c7)

![image](https://github.com/salesforce/ALBEF/assets/103711400/75db1c56-80f6-4749-bc05-a6a44e11c17f)
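A likely underlying issue is that `text_feat` is not a sequence of tokens at all: it is the `[CLS]` hidden state passed through `text_proj` and L2-normalized, i.e. one vector summarizing the whole sentence for image-text matching, so there are no per-token ids to recover from it. If the goal is simply to get the input text back, decoding the ids that were fed in should work, e.g. `self.tokenizer.batch_decode(text.input_ids, skip_special_tokens=True)`. If the goal is the model's own per-token predictions, one would instead take the per-position hidden states (`text_embeds`, not `text_feat`, which has already been projected to a different size) and pass them through a masked-language-model head that maps each vector to vocabulary logits, then take the argmax. In ALBEF's pretraining model the text encoder is, as far as I know, a `BertForMaskedLM` whose `cls` head plays this role; please check your own checkpoint. The toy sketch below shows only the projection-plus-argmax logic; `mlm_head_argmax`, the weights and the 4-word vocabulary are hypothetical, and the real BERT head also applies a dense + activation + LayerNorm transform before the vocabulary projection.

```python
def mlm_head_argmax(hidden_states, weight, bias):
    """Map per-token hidden vectors to vocabulary ids: logits = W h + b, then argmax.

    hidden_states: list of per-position vectors   (seq_len x hidden_dim)
    weight:        one row per vocabulary entry    (vocab_size x hidden_dim)
    bias:          one value per vocabulary entry  (vocab_size)
    """
    ids = []
    for h in hidden_states:
        # One logit per vocabulary entry for this position
        logits = [sum(w_i * h_i for w_i, h_i in zip(row, h)) + b
                  for row, b in zip(weight, bias)]
        # Highest-scoring vocabulary id for this position
        ids.append(max(range(len(logits)), key=logits.__getitem__))
    return ids


# Toy 4-word vocabulary and 2-d hidden states (made-up numbers)
ID_TO_TOKEN = ["[PAD]", "a", "dog", "runs"]
weight = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
bias = [0.1, 0.0, 0.0, 0.0]
hidden = [[2.0, 0.5], [0.3, 1.5], [-1.0, -2.0]]

pred_ids = mlm_head_argmax(hidden, weight, bias)
print(pred_ids)                                   # [1, 2, 3]
print(" ".join(ID_TO_TOKEN[i] for i in pred_ids))  # a dog runs
```

The key design point is that the head operates on one hidden vector per position; a single pooled sentence embedding such as `text_feat` would yield at most one id and cannot reconstruct the sentence.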



How to convert Tokens to ids correctly #140
