On the issue of models requiring truncation

Certain models that require truncation will fail to generate text.

The dataset uses example text in the following format:

"""
is_original_content: None
over_18: None
post: comment
subreddit: Genshin_Impact
prompt: What do you think about Genshin Impact?
response: I think its great. It's a fun and addicting game that can be played anywhere. I personally like how...
"""

When the text surpasses the max token length of the model, the example must be either removed or truncated so that training on the dataset runs without error. Removing the example pollutes our data, as the generated text will no longer be representative of a person's pattern of text, only of a shorter version of themselves. Truncation can potentially remove the tags ("response", "prompt", etc.), encouraging the generated text to not follow the example format and rendering it unparsable.

For now, I will attempt to truncate the examples.
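
One possible way to truncate without losing the tags is to trim only the text after `response:` and leave the metadata and prompt lines untouched. The sketch below is an illustration, not the code used in this repo: `truncate_response` is a hypothetical helper, and whitespace splitting stands in for the model's real tokenizer, so the token counts are only approximate.

```python
from typing import Callable, List


def truncate_response(
    example: str,
    max_tokens: int,
    tokenize: Callable[[str], List[str]] = str.split,  # assumption: stand-in tokenizer
) -> str:
    """Trim only the text after the 'response:' tag so every tag line survives."""
    head, sep, response = example.partition("\nresponse:")
    if not sep:
        # No response tag found: fall back to plain truncation.
        return " ".join(tokenize(example)[:max_tokens])
    # Tokens already consumed by the metadata, the prompt, and the tag itself.
    used = len(tokenize(head + sep))
    # If the header alone exceeds max_tokens, the budget is 0 and the result
    # is still too long; such examples need separate handling.
    budget = max(max_tokens - used, 0)
    kept = tokenize(response)[:budget]
    return head + sep + (" " + " ".join(kept) if kept else "")


example = (
    "post: comment\n"
    "prompt: What do you think?\n"
    "response: I think it is great and fun"
)
print(truncate_response(example, max_tokens=11))
```

Because the cut happens inside the response only, the truncated example keeps the same format as the untruncated ones and stays parsable. Swapping in the model's own tokenizer for `tokenize` would make the budget match the real max token length.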


