Some episodes are failing because the extraction is truncated #2

Open
xpilasneo4j opened this issue Nov 20, 2023 · 2 comments

@xpilasneo4j
Contributor

Lots of episodes fail on Theme or Episode because the _extraction output is truncated and ends with CLEAN CLEAN CLEAN instead of the expected "Answer:" or "- ..." output.

Output of _extraction
"And that's what I like to do in that could not have been scripted better. You actually got back catch, well done. Loyalty. That is all we have time for today on access all areas. We'll be back again next CLEAN CLEAN CLEAN"

@xpilasneo4j
Contributor Author

Seems like a size issue: the prompt + transcript is too big. If I truncate the transcript to 21k characters, it fails less often.

@xpilasneo4j
Contributor Author

I added a loop that shortens the transcript by 500 characters on each retry, and with it ALL the files load in a few minutes. Let me know if you want the fix.
I also created a script that runs the commands for creating py38, so people don't have to run them manually; that reduces the risk of issues during setup.
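
For anyone hitting the same problem, the retry loop described above could look roughly like this. It is only a sketch, not the actual fix: `extract_with_shrinking`, `looks_complete`, and the `extract` callable are hypothetical names standing in for whatever wraps the real `_extraction` call, and the completeness check is a guess based on the expected "Answer:" / "- ..." output mentioned in this issue.

```python
from typing import Callable, Optional

STEP = 500  # characters dropped per retry, as described in this thread
TRUNCATION_MARKER = "CLEAN CLEAN CLEAN"  # tail observed on truncated outputs


def looks_complete(output: str) -> bool:
    """Heuristic (assumption): a usable output does not end with the
    truncation marker and contains "Answer:" or at least one "- " bullet."""
    text = output.strip()
    if text.endswith(TRUNCATION_MARKER):
        return False
    has_bullet = any(line.lstrip().startswith("- ") for line in text.splitlines())
    return "Answer:" in text or has_bullet


def extract_with_shrinking(
    transcript: str,
    extract: Callable[[str], str],  # hypothetical wrapper around the model call
    step: int = STEP,
) -> Optional[str]:
    """Run `extract` on ever-shorter prefixes of the transcript until the
    output looks complete. Returns None if no prefix produces a usable output."""
    length = len(transcript)
    while length > 0:
        output = extract(transcript[:length])
        if looks_complete(output):
            return output
        length -= step  # drop the last `step` characters and try again
    return None
```

Cutting from the end keeps the start of the episode intact. Starting the first attempt at a fixed cap (for example the 21k characters mentioned above) instead of the full transcript would save a few calls on long episodes.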
