Android example long prompt cache? #1077
scsonic started this conversation in New features / APIs
Replies: 1 comment
-
@aciddelgado's latest PR adds 'rewind' functionality, so you can effectively cache the prompt by rewinding the generator back to the prompt position on every iteration. It was just checked in, so it's not in a release yet.
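To illustrate the idea, here is a rough, self-contained sketch of the rewind pattern. It does not use the onnxruntime-genai library: `ToyGenerator`, `appendTokens`, `rewindTo`, and `runTurns` are hypothetical names made up for this example, and the real call names and signatures should be checked against the PR. The toy only counts how many tokens get processed, assuming the cached state for the prompt survives a rewind.

```java
import java.util.ArrayList;
import java.util.List;

public class PromptRewindSketch {
    /** Hypothetical stand-in for a generator with a KV cache; NOT the real onnxruntime-genai class. */
    static class ToyGenerator {
        // One entry per token whose state is currently held in the (simulated) cache.
        private final List<Integer> cached = new ArrayList<>();
        // Total number of tokens that had to be processed, i.e. the expensive work.
        private int tokensProcessed = 0;

        /** Processes new tokens and appends their state to the cache. */
        void appendTokens(int[] tokens) {
            for (int t : tokens) {
                cached.add(t);
                tokensProcessed++;
            }
        }

        /** Drops everything after newLength tokens; the first newLength tokens stay cached. */
        void rewindTo(int newLength) {
            if (newLength < 0 || newLength > cached.size()) {
                throw new IllegalArgumentException("newLength out of range: " + newLength);
            }
            cached.subList(newLength, cached.size()).clear();
        }

        int length() { return cached.size(); }

        int tokensProcessed() { return tokensProcessed; }
    }

    /** Processes the system prompt once, then rewinds to it before every user turn. */
    static int runTurns(int[] systemPrompt, int[][] userTurns) {
        ToyGenerator gen = new ToyGenerator();
        gen.appendTokens(systemPrompt);   // the slow step, done only once
        int promptLength = gen.length();  // remember the prompt position
        for (int[] turn : userTurns) {
            gen.rewindTo(promptLength);   // discard the previous turn, keep the prompt
            gen.appendTokens(turn);       // only the new tokens are processed
        }
        return gen.tokensProcessed();
    }

    public static void main(String[] args) {
        int[] systemPrompt = new int[1000];                       // placeholder token ids
        int[][] turns = { new int[10], new int[20], new int[5] }; // three short user turns
        System.out.println("tokens processed with rewind: " + runTurns(systemPrompt, turns));
    }
}
```

In this toy, a 1000-token prompt followed by turns of 10, 20, and 5 tokens costs 1035 processed tokens in total, versus 3035 if the prompt were re-processed on every turn.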
-
I am currently using the Android Phi3 example.
If my system prompt is very long, it takes more than 90 seconds to process.
I’m wondering if there’s a way to cache the result of this 90-second processing.
When using llama.cpp on mobile, it remembers the result, so next time there’s no need to wait another 90 seconds.
Or can I do something with the classes in the Java API? import ai.onnxruntime.genai.[GeneratorParams, Tokenizer, Model, Sequences]
Is the KV cache working on Android?