Android example long prompt cache? #1077

scsonic · 2024-11-19T18:34:22Z


I am currently using the Android Phi3 example.
If my system prompt is very long, it takes more than 90 seconds to process.
I’m wondering if there’s a way to cache the result of this 90-second processing.

When I use llama.cpp on mobile, it remembers the result, so the next time there's no need to wait another 90 seconds.

Or can I do something with the classes in the Java API (import ai.onnxruntime.genai.[GeneratorParams, Tokenizer, Model, Sequences])?
Is the KV cache working on Android?

RyanUnderhill · 2024-11-22T23:39:19Z

Maintainer

@aciddelgado's latest PR adds 'rewind' functionality, so you can effectively cache the prompt by rewinding the generator back to the prompt position on every iteration. It was just checked in, so it's not in a release yet.
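
To make the idea concrete, here is a rough, self-contained sketch of the rewind pattern. It does not call the onnxruntime-genai API: `ToyGenerator`, `appendTokens`, `rewindTo`, and `RewindDemo` are hypothetical names invented for illustration, and the "cost" is just a counter of tokens processed. The real method names and signatures may differ once the feature ships in a release.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a generator that keeps a KV cache.
// This is NOT the onnxruntime-genai API; it only models the bookkeeping.
class ToyGenerator {
    private final List<Integer> tokens = new ArrayList<>();
    private long tokensProcessed = 0; // one unit of "expensive" work per token

    // Appending tokens is the costly step (prompt processing).
    void appendTokens(int[] newTokens) {
        for (int t : newTokens) {
            tokens.add(t);
            tokensProcessed++;
        }
    }

    // Discard everything after newLength; the prefix stays cached.
    void rewindTo(int newLength) {
        if (newLength < 0 || newLength > tokens.size()) {
            throw new IllegalArgumentException("newLength out of range: " + newLength);
        }
        tokens.subList(newLength, tokens.size()).clear();
    }

    int length() {
        return tokens.size();
    }

    long tokensProcessed() {
        return tokensProcessed;
    }
}

class RewindDemo {
    // Process the system prompt once, then rewind to it after every turn.
    static long runWithRewind(int[] systemPrompt, int[][] userTurns) {
        ToyGenerator gen = new ToyGenerator();
        gen.appendTokens(systemPrompt);
        int promptLength = gen.length();
        for (int[] turn : userTurns) {
            gen.appendTokens(turn);
            // A real generation loop would produce the reply here.
            gen.rewindTo(promptLength);
        }
        return gen.tokensProcessed();
    }

    // Baseline: rebuild the generator and reprocess the prompt every turn.
    static long runWithoutRewind(int[] systemPrompt, int[][] userTurns) {
        long total = 0;
        for (int[] turn : userTurns) {
            ToyGenerator gen = new ToyGenerator();
            gen.appendTokens(systemPrompt);
            gen.appendTokens(turn);
            total += gen.tokensProcessed();
        }
        return total;
    }

    public static void main(String[] args) {
        int[] systemPrompt = new int[1000]; // pretend this is the long prompt
        int[][] turns = { {1, 2, 3}, {4, 5}, {6} };
        System.out.println("with rewind:    " + runWithRewind(systemPrompt, turns));
        System.out.println("without rewind: " + runWithoutRewind(systemPrompt, turns));
    }
}
```

In this toy model the 1000-token prompt is processed once with rewind (1006 tokens in total over three turns) and three times without it (3006 tokens). Note that this only saves work within a single generator's lifetime; the sketch says nothing about persisting the cache across app restarts.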
