[Cache] Add cachedPrefixes for caching repeated system prompts #664

YiyanZhai · 2025-02-04T04:06:21Z

This PR adds a cachedPrefixes field to MLCEngineConfig, allowing users to cache system prompts when creating an MLCEngine. This reduces redundant processing of repeated instructions.

Example usage in CreateMLCEngine:

await webllm.CreateMLCEngine(
  selectedModel,
  {
    initProgressCallback: initProgressCallback,
    logLevel: "INFO",
    cachedPrefixes: [
      [
        {
          role: "system",
          content: "You are a helpful assistant running in the user's browser. You need to answer questions ... ",
        },
      ],
    ],
  },
  {
    context_window_size: 2048,
  }
);

CharlieFRuan

Thank you for the hard work! Added some comments. Please add an E2E example under examples/. I will take another pass afterwards. Thanks again!

CharlieFRuan · 2025-02-10T17:24:43Z

src/config.ts

  initProgressCallback?: InitProgressCallback;
  logitProcessorRegistry?: Map<string, LogitProcessor>;
  logLevel?: LogLevel;
+  cachedPrefixes?: ChatCompletionMessageParam[][];


Let's add docs to MLCEngineConfig specifying the behavior of cachedPrefixes (e.g., the engine prefills while loading to create the prefixes' KV cache entries, and only disposes of those entries when the engine is reloaded). Perhaps we can also mark this as experimental to signal potential future API or behavior changes.

CharlieFRuan · 2025-02-10T17:39:50Z

src/config.ts

  initProgressCallback?: InitProgressCallback;
  logitProcessorRegistry?: Map<string, LogitProcessor>;
  logLevel?: LogLevel;
+  cachedPrefixes?: ChatCompletionMessageParam[][];


Could you also add an example under examples/cached_prefixes that demonstrates the prefill time difference between using cachedPrefixes and not using it? We should also test whether the behavior is as expected in multi-turn conversations.
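
One way such a comparison could be structured is sketched below. This is only an illustration, not code from the PR: timeAsync and comparePrefill are hypothetical helper names, and the two callbacks are assumed to wrap a first chat request against an engine created with and without cachedPrefixes. It measures wall-clock time for the whole call, so it is a rough proxy for prefill time rather than an exact measurement.

```typescript
// Hypothetical helper (not part of WebLLM): wall-clock time of one async call.
async function timeAsync<T>(
  fn: () => Promise<T>,
): Promise<{ result: T; ms: number }> {
  const start = performance.now();
  const result = await fn();
  return { result, ms: performance.now() - start };
}

// Hypothetical harness: runs the two variants one after the other and
// reports both durations. Each callback is assumed to issue the same first
// request, once against an engine created with cachedPrefixes and once
// against an engine created without it.
async function comparePrefill(
  withCachedPrefixes: () => Promise<unknown>,
  withoutCachedPrefixes: () => Promise<unknown>,
): Promise<{ cachedMs: number; uncachedMs: number }> {
  const cached = await timeAsync(withCachedPrefixes);
  const uncached = await timeAsync(withoutCachedPrefixes);
  return { cachedMs: cached.ms, uncachedMs: uncached.ms };
}
```

Running each variant several times and discarding the first run would likely give steadier numbers, since a single measurement also includes warm-up effects.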

CharlieFRuan · 2025-02-10T17:46:02Z

src/llm_chat.ts

+    if (this.seqIdToPrefix.size === 0) {
+      this.fclearKVCaches(this.kvCache);
+    } else {
+      this.fKVCacheRemoveSequence!(this.kvCache, new tvmjs.Scalar(0, "int64"));


Now that we have multiple sequence IDs, let's define a constant, say CHAT_SEQUENCE_ID = 0 (or a better name), instead of using a magic number 0 that may be hard to keep track of.
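
A minimal sketch of the suggestion, using a plain Set as a stand-in for the KV cache's live sequences. Only the name CHAT_SEQUENCE_ID comes from the comment above; resetChatSequence and its parameters are hypothetical, and the real code calls fclearKVCaches and fKVCacheRemoveSequence instead.

```typescript
// Sequence ID reserved for the active chat. Cached prefixes are assumed
// to occupy other IDs.
const CHAT_SEQUENCE_ID = 0;

// Hypothetical stand-in for the reset logic in the hunk above:
// with no cached prefixes, clear everything; otherwise drop only the chat
// sequence so the prefix sequences stay available.
function resetChatSequence(liveSeqIds: Set<number>, prefixCount: number): void {
  if (prefixCount === 0) {
    liveSeqIds.clear();
  } else {
    liveSeqIds.delete(CHAT_SEQUENCE_ID);
  }
}
```

With the constant in place, every site that currently passes a literal 0 can refer to CHAT_SEQUENCE_ID, which makes it easier to tell the chat sequence apart from the prefix sequences.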

CharlieFRuan · 2025-02-10T17:53:21Z

src/llm_chat.ts

+
+    // If a match is found, fork the sequence
+    if (matchedSeqId !== -1 && maxMatchedLen > 0) {
+      console.log(


Use log.info() instead of console.log()
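
For context, the condition in the hunk above checks matchedSeqId and maxMatchedLen. A search that produces those two values could look like the sketch below. It is an assumption about the approach, not the PR's implementation: findLongestPrefixMatch is a hypothetical name, prefixes are assumed to be stored as token-ID arrays keyed by sequence ID, and partial overlaps are counted, whereas the PR may require a cached prefix to match in full.

```typescript
// Hypothetical sketch: find the cached sequence that shares the longest
// common leading run of tokens with the input.
// Returns matchedSeqId = -1 and maxMatchedLen = 0 when nothing matches.
function findLongestPrefixMatch(
  inputTokens: number[],
  seqIdToPrefix: Map<number, number[]>,
): { matchedSeqId: number; maxMatchedLen: number } {
  let matchedSeqId = -1;
  let maxMatchedLen = 0;
  seqIdToPrefix.forEach((prefix, seqId) => {
    // Count how many leading tokens agree.
    let len = 0;
    while (
      len < prefix.length &&
      len < inputTokens.length &&
      inputTokens[len] === prefix[len]
    ) {
      len++;
    }
    if (len > maxMatchedLen) {
      matchedSeqId = seqId;
      maxMatchedLen = len;
    }
  });
  return { matchedSeqId, maxMatchedLen };
}
```

When a match is found, the engine can fork the matched sequence and prefill only the tokens after position maxMatchedLen.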

CharlieFRuan · 2025-02-10T17:53:32Z

src/llm_chat.ts

+      this.tvm.endScope();
+    } else if (seqID !== 0) {
+      // If no match is found, add the new sequence to the KV cache
+      console.log("Adding prefix to KV cache: ", seqID);


Use log.info() instead of console.log()

YiyanZhai · 2025-02-17T20:56:00Z

Thanks for the feedback! I’ve addressed the comments. Let me know if any further changes are needed!

YiyanZhai added 2 commits February 3, 2025 21:39

Added cachedPrefixes to cache long system prompts when creating engine.

2af2b16

Cleaned console logs

92e00b9

YiyanZhai requested a review from CharlieFRuan February 4, 2025 04:06

CharlieFRuan reviewed Feb 10, 2025

YiyanZhai force-pushed the main branch from 88e9a41 to b7f94d4 Compare February 17, 2025 20:51

Addressed review comments

dfd7ba8

YiyanZhai force-pushed the main branch from b7f94d4 to dfd7ba8 Compare February 17, 2025 20:52
