
Conversation

@nfergu (Contributor) commented May 9, 2025

This is a proposal for adding a "search strategy" to token generation, enabling different strategies for selecting generated tokens. For example, see this (untested) prototype of beam search built on top of it.

I have left the interface of generate_step unchanged by defaulting to a "linear search" strategy, which does the same thing as before. In addition, there is a new function called generate_with_search which accepts a SearchStrategy implementation, which allows other strategies to be used. I haven't plumbed generate_with_search into the main stream_generate but perhaps it should be?

There's perhaps some tidying up to be done, but I thought I'd get this out for feedback before doing too much more on it.

Let me know what you think. Happy to make changes. One alternative to this PR would be for search strategies to duplicate the code that is currently in generate_with_search within the SearchStrategy implementation itself. Or we could refactor the code that is currently in generate_with_search into a helper function that search strategies would call. If you think either of those is a nicer approach, I'll close this PR.
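For illustration, here is a minimal sketch of what such a strategy interface might look like. The names and shapes are hypothetical (not the PR's actual API), and plain Python lists stand in for mx.array to keep the example self-contained:

```python
from typing import List, Protocol


class SearchStrategy(Protocol):
    """Hypothetical interface: given one step's log-probabilities,
    choose the next token id. (Illustrative only; the PR's real
    interface operates on mx.array and may differ.)"""

    def select(self, logprobs: List[float]) -> int: ...


class LinearSearch:
    """Greedy selection mirroring the existing single-path behaviour
    of generate_step: always take the most probable token."""

    def select(self, logprobs: List[float]) -> int:
        # Index of the maximum log-probability.
        return max(range(len(logprobs)), key=lambda i: logprobs[i])
```

A generate_with_search-style loop would then call `strategy.select(...)` each step instead of hard-coding the selection, which is what lets a beam-search strategy plug into the same loop.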

Yields:
-    Tuple[mx.array, mx.array]: One token and a vector of log probabilities.
+    Tuple[int, mx.array]: One token and a vector of log probabilities.

Not related to the main change, but I think the typing was wrong here. I'm pretty sure this generator yields int tokens, since it calls .item() on the token array, but I might be missing something.

*,
max_tokens: int = 256,
-sampler: Optional[Callable[mx.array, mx.array]] = None,
+sampler: Optional[Callable[[mx.array], mx.array]] = None,
@nfergu May 9, 2025

Not related to the main change, but I think the typing was wrong here. The sampler has a single mx.array argument and returns an mx.array AFAICT.
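To illustrate the fix (using a stand-in type, since mx.array is MLX-specific): in typing.Callable, the parameter types must be given as a list, so a one-argument callable is `Callable[[X], Y]`, not `Callable[X, Y]`.

```python
from typing import Callable, List, Optional

# Stand-in for mx.array, to keep this example self-contained.
Array = List[float]

# Callable[[Array], Array]: one Array argument, returns an Array.
# (Writing Callable[Array, Array] is malformed -- the parameter
# types must be wrapped in a list.)
sampler: Optional[Callable[[Array], Array]] = None


def greedy_sampler(logprobs: Array) -> Array:
    # Matches the annotation: takes one array, returns a one-element array.
    return [float(max(range(len(logprobs)), key=lambda i: logprobs[i]))]


sampler = greedy_sampler
```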

quantized_kv_start: int = 0,
-prompt_progress_callback: Optional[Callable[int, int]] = None,
+prompt_progress_callback: Optional[Callable[[int, int], None]] = None,
) -> Generator[Tuple[mx.array, mx.array], None, None]:
@nfergu May 9, 2025

Not related to the main change, but I think the typing was wrong here. The callback has two int arguments, and doesn't return anything AFAICT.
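The same idea applies to the callback annotation, with the return type spelled out as well. A hypothetical stand-in (names illustrative, not the PR's code):

```python
from typing import Callable, List, Optional, Tuple

# Matches the corrected annotation: two int parameters, None return.
prompt_progress_callback: Optional[Callable[[int, int], None]] = None

progress_log: List[Tuple[int, int]] = []


def log_progress(processed: int, total: int) -> None:
    # Record (processed, total); returns None, as the annotation requires.
    progress_log.append((processed, total))


prompt_progress_callback = log_progress
prompt_progress_callback(128, 512)
```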

prompt: mx.array,
prompt_cache: List[Any],
quantize_cache_fn: Callable[[Any], None],
total_prompt_tokens: int,

I think it's a bit awkward that generate needs to take total_prompt_tokens and prompt_progress_callback, but I couldn't immediately see a nice way around this.
