Add Repetition Range ('rep_range') #888
base: main
Conversation
Thanks for the PR. There are a few issues in its current state, as outlined in my comments. Mainly, this seems designed for sequential inference and will fail for batched inference, because there is no handling for per-sequence differences within a batch.
Also, please run ./formatting.sh to fix the linting issues.
  _SAMPLING_EPS = 1e-5
  _MAX_TEMP = 1e-2

- APHRODITE_NO_DEPRECATION_WARNING = envs.APHRODITE_NO_DEPRECATION_WARNING
+ APHRODITE_NO_DEPRECATION_WARNING = bool(int(os.environ.get("APHRODITE_NO_DEPRECATION_WARNING", "0")))
Why this change?
After installing as editable and then modifying the files, I was getting circular import errors. As far as I could tell, this was because envs.py lives in ./aphrodite/ rather than ./aphrodite/common/.
I was trying to make the least impactful change that still let it run, so I didn't want to move envs.py.
@@ -400,6 +402,9 @@ def _verify_args(self) -> None:
      if self.repetition_penalty < 1.0:
          raise ValueError("repetition_penalty must be in [1, inf), got "
                           f"{self.repetition_penalty}.")
+     if self.rep_range is not None and self.rep_range < 1:
We should probably allow 0 for an infinite (or all-tokens) range. Unless other inference software does it this way.
You're absolutely right, I'll implement on Monday
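For reference, a minimal sketch of what permitting 0 (meaning "all tokens") could look like inside _verify_args, mirroring the hunk above; the exact check and error wording are illustrative, not the final implementation:

    # Sketch only: rep_range of 0 would mean "apply to all tokens";
    # only negative values are rejected.
    if self.rep_range is not None and self.rep_range < 0:
        raise ValueError("rep_range must be a non-negative integer "
                         "(0 means unlimited range), got "
                         f"{self.rep_range}.")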
@@ -34,7 +34,7 @@
  # If enabled, we switch to a more performant implementation
  # of top-k and top-p
- APHRODITE_USE_SAMPLING_KERNELS = envs.APHRODITE_USE_SAMPLING_KERNELS
+ APHRODITE_USE_SAMPLING_KERNELS = bool(int(os.environ.get("APHRODITE_USE_SAMPLING_KERNELS", "0")))
Same as before.
See other instance
  repetition_penalties = repetition_penalties[:, None].repeat(1, vocab_size)
  repetition_penalties[~(prompt_mask | output_mask)] = 1.0
  logits = torch.where(logits > 0, logits / repetition_penalties,
                       logits * repetition_penalties)

- # We follow the definition in OpenAI API.
Don't remove comment.
o7
Sorry, danger of AI-assisted programming
@@ -272,7 +272,8 @@ def forward(
          sampling_tensors.output_tokens,
          sampling_tensors.presence_penalties,
          sampling_tensors.frequency_penalties,
-         sampling_tensors.repetition_penalties)
+         sampling_tensors.repetition_penalties,
+         rep_range=rep_range)
Suggested change:
- rep_range=rep_range)
+ sampling_tensors.rep_range)
This parameter needs to be added to the sampling_metadata module.
I'll fix on Monday!
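For reference, a hedged sketch of how the per-sequence values could be collected and tensorized alongside the other sampling tensors; the helper name, the flattened sampling-params list, and the 0-means-unlimited convention are assumptions here, not the module's actual API:

    import torch

    def build_rep_range_tensor(sampling_params_per_seq, device):
        # One entry per sequence in the batch, in the same order as the other
        # per-sequence tensors (e.g. repetition_penalties). None/unset falls
        # back to 0, which stands for "unlimited range" per the discussion above.
        values = [getattr(p, "rep_range", None) or 0 for p in sampling_params_per_seq]
        return torch.tensor(values, dtype=torch.long, device=device)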
@@ -507,25 +508,39 @@ def _apply_penalties(logits: torch.Tensor, prompt_tokens_tensor: torch.Tensor,
                      output_tokens_tensor: torch.Tensor,
                      presence_penalties: torch.Tensor,
                      frequency_penalties: torch.Tensor,
-                     repetition_penalties: torch.Tensor) -> torch.Tensor:
+                     repetition_penalties: torch.Tensor,
+                     rep_range: Optional[int] = None) -> torch.Tensor:
Suggested change:
- rep_range: Optional[int] = None) -> torch.Tensor:
+ rep_range: torch.Tensor) -> torch.Tensor:
Must be tensorized after it's added to sampling_metadata. If we treat it as a single integer, it'll apply to all sequences within the batch; tensorizing it lets us match the batch dimension, since different sequences may want different ranges.
Great point, will handle
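One way a tensorized rep_range could be applied per sequence is with a position mask over the padded output-token matrix. This is a sketch under assumptions (left-aligned rows padded on the right, separate output_lens and rep_range long tensors of shape [num_seqs]; all names are illustrative), not the module's existing layout:

    import torch

    def rep_range_mask(output_tokens: torch.Tensor,
                       output_lens: torch.Tensor,
                       rep_range: torch.Tensor) -> torch.Tensor:
        # output_tokens: [num_seqs, max_len], left-aligned, padded on the right.
        # A position is "in range" if it is one of the last rep_range real
        # tokens of its own sequence; rep_range == 0 keeps every real token.
        num_seqs, max_len = output_tokens.shape
        pos = torch.arange(max_len, device=output_tokens.device).expand(num_seqs, -1)
        dist_from_end = output_lens.unsqueeze(1) - pos   # last real token -> 1
        valid = dist_from_end > 0                        # excludes padding
        unlimited = (rep_range == 0).unsqueeze(1)
        return valid & (unlimited | (dist_from_end <= rep_range.unsqueeze(1)))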
    if rep_range is not None and rep_range > 0:
        # Just take the last rep_range tokens from output_tokens_tensor
        # This is much more efficient as we're only looking at recent history
        output_tokens_tensor = output_tokens_tensor[:, -rep_range:]
This is applying the same range to all sequences in the batch, no?
Also, creating a new tensor here is probably less efficient. I think the slicing should be done in the bin counting function above this.
Agreed, I'll have to go over it again with batching in mind, and also agreed about rolling in the slicing
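Folding the range into the bin counting step, as suggested, could avoid slicing or reallocating the token tensors entirely. A sketch building on the mask above (the helper name and the clamp guarding against padding ids >= vocab_size are assumptions):

    import torch

    def masked_bin_count(tokens: torch.Tensor,
                         in_range: torch.Tensor,
                         vocab_size: int) -> torch.Tensor:
        # Counts token occurrences per sequence, adding 1 only at positions
        # whose mask entry is True, so out-of-range and padding positions
        # contribute nothing.
        counts = torch.zeros(tokens.shape[0], vocab_size,
                             dtype=torch.long, device=tokens.device)
        counts.scatter_add_(1, tokens.clamp_max(vocab_size - 1), in_range.long())
        return counts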
        if output_len < rep_range:
            # Calculate how many prompt tokens we should include
            prompt_tokens_to_include = min(rep_range - output_len, prompt_end_idx)
            prompt_tokens_tensor = prompt_tokens_tensor[:, -prompt_tokens_to_include:]
Not all sequences in a batch have the same output len (and consequently may not include the same number of prompt tokens).
Agreed, I'll have to go over it again with batching in mind
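A per-sequence version of the prompt handling could compute the remaining budget as a tensor rather than from a single output_len. Again a sketch under the same layout assumptions as above (names and padding layout are illustrative):

    import torch

    def prompt_range_mask(prompt_tokens: torch.Tensor,
                          prompt_lens: torch.Tensor,
                          output_lens: torch.Tensor,
                          rep_range: torch.Tensor) -> torch.Tensor:
        # Per sequence, the prompt budget is max(rep_range - output_len, 0),
        # capped at the prompt length; rep_range == 0 keeps the whole prompt.
        num_seqs, max_len = prompt_tokens.shape
        budget = (rep_range - output_lens).clamp_min(0)
        budget = torch.where(rep_range == 0, prompt_lens, budget)
        budget = torch.minimum(budget, prompt_lens)
        pos = torch.arange(max_len, device=prompt_tokens.device).expand(num_seqs, -1)
        dist_from_end = prompt_lens.unsqueeze(1) - pos
        return (dist_from_end > 0) & (dist_from_end <= budget.unsqueeze(1))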
            prompt_tokens_tensor = prompt_tokens_tensor[:, -prompt_tokens_to_include:]
        else:
            # If we have enough output tokens, ignore prompt completely
            prompt_tokens_tensor = torch.empty((num_seqs, 0), dtype=torch.long,
                                               device=logits.device)
Do we need to do this here? I think most of the range ops can be done in bin counting.
I'll refactor it, thanks again
Adds a range to the repetition penalties (all samplers under do_penalties).
Counts back from the current token, applying the penalty to all output tokens within range, then to prompt tokens if the range extends that far.
The most expensive operations are simple slicing operations, which are relatively fast in PyTorch.
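A toy, single-sequence illustration of the counting-back behavior described above (token values and the range of 6 are arbitrary):

    prompt_tokens = [11, 12, 13, 14, 15]   # 5 prompt tokens
    output_tokens = [21, 22, 23, 24]       # 4 generated tokens
    rep_range = 6

    # The range covers all 4 output tokens, then reaches 2 tokens into the prompt.
    in_range_outputs = output_tokens[-rep_range:]
    prompt_budget = max(rep_range - len(output_tokens), 0)
    in_range_prompt = prompt_tokens[-prompt_budget:] if prompt_budget else []

    assert in_range_outputs == [21, 22, 23, 24]
    assert in_range_prompt == [14, 15]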