[Bug] HybridCache not subscriptable #1047
base: main
# TODO: this seems to get set to the length of the first sequence we pass for models using
# StaticCache or HybridCache. We need to initialize our own cache with a large enough size
# if we want to continue generation with the same cache.
self._past_key_values = None
past_length = 0
Not sure whether this is also necessary for SlidingWindowCache?
Codecov Report
Attention: Patch coverage is

| | main | #1047 | +/- |
|---|---|---|---|
| Coverage | 71.96% | 63.53% | -8.43% |
| Files | 63 | 63 | |
| Lines | 4769 | 4797 | +28 |
| Hits | 3432 | 3048 | -384 |
| Misses | 1337 | 1749 | +412 |
All the model-specific tests in the CI Tests workflow pass. The general tests are failing due to Azure cloud authentication issues, and I'm not sure I have the permissions to rerun them. But I believe all is fine and dandy with the PR.
Changes since submitting PR:
@paulbkoch, any feedback?
OK with me if it makes the tests work better, but @paulbkoch likely understands the details better.
The build currently fails because gemma2 uses a HybridCache, which, unlike the friendlier DynamicCache, doesn't support tuple-style indexing and slicing.
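A minimal sketch of why indexing breaks, and one way to read the cached length without indexing at all. It assumes, as in transformers, that cache objects expose `get_seq_length()`, while the legacy format is a tuple over layers of `(key, value)` tensors shaped `(batch, num_heads, seq_len, head_dim)`. `FakeTensor`, `ObjectCache`, and `cached_length` are hypothetical stand-ins, not the PR's code.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a tensor; only .shape is modelled."""
    shape: Tuple[int, ...]


class ObjectCache:
    """Hypothetical cache object with get_seq_length() but no __getitem__,
    so `cache[0]` raises "TypeError: ... object is not subscriptable"."""

    def __init__(self, seq_len: int):
        self._seq_len = seq_len

    def get_seq_length(self, layer_idx: int = 0) -> int:
        return self._seq_len


def cached_length(past_key_values) -> int:
    """Number of tokens already cached, for either cache format."""
    if past_key_values is None:
        return 0
    # Cache objects: ask the object rather than indexing into it.
    if hasattr(past_key_values, "get_seq_length"):
        return past_key_values.get_seq_length()
    # Legacy format: tuple over layers of (key, value) pairs, each tensor
    # shaped (batch, num_heads, seq_len, head_dim).
    key, _value = past_key_values[0]
    return key.shape[-2]


legacy = ((FakeTensor((1, 8, 5, 64)), FakeTensor((1, 8, 5, 64))),)
print(cached_length(legacy))          # 5
print(cached_length(ObjectCache(7)))  # 7
```

Checking for the object API first means the tuple indexing path is only ever taken for the legacy format.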
"Fixing" this issue (by just throwing the cache away) immediately uncovered another one: the HybridCache has a maximum size. If we don't set it manually, it is set to the length of the first token sequence the model is called with. A later forward pass with more tokens then raises exceptions deep inside gemma's implementation. The current "fix" is, again, to throw the cache away.
I'm hoping for something more elegant, but I don't think this is too insane for now.
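The capacity problem can be modelled with a toy cache. `FixedCapacityCache` and `run_step` are hypothetical: the class only mimics the behaviour described above (capacity locked to the first sequence unless `max_cache_len` is given up front), not HybridCache's real code, and `run_step` implements the discard-and-recompute workaround, optionally with the larger pre-allocated size the TODO suggests.

```python
from typing import Optional, Tuple


class FixedCapacityCache:
    """Hypothetical toy cache: if max_cache_len is not given, its capacity
    is locked to the length of the first sequence appended."""

    def __init__(self, max_cache_len: Optional[int] = None):
        self.max_cache_len = max_cache_len
        self.length = 0

    def append(self, num_tokens: int) -> None:
        if self.max_cache_len is None:
            self.max_cache_len = num_tokens  # capacity fixed on first use
        if self.length + num_tokens > self.max_cache_len:
            raise ValueError(
                f"cache holds at most {self.max_cache_len} tokens, "
                f"got {self.length + num_tokens}"
            )
        self.length += num_tokens


def run_step(
    cache: Optional[FixedCapacityCache],
    total_len: int,
    max_cache_len: Optional[int] = None,
) -> Tuple[FixedCapacityCache, int]:
    """Process a sequence of total_len tokens, assumed to extend the cached one.

    Returns (cache, tokens run through the model). Reuses the cache when the
    longer sequence still fits; otherwise throws it away and recomputes
    everything in a fresh cache of capacity max_cache_len (which raises
    ValueError if total_len exceeds an explicit max_cache_len).
    """
    if (
        cache is not None
        and cache.max_cache_len is not None
        and cache.length < total_len <= cache.max_cache_len
    ):
        new_tokens = total_len - cache.length
        cache.append(new_tokens)
        return cache, new_tokens
    fresh = FixedCapacityCache(max_cache_len)
    fresh.append(total_len)
    return fresh, total_len
```

With the default, a 4-token pass followed by a 6-token pass recomputes all 6 tokens because the cache was sized to 4; passing `max_cache_len=32` on both calls lets the second pass process only the 2 new tokens.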
Note: the PR now takes advantage of Cache.crop for cache implementations that support it. This should avoid converting back and forth to the "legacy" cache format that we previously assumed. (Should fix #986.)
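A sketch of crop-based reuse, assuming a cache that, like transformers' DynamicCache, exposes `crop(max_length)`. The idea is to keep the longest token prefix shared by the cached and new inputs, crop the cache to it when `crop` is available, and otherwise fall back to discarding the cache. `CroppableCache`, `common_prefix_len`, and `reuse_cache` are hypothetical names; always leaving at least one uncached token is an assumption, so the model still has input to produce logits for.

```python
from typing import Sequence


def common_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Length of the longest shared prefix of two token-id sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class CroppableCache:
    """Hypothetical stand-in for a cache that supports crop(max_length)."""

    def __init__(self, token_ids: Sequence[int]):
        self.token_ids = list(token_ids)

    def crop(self, max_length: int) -> None:
        self.token_ids = self.token_ids[:max_length]


def reuse_cache(cache, cached_ids: Sequence[int], new_ids: Sequence[int]):
    """Return (cache or None, past_length) for a forward pass over new_ids."""
    prefix = common_prefix_len(cached_ids, new_ids)
    # Assumption: keep at least one new token to run through the model.
    prefix = min(prefix, len(new_ids) - 1)
    if cache is None or prefix <= 0:
        return None, 0
    if prefix == len(cached_ids):
        return cache, prefix  # new input simply extends the cache
    if hasattr(cache, "crop"):
        cache.crop(prefix)  # drop cached entries past the shared prefix
        return cache, prefix
    return None, 0  # no crop support: throw the cache away
```

Cropping in place keeps the cache in its native object form, so no conversion to and from the legacy tuple format is needed.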