-
Notifications
You must be signed in to change notification settings - Fork 2.3k
chore: bump indexing usage #7307
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No issues found across 2 files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR increases chunk indexing usage limits and simplifies trial detection logic:
Changes:
- Indexing Limits Increased: Trial limit raised from 10,000 → 100,000 chunks/week (10x), paid limit from 50,000 → 1,000,000 chunks/week (20x)
- Trial Detection Simplified: Removed
trial_enddate validation, now only checks Stripe'sstatus == "trialing"field - Unused Imports Removed: Cleaned up
datetimeandtimezoneimports no longer needed after simplification
Key Concerns:
- The trial detection change removes defensive date checking that protected against edge cases (e.g., Stripe status update delays, clock skew). The previous logic verified both the status field AND that the trial period was still valid by checking
trial_end > now - Default values use integer literals instead of strings, inconsistent with the codebase pattern used by all other config variables
- The PR title/description only mentions "bump indexing usage" but doesn't document the significant behavioral change to trial detection logic
Confidence Score: 3/5
- This PR has a logic change that removes defensive checks and could cause edge case issues in production
- Score reflects concern about the undocumented trial detection logic change. While the config value increases are straightforward, the removal of trial_end date validation could lead to tenants being incorrectly classified as on trial if Stripe's status field isn't updated immediately. The previous implementation was more defensive and safer.
- Pay close attention to backend/ee/onyx/server/usage_limits.py - the trial detection logic change needs verification that Stripe status updates are always immediate and reliable
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| backend/shared_configs/configs.py | 4/5 | Increased chunk indexing limits from 10k→100k (trial) and 50k→1M (paid), but uses integer defaults instead of string defaults (inconsistent with codebase pattern) |
| backend/ee/onyx/server/usage_limits.py | 3/5 | Simplified trial detection to only check status field, removing trial_end date validation - potential edge case issues if Stripe status updates are delayed |
Sequence Diagram
sequenceDiagram
participant Client
participant UsageLimits as usage_limits.py
participant TrialFn as is_tenant_on_trial_fn()
participant EEImpl as ee/.../usage_limits.py
participant Billing as billing.py
participant ControlPlane as Control Plane API
Client->>UsageLimits: check_usage_and_raise(tenant_id)
UsageLimits->>TrialFn: is_tenant_on_trial_fn(tenant_id)
TrialFn->>EEImpl: is_tenant_on_trial(tenant_id)
alt MULTI_TENANT enabled
EEImpl->>Billing: fetch_billing_information(tenant_id)
Billing->>ControlPlane: GET /billing-information
ControlPlane-->>Billing: BillingInformation or SubscriptionStatusResponse
alt SubscriptionStatusResponse (no subscription)
Billing-->>EEImpl: SubscriptionStatusResponse
EEImpl-->>TrialFn: return True (new tenant on trial)
else BillingInformation exists
Billing-->>EEImpl: BillingInformation
Note over EEImpl: OLD: Check trial_end > now AND status=="trialing"<br/>NEW: Only check status=="trialing"
EEImpl-->>TrialFn: return status == "trialing"
end
else MULTI_TENANT disabled
EEImpl-->>TrialFn: return False
end
TrialFn-->>UsageLimits: is_trial (bool)
alt is_trial == True
UsageLimits->>UsageLimits: Get trial limits (100k chunks)
else is_trial == False
UsageLimits->>UsageLimits: Get paid limits (1M chunks)
end
UsageLimits->>UsageLimits: check_usage_limit(limit)
alt Limit exceeded
UsageLimits-->>Client: HTTPException 429
else Within limit
UsageLimits-->>Client: Allow request
end
| # Per-week chunks indexed limits | ||
| USAGE_LIMIT_CHUNKS_INDEXED_TRIAL = int( | ||
| os.environ.get("USAGE_LIMIT_CHUNKS_INDEXED_TRIAL", "10000") | ||
| os.environ.get("USAGE_LIMIT_CHUNKS_INDEXED_TRIAL", 100_000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default value should be a string to maintain consistency with other config variables in this file. All other int(os.environ.get(...)) patterns use string defaults (e.g., lines 226, 231, 234, 246, 247, 251, 254).
| os.environ.get("USAGE_LIMIT_CHUNKS_INDEXED_TRIAL", 100_000) | |
| os.environ.get("USAGE_LIMIT_CHUNKS_INDEXED_TRIAL", "100000") |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: backend/shared_configs/configs.py
Line: 239:239
Comment:
The default value should be a string to maintain consistency with other config variables in this file. All other `int(os.environ.get(...))` patterns use string defaults (e.g., lines 226, 231, 234, 246, 247, 251, 254).
```suggestion
os.environ.get("USAGE_LIMIT_CHUNKS_INDEXED_TRIAL", "100000")
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.| ) | ||
| USAGE_LIMIT_CHUNKS_INDEXED_PAID = int( | ||
| os.environ.get("USAGE_LIMIT_CHUNKS_INDEXED_PAID", "50000") | ||
| os.environ.get("USAGE_LIMIT_CHUNKS_INDEXED_PAID", 1_000_000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default value should be a string to maintain consistency with other config variables in this file. All other int(os.environ.get(...)) patterns use string defaults.
| os.environ.get("USAGE_LIMIT_CHUNKS_INDEXED_PAID", 1_000_000) | |
| os.environ.get("USAGE_LIMIT_CHUNKS_INDEXED_PAID", "1000000") |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: backend/shared_configs/configs.py
Line: 242:242
Comment:
The default value should be a string to maintain consistency with other config variables in this file. All other `int(os.environ.get(...))` patterns use string defaults.
```suggestion
os.environ.get("USAGE_LIMIT_CHUNKS_INDEXED_PAID", "1000000")
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.| if isinstance(billing_info, BillingInformation): | ||
| # Check if trial is active | ||
| if billing_info.trial_end is not None: | ||
| now = datetime.now(timezone.utc) | ||
| # Trial active if trial_end is in the future | ||
| # and subscription status indicates trialing | ||
| if billing_info.trial_end > now and billing_info.status == "trialing": | ||
| return True | ||
| return billing_info.status == "trialing" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing the trial_end date check could lead to edge cases where a tenant is incorrectly classified as on trial. The previous implementation checked both trial_end > now AND status == "trialing" to ensure the trial period was actually still active.
Consider these scenarios:
- Stripe status update delays: If there's any delay in Stripe updating the subscription status after the trial ends, tenants could continue receiving trial limits
- Clock skew: The control plane and data plane might have slight time differences
- Grace periods: Some systems may want to continue trial status briefly after trial_end for grace periods
The previous implementation was more defensive and explicit. If this change is intentional (e.g., you've verified Stripe always updates status immediately), it should be documented in the PR description or as a code comment explaining why the date check is no longer needed.
Prompt To Fix With AI
This is a comment left during a code review.
Path: backend/ee/onyx/server/usage_limits.py
Line: 30:31
Comment:
Removing the `trial_end` date check could lead to edge cases where a tenant is incorrectly classified as on trial. The previous implementation checked both `trial_end > now` AND `status == "trialing"` to ensure the trial period was actually still active.
Consider these scenarios:
1. **Stripe status update delays**: If there's any delay in Stripe updating the subscription status after the trial ends, tenants could continue receiving trial limits
2. **Clock skew**: The control plane and data plane might have slight time differences
3. **Grace periods**: Some systems may want to continue trial status briefly after trial_end for grace periods
The previous implementation was more defensive and explicit. If this change is intentional (e.g., you've verified Stripe always updates status immediately), it should be documented in the PR description or as a code comment explaining why the date check is no longer needed.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this valid?
Description
see title
How Has This Been Tested?
n/a
Additional Options
Summary by cubic
Raised weekly indexing limits to 100k for trial tenants and 1M for paid tenants to support higher usage. Simplified trial detection to rely on subscription status only, removing the trial_end date check.
Written for commit 092462d. Summary will update on new commits.