Contributor

@pulpdrew pulpdrew commented Oct 14, 2025

Closes HDX-2576

It is a common optimization to have a primary key like toStartOfDay(Timestamp), ..., Timestamp. This PR improves the experience when using such a primary key in the following ways:

  1. HyperDX will now automatically filter on both toStartOfDay(Timestamp) and Timestamp in this case, instead of just Timestamp. This improves performance by better utilizing the primary index. Previously, this required a manual change to the source's Timestamp Column setting.
  2. HyperDX now applies the same toStartOfX function to the right-hand side of timestamp comparisons. So when filtering using an expression like toStartOfDay(Timestamp), the generated SQL will have the condition toStartOfDay(Timestamp) >= toStartOfDay(<selected start time>) AND toStartOfDay(Timestamp) <= toStartOfDay(<selected end time>). This resolves an issue where some data would be incorrectly filtered out when filtering on such timestamp expressions (for example, with time ranges shorter than 1 minute).

With this change, teams should no longer need to have multiple columns in their source timestamp column configuration. However, if they do, they will now have correct filtering.
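
As a rough illustration of point 2, the sketch below builds the range condition for a toStartOfX(column) expression by wrapping both bounds in the same function. The helper name buildTimeRangeCondition, the regex, and the choice to render bounds with fromUnixTimestamp64Milli are assumptions made for this example; it is not the code in this PR, and it ignores extra function arguments such as intervals or timezones.

```typescript
// Hypothetical helper for illustration only, not the HyperDX implementation.
// Matches a whole expression of the form toStartOfX(...), e.g. toStartOfDay(Timestamp).
const TO_START_OF = /^\s*(toStartOf\w+)\s*\(.*\)\s*$/;

function buildTimeRangeCondition(
  timestampExpr: string,
  startMs: number,
  endMs: number,
): string {
  // Render the selected bounds as ClickHouse DateTime64 values from epoch milliseconds.
  const start = `fromUnixTimestamp64Milli(${startMs})`;
  const end = `fromUnixTimestamp64Milli(${endMs})`;

  const match = TO_START_OF.exec(timestampExpr);
  if (match == null) {
    // Plain column: compare against the raw bounds.
    return `${timestampExpr} >= ${start} AND ${timestampExpr} <= ${end}`;
  }

  // Bucketed expression: apply the same function to the bounds, so rows whose
  // bucket starts before the selected start time are not dropped.
  const fn = match[1];
  return `${timestampExpr} >= ${fn}(${start}) AND ${timestampExpr} <= ${fn}(${end})`;
}

// Filtering on both parts of "toStartOfDay(Timestamp), Timestamp":
const where = ["toStartOfDay(Timestamp)", "Timestamp"]
  .map((expr) => buildTimeRangeCondition(expr, 1000, 2000))
  .join(" AND ");
```

Without the wrapping, a row at 10:00:45 would fail toStartOfDay(Timestamp) >= 10:00:30 because its bucket starts at 00:00, which is the kind of incorrect filtering described above.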

changeset-bot bot commented Oct 14, 2025

🦋 Changeset detected

Latest commit: 0646139

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
Name Type
@hyperdx/common-utils Patch
@hyperdx/app Patch
@hyperdx/api Patch

vercel bot commented Oct 14, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: hyperdx-v2-oss-app
Deployment: Ready
Updated (UTC): Oct 17, 2025 4:22pm

claude bot commented Oct 14, 2025

Pull Request Review

This PR introduces an optimization for filtering on toStartOfX primary key expressions. The implementation is solid with comprehensive test coverage. Here's my detailed feedback:

✅ Strengths

  1. Excellent test coverage: Both frontend and backend have thorough unit tests covering edge cases
  2. Well-documented: The code includes clear comments explaining the optimization strategy
  3. Backward compatible: Falls back gracefully to original behavior when optimization doesn't apply
  4. Performance improvement: Properly leverages ClickHouse primary key structure for better query performance

🔍 Code Quality Observations

Strong Points:

  • Clean separation of concerns with optimizeTimestampValueExpression as a pure function
  • Proper use of React hooks with correct dependency arrays
  • Good use of memoization to avoid unnecessary recalculations
  • Comprehensive test cases covering various timestamp expression formats

Minor Suggestions:

1. Error handling in parseToStartOfFunction (packages/common-utils/src/renderChartConfig.ts:489-514)

The function logs an error but returns undefined. Consider being more explicit:

const columnArgument = args[0];
if (columnArgument == null) {
  console.error(`Failed to parse column argument from ${expr}`);
  return undefined; // Explicit return
}

2. Documentation in hook (packages/app/src/hooks/useOptimizedTimestampValueExpression.ts:63)

The condition check on line 63 seems to exclude Metric sources from optimization. Consider adding a comment explaining why:

// Metrics use a different schema structure that doesn't benefit from this optimization
if (source.kind === SourceKind.Metric)
  return source.timestampValueExpression;

3. Test naming (packages/app/src/hooks/__tests__/useOptimizedTimestampValueExpression.test.tsx:120)

The test description says "ungracefully" but it actually handles the case gracefully by returning undefined. Consider:

it('should return undefined when source is null', () => {

4. Regex pattern robustness (packages/common-utils/src/renderChartConfig.ts:487)

The regex /(toStartOf\w+)\s*\(/ could potentially match unexpected functions. Consider being more explicit about which functions are supported or add validation.
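
To make the concern concrete, the small sketch below runs the quoted pattern against a few inputs. The helper name matchedFunctionName and the sample expressions are made up for this example; only the regex itself comes from the comment above.

```typescript
// The pattern quoted in the review comment above.
const TO_START_OF_PATTERN = /(toStartOf\w+)\s*\(/;

// Hypothetical helper: returns the captured function name, if any.
function matchedFunctionName(expr: string): string | undefined {
  return TO_START_OF_PATTERN.exec(expr)?.[1];
}

matchedFunctionName("toStartOfDay(Timestamp)"); // "toStartOfDay"
matchedFunctionName("toStartOfInterval (Timestamp, INTERVAL 5 MINUTE)"); // "toStartOfInterval"
matchedFunctionName("toStartOfXYZ123(Timestamp)"); // "toStartOfXYZ123", not a real function
matchedFunctionName("Timestamp"); // undefined
```

Because \w+ accepts any run of word characters, the pattern alone cannot tell a real ClickHouse function from an arbitrary name with the same prefix.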

🎯 Performance Considerations

Positive:

  • The optimization correctly adds the toStartOfX expression to filters, which will help ClickHouse utilize the primary key more effectively
  • Memoization in the hook prevents unnecessary recalculations

Consider:

  • The splitAndTrimWithBracket utility is called multiple times. In performance-critical paths, consider caching results if the same expression is parsed repeatedly
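
A caching layer along these lines could look like the sketch below. splitTopLevelCommas is a simplified, hypothetical stand-in written for this example (it is not HyperDX's splitAndTrimWithBracket and does not handle commas or parentheses inside quoted strings), and memoizeByString is likewise an assumed helper name.

```typescript
// Hypothetical stand-in for a pure, bracket-aware splitter.
function splitTopLevelCommas(expr: string): string[] {
  const parts: string[] = [];
  let depth = 0;
  let current = "";
  for (let i = 0; i < expr.length; i++) {
    const ch = expr.charAt(i);
    if (ch === "(") depth++;
    if (ch === ")") depth--;
    if (ch === "," && depth === 0) {
      // A comma outside all parentheses ends the current part.
      parts.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  if (current.trim() !== "") parts.push(current.trim());
  return parts;
}

// Caches results of a pure string -> T function, keyed by the input string.
function memoizeByString<T>(fn: (input: string) => T): (input: string) => T {
  const cache = new Map<string, T>();
  return (input) => {
    if (cache.has(input)) return cache.get(input) as T;
    const value = fn(input);
    cache.set(input, value);
    return value;
  };
}

const cachedSplit = memoizeByString(splitTopLevelCommas);
```

The cache here is unbounded and hands back the same array on repeated calls, so callers should treat the result as read-only. Whether caching is worth it depends on how often identical expressions are actually re-parsed.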

🔒 Security Concerns

No major security issues identified. The code:

  • Uses parameterized queries appropriately
  • Doesn't introduce SQL injection vectors
  • Properly validates and sanitizes inputs through existing patterns

📊 Test Coverage

Excellent coverage:

  • 224 lines of frontend tests covering 14 distinct scenarios
  • 159 lines of backend tests covering 15 distinct scenarios
  • Edge cases well covered (null handling, complex expressions, multiple timestamp parts)

One suggestion:
Consider adding an integration test that validates the actual SQL generated for a query with this optimization, to ensure end-to-end correctness.

📝 Additional Observations

  1. Consistency: The pattern of applying this optimization consistently across DBDashboardPage, DBSearchPage, and DBEditTimeChartForm is good
  2. Dependency management: All useEffect and useMemo dependencies are correctly specified
  3. Changeset: Properly includes changeset file for version tracking

🎓 Code Style Alignment with CLAUDE.md

✅ Follows TypeScript strict typing guidelines
✅ Proper functional component patterns with hooks
✅ Clear interfaces and type definitions
✅ Appropriate test organization in __tests__ directories
✅ Maintains existing code patterns

💡 Recommendation

LGTM with minor suggestions. This is a well-implemented optimization with excellent test coverage. The minor suggestions above are optional improvements that don't block merging. The PR successfully addresses the performance issue while maintaining code quality and backward compatibility.

Great work! 🚀

Contributor

github-actions bot commented Oct 14, 2025

E2E Test Results

All tests passed • 25 passed • 3 skipped • 224s

Status Count
✅ Passed 25
❌ Failed 0
⚠️ Flaky 0
⏭️ Skipped 3

claude bot commented Oct 15, 2025

Pull Request Review - HyperDX #1265

Summary

This PR implements an optimization for ClickHouse queries with primary keys that use toStartOfX functions. The changes automatically detect and utilize these primary key expressions to improve query performance, and fix filtering issues with time ranges.

Strengths

1. Excellent Test Coverage

  • 207 new test cases in renderChartConfig.test.ts
  • 200 new test cases in utils.test.ts
  • Comprehensive edge cases including various toStartOf functions, spacing variations, timezones, and primary key optimization scenarios

2. Well-Designed API

  • parseToStartOfFunction: Clean parsing logic with clear return type
  • optimizeTimestampValueExpression: Focused, single-purpose function
  • Both functions are pure and easily testable

3. Performance Optimization

  • Automatically detects and utilizes primary key expressions without manual configuration
  • Properly applies toStartOfX functions to both sides of timestamp comparisons for efficient index usage
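
The detection step can be pictured with the sketch below: given a table's primary key and the configured timestamp column, it looks for a toStartOfX(column, ...) expression in the key and, if one is found, filters on both. The function names findBucketExpression and withPrimaryKeyBucket and the regex are assumptions for this example; the PR's optimizeTimestampValueExpression may work differently.

```typescript
// Hypothetical helper: finds a toStartOfX(<column>[, args]) expression in a primary key.
function findBucketExpression(primaryKey: string, column: string): string | undefined {
  // Escape regex metacharacters in the column name before embedding it.
  const escaped = column.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(
    `toStartOf\\w+\\s*\\(\\s*${escaped}\\s*(?:,[^()]*)?\\)`,
  );
  return pattern.exec(primaryKey)?.[0];
}

// Hypothetical helper: returns "bucket, column" when the key contains a bucket
// expression for the column, otherwise the column unchanged.
function withPrimaryKeyBucket(primaryKey: string, column: string): string {
  const bucket = findBucketExpression(primaryKey, column);
  return bucket == null ? column : `${bucket}, ${column}`;
}

withPrimaryKeyBucket("toStartOfDay(Timestamp), ServiceName, Timestamp", "Timestamp");
// "toStartOfDay(Timestamp), Timestamp"
withPrimaryKeyBucket("ServiceName, Timestamp", "Timestamp");
// "Timestamp"
```

Returning the original column when nothing matches mirrors the fallback behavior noted under Backward Compatibility.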

4. Backward Compatibility

  • Graceful fallback when optimization fails
  • Works with existing manually configured timestamp expressions
  • Does not break CTEs or synthetic columns

Issues and Concerns

1. Potential SQL Injection Risk (HIGH PRIORITY)

Location: packages/common-utils/src/renderChartConfig.ts:560, 566

The code directly interpolates the function name and arguments into SQL. While toStartOf.formattedRemainingArgs comes from parsing timestamp expressions, it is directly injected into SQL without sanitization.

Recommendation:

  • Validate that toStartOf.function matches an allowlist of known ClickHouse functions
  • Parse and validate formattedRemainingArgs instead of directly injecting it
  • Consider using parameterized queries for all parts of the expression
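
One way to act on the allowlist recommendation is sketched below. The set contents are an assumption: they are ClickHouse toStartOf* functions chosen for this example, the list is not exhaustive, and the names ALLOWED_TO_START_OF_FUNCTIONS and isAllowedToStartOfFunction do not come from the PR.

```typescript
// Assumed, non-exhaustive allowlist of ClickHouse toStartOf* function names.
const ALLOWED_TO_START_OF_FUNCTIONS = new Set<string>([
  "toStartOfYear",
  "toStartOfQuarter",
  "toStartOfMonth",
  "toStartOfWeek",
  "toStartOfDay",
  "toStartOfHour",
  "toStartOfMinute",
  "toStartOfSecond",
  "toStartOfFiveMinutes",
  "toStartOfTenMinutes",
  "toStartOfFifteenMinutes",
  "toStartOfInterval",
]);

// Hypothetical check: exact, case-sensitive membership in the allowlist.
function isAllowedToStartOfFunction(name: string): boolean {
  return ALLOWED_TO_START_OF_FUNCTIONS.has(name);
}

isAllowedToStartOfFunction("toStartOfDay"); // true
isAllowedToStartOfFunction("toStartOfXYZ123"); // false
```

A caller could skip the optimization and keep the raw expression whenever the parsed name is not in the set. This only covers the function name; validating the remaining arguments is a separate problem.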

2. Error Handling Could Be More Specific

Location: packages/common-utils/src/renderChartConfig.ts:521-523

The catch block uses console.log instead of a proper logger and silently swallows all errors. This makes debugging difficult.

Recommendation: Use a proper logger with appropriate level and include context about the failure.

3. Regex Pattern Could Be More Robust

Location: packages/common-utils/src/utils.ts:598

The pattern (toStartOf\w+) matches any word characters, which could match invalid function names like toStartOfXYZ123.

Recommendation: Use an explicit allowlist of valid ClickHouse toStartOf functions.

4. Type Safety Issue

Location: packages/common-utils/src/renderChartConfig.ts:511-515

No type guard or validation that primary_key exists and is a string before using it.

Recommendation: Add proper type checking before destructuring.
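
A type guard of the kind suggested might look like the sketch below. The TableMetadataLike shape and the guard name are assumptions for illustration; the real metadata type in the codebase may carry more fields or a different structure.

```typescript
// Assumed minimal shape: only the field this check cares about.
interface TableMetadataLike {
  primary_key: string;
}

// Narrows an unknown value to an object with a string primary_key.
function hasStringPrimaryKey(value: unknown): value is TableMetadataLike {
  return (
    typeof value === "object" &&
    value !== null &&
    "primary_key" in value &&
    typeof (value as Record<string, unknown>).primary_key === "string"
  );
}

function primaryKeyOrUndefined(metadata: unknown): string | undefined {
  // Only read primary_key after the guard has confirmed it is a string.
  return hasStringPrimaryKey(metadata) ? metadata.primary_key : undefined;
}
```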

5. Magic Strings in Logic

Location: packages/common-utils/src/utils.ts:645-649

Hard-coded function names like toUnixTimestamp and toDateTime make the code fragile and difficult to maintain.

Recommendation: Extract to constants or a configuration object.

Testing Recommendations

  1. Add integration tests that verify the actual SQL generated runs successfully against ClickHouse
  2. Add performance tests to verify the optimization improves query performance
  3. Add malicious input tests to ensure no SQL injection vulnerabilities

Security Review

Overall Security Posture: MEDIUM RISK

The main security concern is the direct string interpolation of parsed timestamp expressions into SQL queries. While the code uses the chSql template tag for parameterization in most places, the formattedRemainingArgs is directly inserted.

Action Items:

  1. Add input validation/sanitization for timestamp expressions
  2. Implement an allowlist for valid function names
  3. Consider security implications of where timestampValueExpression originates

Conclusion

This is a well-implemented feature with excellent test coverage that solves a real performance problem. The core logic is sound and the approach is pragmatic.

However, there are security concerns around SQL injection that should be addressed before merging. The error handling and input validation could also be improved.

Recommendation: Approve with required changes

  • Address the SQL injection concerns
  • Improve error handling
  • Add input validation

The test coverage is exemplary and gives high confidence in the correctness of the implementation.

@pulpdrew pulpdrew force-pushed the drew/optimize-timestamp-filters branch from 36275d6 to e983faa Compare October 15, 2025 14:35
@pulpdrew pulpdrew force-pushed the drew/optimize-timestamp-filters branch from e983faa to 711c169 Compare October 15, 2025 14:49
@pulpdrew pulpdrew marked this pull request as ready for review October 15, 2025 14:49
@pulpdrew pulpdrew force-pushed the drew/optimize-timestamp-filters branch from 711c169 to 1e7f9bb Compare October 15, 2025 14:50
@pulpdrew pulpdrew force-pushed the drew/optimize-timestamp-filters branch from 1e7f9bb to 7fa8ec0 Compare October 15, 2025 14:56
@pulpdrew pulpdrew force-pushed the drew/optimize-timestamp-filters branch from 7fa8ec0 to 779f63c Compare October 15, 2025 15:17
@pulpdrew pulpdrew requested review from a team, brandon-pereira and wrn14897 and removed request for a team October 15, 2025 15:28
Comment on lines +521 to +523
} catch (e) {
  console.log('Failed to optimize timestampValueExpression', e);
}
Contributor Author

Clicking through the app, I don't run into this error and don't expect it to happen. I'm adding this try/catch as an extra layer of protection against an error in either getTableMetadata or optimizeTimestampValueExpression, in which case we can always fall back to the raw timestampValueExpression used before.
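
The fallback described here follows a common pattern, sketched below under some assumptions: resolveTimestampExpression, its optimize callback, and the injectable warn callback are hypothetical names, and the sketch is synchronous for brevity even though a metadata lookup would normally be asynchronous.

```typescript
// Hypothetical wrapper: try the optimization, fall back to the raw expression.
function resolveTimestampExpression(
  raw: string,
  optimize: (raw: string) => string | undefined,
  warn: (message: string, error: unknown) => void = () => {},
): string {
  try {
    // An undefined result means "no optimization applies", so keep the raw value.
    return optimize(raw) ?? raw;
  } catch (e) {
    // Report the failure with some context, then behave as before the optimization.
    warn(`Failed to optimize timestampValueExpression "${raw}"`, e);
    return raw;
  }
}
```

Passing the reporting function in keeps the fallback easy to test and leaves the choice of logger to the caller.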

@pulpdrew pulpdrew force-pushed the drew/optimize-timestamp-filters branch from 779f63c to 0646139 Compare October 17, 2025 16:18

claude bot commented Oct 17, 2025

Code Review

Overall: Well-structured PR with comprehensive test coverage. The implementation correctly optimizes timestamp filtering for ClickHouse primary keys.

Critical Issues: None found

Important Notes:

  • Security: UNSAFE_RAW_SQL usage is appropriate (expressions from trusted sources)
  • Type Safety: Proper TypeScript with clear return types
  • Testing: Excellent coverage (219 + 200 new test lines)
  • Error Handling: Proper try-catch with graceful fallback
  • Performance: Optimization correctly targets primary key ordering

Optional Suggestions (non-blocking):

  • Consider adding console.warn in optimizeTimestampValueExpression if optimization fails
  • Add comment to regex in parseToStartOfFunction explaining the pattern

Recommendation: Approved - no blocking issues
