@robbie-c robbie-c commented Oct 3, 2025

Problem

I found that we were not making good use of indexing and partition pruning; this PR fixes that. See the discussion in #38809 (comment).

Changes

  • Make a separate session_timestamp column which is materialized from the session ID
  • Update the where clause extractor to support this
  • Add extra logic to the where clause extractor to support point queries
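The changes above depend on the session ID carrying its own creation time. A minimal sketch of that idea, assuming session IDs are UUIDv7 values (whose top 48 bits hold Unix-epoch milliseconds, per RFC 9562). The helper name `uuid7_to_datetime` is hypothetical, and this is an illustration rather than the PR's implementation, which the description says materializes the column from the session ID.

```python
import uuid
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def uuid7_to_datetime(session_id: str) -> datetime:
    """Return the creation time encoded in a UUIDv7 string (hypothetical helper)."""
    parsed = uuid.UUID(session_id)
    # Assumption: the ID is a UUIDv7, so the top 48 of its 128 bits
    # are milliseconds since the Unix epoch.
    millis = parsed.int >> 80
    return EPOCH + timedelta(milliseconds=millis)
```

With a timestamp derived this way, a point query such as `session_id = X` can in principle also be constrained by the timestamp of `X`, which is the kind of predicate that indexes and partition pruning can use.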

How did you test this code?

  • Added some more tests to the HogQL sessions test files
  • Updated the where clause extractor v3 tests

@robbie-c robbie-c requested a review from a team as a code owner October 3, 2025 11:57
@robbie-c robbie-c requested review from a team October 3, 2025 11:57

@greptile-apps greptile-apps bot left a comment

14 files reviewed, 6 comments

"toSecond": HogQLFunctionMeta("toSecond", 1, 1),
"toUnixTimestamp": HogQLFunctionMeta("toUnixTimestamp", 1, 2),
"toUnixTimestamp64Milli": HogQLFunctionMeta("toUnixTimestamp64Milli", 1, 1),
"fromUnixTimestamp64Milli": HogQLFunctionMeta("fromUnixTimestamp64Milli", 1, 1),

style: Consider adding type signatures for this function like other similar functions (e.g., `fromUnixTimestamp` on line 193) to ensure proper type checking and return type specification

Path: posthog/hogql/functions/clickhouse/datetime.py
Line: 21:21

Suggested change

```suggestion
    "fromUnixTimestamp64Milli": HogQLFunctionMeta(
        "fromUnixTimestamp64Milli",
        1,
        1,
        signatures=[
            ((IntegerType(),), DateTimeType()),
        ],
    ),
```

Comment on lines +140 to +144
def test_timestamp_unrelated_function_timestamp(self):
actual = f(
self.inliner.get_inner_where(parse("SELECT * FROM sessions WHERE like(toString(min_timestamp), 'b')"))
)
assert actual is None

style: Duplicate test method with identical logic to test_timestamp_unrelated_function above - consider removing or differentiating

Path: posthog/hogql/database/schema/util/test/test_session_v3_where_clause_extractor.py
Line: 140:144

"""
select
session_id,
from sessions

syntax: Extra comma creates syntax error in SQL

Path: posthog/hogql/database/schema/util/test/test_session_v3_where_clause_extractor.py
Line: 612:612

Suggested change

```suggestion
    session_id
    from sessions
```

select
session.id as session_id,
from events
where session_id = {session_id} AND timestamp >= '1970-01-01'

syntax: Trailing comma after `session.id as session_id` - this will cause a SQL syntax error

Path: posthog/hogql/database/schema/test/test_sessions_v3.py
Line: 641:641

Suggested change

```suggestion
                select
                    session.id as session_id
                from events
                where session_id = {session_id} AND timestamp >= '1970-01-01'
```

@posthog-bot posthog-bot requested a review from a team October 3, 2025 11:58
@robbie-c robbie-c force-pushed the feat/sessions-v3-optimizations branch from e7216cb to 2f78f49 Compare October 3, 2025 12:50

if node.op == CompareOperationOp.Eq:
if is_left_constant and is_session_id_string_expr(node.right, self.context):
left_timestamp_expr = self.session_id_str_to_timestamp_expr(node.left)
Contributor

Is it possible for `session_id_str_to_timestamp_expr` to return None? If so, how will we know about it?

Member Author

`session_id_str_to_timestamp_expr` can be null (it uses `accurateCastOrNull(x, 'UUID')` behind the scenes).

I don't believe that this being nullable has performance implications. In testing, when I used a garbage constant string for the UUID (e.g. `select * from sessions where session_id = 'garbage'`), ClickHouse was smart enough to know it didn't need to load any parts.
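To illustrate the null-on-failure behaviour described in this reply, here is a rough Python analogue. The function names `parse_uuid_or_none` and `session_matches` are hypothetical, and the sketch only mirrors the general SQL rule that a comparison against NULL never selects a row; it does not model how ClickHouse decides which parts to load.

```python
import uuid
from typing import Optional


def parse_uuid_or_none(value: str) -> Optional[uuid.UUID]:
    """Hypothetical Python analogue of a cast that returns null on failure."""
    try:
        return uuid.UUID(value)
    except ValueError:
        # Malformed input yields None instead of raising.
        return None


def session_matches(stored_id: uuid.UUID, wanted: str) -> bool:
    """Mimic a WHERE predicate: a null comparison value matches no row."""
    parsed = parse_uuid_or_none(wanted)
    return parsed is not None and parsed == stored_id
```

Under this reading, a garbage constant such as `'garbage'` produces a null comparison value, so the predicate is false for every row and no error has to be surfaced to the caller.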

@lricoy lricoy left a comment

Took my time, but really didn't spot anything

@robbie-c robbie-c force-pushed the feat/sessions-v3-optimizations branch from 6b51daa to 0dd5e6d Compare October 6, 2025 16:12
@robbie-c robbie-c force-pushed the feat/sessions-v3-optimizations branch from 0dd5e6d to 918a13a Compare October 6, 2025 16:13

@lricoy lricoy left a comment

🚢

@robbie-c robbie-c merged commit 6be9e4d into master Oct 7, 2025
192 of 196 checks passed
@robbie-c robbie-c deleted the feat/sessions-v3-optimizations branch October 7, 2025 09:24
4 participants