Description
Is your feature request related to a problem?
This is a bit broad, and I'm not sure whether it should be a discussion instead.
I want to be able to profile queries, specifically to find out which parts of a query are slow. This is tricky, because in ibis, computation happens in two stages:
- expression building: This is always fast, since we aren't actually processing any data. But this is where I want to be able to assign blame, i.e. "the `table.distinct(on="my_id")` call is responsible for 90% of the query runtime". Using `timeit` or other Python profiling is useless here.
- runtime: This is where the big data is actually processed, and where the time is actually spent.
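To illustrate why Python-side timing doesn't help, here is a minimal sketch that uses the standard-library `sqlite3` module as a stand-in backend and treats building a SQL string as the analogue of ibis expression building. `build_query` and `time_stages` are hypothetical helpers, not ibis APIs. The build stage costs almost nothing, and its timing says nothing about which part of the query is slow.

```python
import sqlite3
import time


def build_query(table: str, key: str) -> str:
    # Stand-in for "expression building": only describes the work, touches no rows.
    return f"SELECT COUNT(*) FROM (SELECT DISTINCT {key} FROM {table})"


def time_stages(con: sqlite3.Connection, table: str, key: str) -> dict:
    t0 = time.perf_counter()
    sql = build_query(table, key)
    t1 = time.perf_counter()
    # Stand-in for "runtime": the engine scans and deduplicates every row here.
    (count,) = con.execute(sql).fetchone()
    t2 = time.perf_counter()
    return {"build_s": t1 - t0, "execute_s": t2 - t1, "result": count}


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (my_id INTEGER, payload TEXT)")
con.executemany(
    "INSERT INTO t VALUES (?, ?)",
    ((i % 1000, "x") for i in range(200_000)),
)
stats = time_stages(con, "t", "my_id")
print(stats)
```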
I can currently do A/B testing by timing the whole run, but it is annoying to have to think about what to change and then write the change.
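The A/B workflow can at least be scripted so that only the variants need writing. A minimal sketch, again assuming the standard-library `sqlite3` module as a stand-in backend; `ab_time` is a hypothetical helper, not part of ibis. It reports the median wall-clock time of each named variant over a few repeated runs.

```python
import sqlite3
import statistics
import timeit


def ab_time(con: sqlite3.Connection, variants: dict, repeat: int = 5) -> dict:
    """Return the median wall-clock seconds of each named SQL variant."""
    results = {}
    for name, sql in variants.items():
        # fetchall() forces the query to run to completion inside each sample.
        samples = timeit.repeat(
            lambda sql=sql: con.execute(sql).fetchall(), number=1, repeat=repeat
        )
        results[name] = statistics.median(samples)
    return results


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (my_id INTEGER, payload TEXT)")
con.executemany(
    "INSERT INTO t VALUES (?, ?)",
    ((i % 1000, "x") for i in range(100_000)),
)
timings = ab_time(
    con,
    {
        "distinct": "SELECT DISTINCT my_id FROM t",
        "group_by": "SELECT my_id FROM t GROUP BY my_id",
    },
)
print(timings)
```

Both numbers still describe whole runs, so this only automates the bookkeeping; it does not say which operation inside a query is responsible.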
What is the motivation behind your request?
I am motivated both as an end user and as a library author.
Describe the solution you'd like
Ideally, every Operation could be associated with a runtime, so that, for instance, a plot of the expression graph could show the runtime next to each node.
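To make the per-node idea concrete: if an engine evaluated each operation eagerly, node by node, attributing time would just mean timing each node's own work. The toy below assumes exactly that, and assumes the graph is a tree (a node shared by two parents would be evaluated twice). `Node` and `run_profiled` are hypothetical and unrelated to ibis's actual `Operation` classes. Real SQL engines fuse and reorder operations after compilation, which is why mapping their timings back onto ibis nodes is hard.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Node:
    """Hypothetical operation node; not ibis's Operation class."""

    name: str  # assumed unique, since it keys the timings dict
    fn: Callable[..., Any]  # combines child results into this node's result
    children: List["Node"] = field(default_factory=list)


def run_profiled(node: Node, timings: Dict[str, float]) -> Any:
    # Evaluate inputs first, so their time is recorded under their own names.
    inputs = [run_profiled(child, timings) for child in node.children]
    start = time.perf_counter()
    result = node.fn(*inputs)
    # Record only this node's own ("self") time, excluding its children.
    timings[node.name] = time.perf_counter() - start
    return result


source = Node("source", lambda: list(range(100_000)))
distinct = Node("distinct", lambda rows: sorted({r % 1000 for r in rows}), [source])
count = Node("count", len, [distinct])

timings: Dict[str, float] = {}
result = run_profiled(count, timings)
print(result)  # 1000
print(timings)
```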
Per-operation timing seems very difficult, though. A still very usable alternative would be to just get back the raw, non-machine-readable output of, e.g., `EXPLAIN ANALYZE <QUERY>`. I don't think this solution would even need to be implemented in ibis, but it would be ergonomic if it were a simple `expr.profile()` or `ibis.profile(expr)`.
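A minimal sketch of the `EXPLAIN ANALYZE` route, assuming only a DB-API-style connection whose `execute()` result has `fetchall()` (Python's `sqlite3` connections work this way, and so do connections from the `duckdb` Python package). `profile` is a hypothetical helper, not an existing ibis function; with ibis, the SQL string could come from `ibis.to_sql(expr)`. SQLite has no `EXPLAIN ANALYZE`, so the runnable demo falls back to `EXPLAIN QUERY PLAN`.

```python
import sqlite3


def profile(con, sql: str, prefix: str = "EXPLAIN ANALYZE") -> str:
    """Hypothetical helper: run `prefix sql` and return the backend's raw output as text."""
    rows = con.execute(f"{prefix} {sql}").fetchall()
    # No parsing: every row is joined into one human-readable block.
    return "\n".join(" | ".join(str(value) for value in row) for row in rows)


# EXPLAIN QUERY PLAN reports SQLite's chosen plan without executing or timing the query.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (my_id INTEGER, payload TEXT)")
plan = profile(con, "SELECT DISTINCT my_id FROM t", prefix="EXPLAIN QUERY PLAN")
print(plan)
```

On a backend that supports it, such as DuckDB, the default prefix would run the query and return its timed plan as-is, which is roughly the "raw output" described above.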
I'm not sure about backend support, but I imagine most of the SQL backends support some sort of profiling. SQLite profiling looks very different from DuckDB profiling.
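One way to picture the backend differences is a per-backend lookup of the profiling statement. This is a hypothetical sketch: the keys follow ibis backend names, but ibis ships no such table, and only statements known to exist on those engines are listed. Note that SQLite's statement shows the plan without running the query, so it yields no timings at all.

```python
# Hypothetical lookup; keys follow ibis backend names, but ibis provides no such table.
PROFILE_PREFIX = {
    "duckdb": "EXPLAIN ANALYZE",  # runs the query, reports per-operator timings
    "postgres": "EXPLAIN ANALYZE",  # runs the query, reports actual time per plan node
    "mysql": "EXPLAIN ANALYZE",  # MySQL 8.0.18+, runs the query
    "sqlite": "EXPLAIN QUERY PLAN",  # plan only: no execution, no timings
}


def profiling_statement(backend: str, sql: str) -> str:
    """Prefix `sql` with the backend's profiling statement, if one is known."""
    try:
        prefix = PROFILE_PREFIX[backend]
    except KeyError:
        raise NotImplementedError(
            f"no profiling statement known for backend {backend!r}"
        ) from None
    return f"{prefix} {sql}"


print(profiling_statement("duckdb", "SELECT 1"))  # EXPLAIN ANALYZE SELECT 1
print(profiling_statement("sqlite", "SELECT 1"))  # EXPLAIN QUERY PLAN SELECT 1
```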
What version of ibis are you running?
main
What backend(s) are you using, if any?
No response
Code of Conduct
- I agree to follow this project's Code of Conduct