
feat: explain or similar API for inspecting plans #9276

@NickCrews

Description

Is your feature request related to a problem?

This is a bit broad; I'm not sure whether this should be a discussion instead.

I want to be able to profile queries, specifically to find which parts of a query are slow. This is tricky because in ibis, computation happens in two stages:

  1. Expression building: this is always fast, since we aren't actually processing any data. But this is where I want to be able to assign blame, i.e. to say that the table.distinct(on="my_id") call is responsible for 90% of the query runtime. Using timeit or other Python profiling tools is useless here.
  2. Runtime: this is where the big data is actually processed, and where the time is actually spent.

I can currently do A/B testing by timing the whole run, but it is annoying to have to decide what to change and then write that change by hand.
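As a rough illustration of automating that A/B approach, here is a minimal sketch that times each cumulative prefix of a linear pipeline and attributes the difference to the newest step. It uses Python's built-in sqlite3 module rather than ibis, and the helper time_prefixes, the table t, and the steps list are all hypothetical. Query optimizers can fuse or reorder steps, so the deltas are only a heuristic and can even come out negative.

```python
import sqlite3
import time


def time_prefixes(conn, steps, repeat=3):
    """Attribute runtime to each step of a linear pipeline (hypothetical helper).

    `steps` is a list of (name, sql) pairs where each SQL string is the whole
    query up to and including that step. Returns {name: seconds attributed}.
    """
    attributed = {}
    previous = 0.0
    for name, sql in steps:
        best = float("inf")
        for _ in range(repeat):
            start = time.perf_counter()
            conn.execute(sql).fetchall()  # force the full result to be computed
            best = min(best, time.perf_counter() - start)
        # Blame this step for the extra time over the previous prefix.
        attributed[name] = best - previous
        previous = best
    return attributed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (my_id INTEGER, x REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(i % 1000, i * 0.5) for i in range(50_000)],
)
steps = [
    ("scan", "SELECT my_id, x FROM t"),
    ("filter", "SELECT my_id, x FROM t WHERE x > 10"),
    ("distinct", "SELECT DISTINCT my_id FROM t WHERE x > 10"),
]
result = time_prefixes(conn, steps)
for name, seconds in result.items():
    print(f"{name:>8}: {seconds * 1000:+.2f} ms")
```

Taking the best of several runs reduces noise, but this still only approximates per-step cost; it does not see inside a single query the way the engine's own profiler would.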

What is the motivation behind your request?

I am motivated both as an end user and as a library author.

Describe the solution you'd like

Ideally, every Operation could be associated with a runtime, so that, for instance, if you plot an expression graph you could see the runtime next to each node.

That seems very difficult, though. Still very useful would be to just get back the raw, non-machine-readable output from, e.g., EXPLAIN ANALYZE <QUERY>. I don't think this solution would even need to be implemented in ibis, but it would be ergonomic to have a simple expr.profile() or ibis.profile(expr).

I'm not sure about backend support. I imagine most of the SQL backends support some sort of profiling, but SQLite profiling looks very different from DuckDB profiling.
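To make the "raw output" idea concrete, here is a minimal sketch of what a hypothetical profile helper could look like for SQLite, using only the standard library. SQLite offers EXPLAIN QUERY PLAN, which shows the chosen plan but no per-node timings, so this sketch measures total wall-clock time separately. The function name profile and the table t are assumptions for illustration, not an existing ibis API.

```python
import sqlite3
import time


def profile(conn, sql):
    """Hypothetical helper: return (plan_lines, elapsed_seconds) for `sql`."""
    # The last column of EXPLAIN QUERY PLAN output is the human-readable detail.
    plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    # SQLite reports no per-node timings, so time the whole query instead.
    start = time.perf_counter()
    conn.execute(sql).fetchall()
    elapsed = time.perf_counter() - start
    return plan, elapsed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (my_id INTEGER, x REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10_000)],
)
plan, elapsed = profile(conn, "SELECT DISTINCT my_id FROM t")
for line in plan:
    print(line)
print(f"total: {elapsed * 1000:.2f} ms")
```

With ibis, the SQL string could presumably come from ibis.to_sql(expr), and on DuckDB the prefix would be EXPLAIN ANALYZE instead, which reports per-operator timings. Mapping those operators back to individual ibis Operations would still be the hard part.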

What version of ibis are you running?

main

What backend(s) are you using, if any?

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata


Assignees

No one assigned

    Labels

    feature: Features or general enhancements

    Type

    No type

    Projects

    Status

    backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
