Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(repr): add show_count option to interactive repr #10518

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions ibis/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,9 @@ class Interactive(Config):
Maximum depth for nested data types.
show_types : bool
Show the inferred type of value expressions in the interactive repr.
show_count: bool
For Columns and Tables, show the row count. This can be computationally
expensive and slow.

"""

Expand All @@ -76,6 +79,7 @@ class Interactive(Config):
max_string: int = 80
max_depth: int = 1
show_types: bool = True
show_count: bool = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a nice feature to have. I do not think it should default to true.

Interactive mode is supposed to be interactive, and something that can be computationally expensive isn't something we want to turn on by default.

The main failure case I can think of here is reading from CSVs in a blob store. With show_count=False, this can be done with a range request and only download the first N rows (ish), except if there's an order-by. show_count=True would mean any operation would trigger a download of the entire file.

And yes, that is one more reason not to use CSVs, but 🤷

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I am OK with defaulting to False. Let's just wait for Philip to chime in to verify, but I assume he is going to have the same opinion as you.

I really don't think it is worth the complexity/cleverness, but throwing it out there that we could make each relation have its own config, and that could be auto-determined based on if it's a CSV/parquet/physical table, local/remote, etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume he is going to have the same opinion as you.

We are slowly merging into the same person. The transformation is nearly complete.

I really don't think it is worth the complexity/cleverness, but throwing it out there that we could make each relation have its own config, and that could be auto-determined based on if it's a CSV/parquet/physical table, local/remote, etc.

I mean, it's definitely a reliably cheap(er) operation on anything that isn't a CSV, but I agree this seems like too much complexity. I think a better move is your suggestion of persistent default configuration

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, let's leave the default to False, otherwise it's a huge performance footgun/bug magnet.



class Repr(Config):
Expand Down
11 changes: 8 additions & 3 deletions ibis/expr/types/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -1366,6 +1366,7 @@ def preview(
max_length: int | None = None,
max_string: int | None = None,
max_depth: int | None = None,
show_count: bool | None = None,
console_width: int | float | None = None,
) -> rich.table.Table:
"""Print a subset as a single-column Rich Table.
Expand All @@ -1385,6 +1386,8 @@ def preview(
Maximum length for pretty-printed strings.
max_depth
Maximum depth for nested data types.
show_count
Show the row count. This can be computationally expensive and slow.
console_width
Width of the console in characters. If not specified, the width
will be inferred from the console.
Expand All @@ -1407,14 +1410,16 @@ def preview(
"""
from ibis.expr.types.pretty import to_rich

return to_rich(
self,
overrides = dict(
max_rows=max_rows,
max_length=max_length,
max_string=max_string,
max_depth=max_depth,
console_width=console_width,
show_count=show_count,
)
overrides = {k: v for k, v in overrides.items() if v is not None}
options = ibis.options.repr.interactive.copy(**overrides)
return to_rich(self, options=options, console_width=console_width)

def __pyarrow_result__(
self,
Expand Down
71 changes: 31 additions & 40 deletions ibis/expr/types/pretty.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@
import ibis.expr.datatypes as dt

if TYPE_CHECKING:
from ibis.config import Interactive
from ibis.expr.types import Column, Expr, Scalar, Table


Expand Down Expand Up @@ -261,64 +262,43 @@ def format_dtype(dtype, max_string: int) -> Text:
def to_rich(
expr: Expr,
*,
max_rows: int | None = None,
max_columns: int | None = None,
max_length: int | None = None,
max_string: int | None = None,
max_depth: int | None = None,
console_width: int | float | None = None,
options: Interactive | None = None,
) -> Pretty:
"""Truncate, evaluate, and render an Ibis expression as a rich object."""
if options is None:
options = ibis.options.repr.interactive
from ibis.expr.types import Scalar

if isinstance(expr, Scalar):
return _to_rich_scalar(
expr, max_length=max_length, max_string=max_string, max_depth=max_depth
)
return _to_rich_scalar(expr, options)
else:
return _to_rich_table(
expr,
max_rows=max_rows,
max_columns=max_columns,
max_length=max_length,
max_string=max_string,
max_depth=max_depth,
console_width=console_width,
)
return _to_rich_table(expr, options, console_width=console_width)


def _to_rich_scalar(
expr: Scalar,
*,
max_length: int | None = None,
max_string: int | None = None,
max_depth: int | None = None,
) -> Pretty:
def _to_rich_scalar(expr: Scalar, options: Interactive) -> Pretty:
value = format_values(
expr.type(),
[expr.to_pyarrow().as_py()],
max_length=max_length or ibis.options.repr.interactive.max_length,
max_string=max_string or ibis.options.repr.interactive.max_string,
max_depth=max_depth or ibis.options.repr.interactive.max_depth,
max_length=options.max_length,
max_string=options.max_string,
max_depth=options.max_depth,
)[0]
return Panel(value, expand=False, box=box.SQUARE)


def _to_rich_table(
tablish: Table | Column,
*,
max_rows: int | None = None,
max_columns: int | None = None,
max_length: int | None = None,
max_string: int | None = None,
max_depth: int | None = None,
options: Interactive,
console_width: int | float | None = None,
) -> rich.table.Table:
max_rows = max_rows or ibis.options.repr.interactive.max_rows
max_columns = max_columns or ibis.options.repr.interactive.max_columns
from ibis.expr import types as ir

console_width = console_width or float("inf")
max_string = max_string or ibis.options.repr.interactive.max_string
show_types = ibis.options.repr.interactive.show_types
max_rows = options.max_rows
max_columns = options.max_columns
max_string = options.max_string
show_types = options.show_types

table = tablish.as_table()
orig_ncols = len(table.columns)
Expand Down Expand Up @@ -360,9 +340,9 @@ def _to_rich_table(
formatted, min_width, max_width = format_column(
dtype,
result[name].to_pylist()[:max_rows],
max_length=max_length,
max_length=options.max_length,
max_string=max_string,
max_depth=max_depth,
max_depth=options.max_depth,
)
dtype_str = format_dtype(dtype, max_string)
if show_types and not isinstance(dtype, (dt.Struct, dt.Map, dt.Array)):
Expand Down Expand Up @@ -433,7 +413,18 @@ def _to_rich_table(
if not next_flex_cols:
break

rich_table = rich.table.Table(padding=(0, 1, 0, 1))
if options.show_count:
# use underscore to be friendly to i18n and python REPL
nrows = f"{table.count().execute():_}"
else:
nrows = "…"
if isinstance(tablish, ir.Table):
dims = f"{orig_ncols:_} cols by {nrows} rows"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this follow the long-standing convention of "rows by columns"?

else:
dims = f"{nrows} rows"
rich_table = rich.table.Table(
title=dims, title_justify="left", padding=(0, 1, 0, 1)
)

# Configure the columns on the rich table.
for name, dtype, _, max_width in col_info:
Expand Down
13 changes: 9 additions & 4 deletions ibis/expr/types/relations.py
Original file line number Diff line number Diff line change
Expand Up @@ -507,6 +507,7 @@ def preview(
max_length: int | None = None,
max_string: int | None = None,
max_depth: int | None = None,
show_count: bool | None = None,
console_width: int | float | None = None,
) -> RichTable:
"""Return a subset as a Rich Table.
Expand All @@ -528,6 +529,8 @@ def preview(
Maximum length for pretty-printed strings
max_depth
Maximum depth for nested data types
show_count
Show the row count. This can be computationally expensive and slow.
console_width
Width of the console in characters. If not specified, the width
will be inferred from the console.
Expand Down Expand Up @@ -559,15 +562,17 @@ def preview(
"""
from ibis.expr.types.pretty import to_rich

return to_rich(
self,
max_columns=max_columns,
overrides = dict(
max_rows=max_rows,
max_columns=max_columns,
max_length=max_length,
max_string=max_string,
max_depth=max_depth,
console_width=console_width,
show_count=show_count,
)
overrides = {k: v for k, v in overrides.items() if v is not None}
options = ibis.options.repr.interactive.copy(**overrides)
return to_rich(self, options=options, console_width=console_width)

@overload
def __getitem__(self, what: str | int) -> ir.Column: ...
Expand Down
Loading